Tools needed to facilitate long-term value optimization

[This post is inspired by one of my previous posts [4], in which I organized my thoughts on how to build a validation tool for RL problems. Now, I am trying to organize my thoughts on building tools to facilitate long-term value optimization.] There is one important topic in applied RL: long-term user value optimization. Usually, this means …

Some SOTA Model-based RL

Model-based RL has always intrigued me more than model-free RL, because the former converts RL problems into supervised learning problems, which can readily employ SOTA deep learning techniques. In this post, I am introducing several of the latest developments in model-based RL. I categorize them into planning-based and non-planning-based.  Planning This is one I am reviewing …
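As a minimal sketch (not from the post itself) of how model-based RL reduces to supervised learning plus planning: a one-step dynamics model is fit by plain regression, and a simple random-shooting planner uses it to pick actions. The network sizes, the planner, and the assumed `reward_fn` are all illustrative choices.

```python
# Illustrative sketch: model-based RL = supervised learning of a dynamics model + planning.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One-step model predicting s_{t+1} from (s_t, a_t), trained by supervised regression."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def train_model(model, states, actions, next_states, epochs=100, lr=1e-3):
    """The supervised-learning step: regress next_states from (states, actions)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((model(states, actions) - next_states) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

def plan_random_shooting(model, state, reward_fn, action_dim, horizon=10, n_candidates=256):
    """Simplest possible planner: sample random action sequences, roll them out through
    the learned model, and return the first action of the best-scoring sequence.
    reward_fn(states, actions) is assumed known here."""
    s = torch.as_tensor(state, dtype=torch.float32).repeat(n_candidates, 1)
    actions = torch.rand(n_candidates, horizon, action_dim) * 2 - 1
    total_reward = torch.zeros(n_candidates)
    for t in range(horizon):
        a = actions[:, t]
        s = model(s, a)
        total_reward += reward_fn(s, a)
    return actions[total_reward.argmax(), 0]
```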

Laplacian Approximation and Bayesian Logistic Regression

Recently, I have been studying a basic contextual bandit algorithm called the Logistic Regression Bandit, which is also known as Bayesian Logistic Regression. It all starts with the paper “An Empirical Evaluation of Thompson Sampling” [1]. In Algorithm 3 of the paper, the authors give the pseudocode for how to train the Logistic Regression Bandit, assuming its …
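For concreteness, here is a small sketch of the diagonal-Gaussian Laplace approximation in the spirit of Algorithm 3 of [1]: keep a per-weight mean and precision, fit the MAP weights, and bump the precisions with the diagonal of the Hessian. The class and variable names (`m`, `q`, `lam`) and the use of `scipy.optimize.minimize` are my own choices, not the paper's.

```python
# Sketch of Laplace-approximated Bayesian logistic regression with a diagonal posterior.
import numpy as np
from scipy.optimize import minimize

class BayesianLogisticRegression:
    def __init__(self, dim, lam=1.0):
        self.m = np.zeros(dim)        # posterior mean of the weights
        self.q = lam * np.ones(dim)   # posterior precision (diagonal); prior = lam * I

    def _neg_log_posterior(self, w, X, y):
        # Gaussian prior around the current mean plus logistic log-likelihood; y in {-1, +1}.
        return (0.5 * np.sum(self.q * (w - self.m) ** 2)
                + np.sum(np.log(1.0 + np.exp(-y * (X @ w)))))

    def fit(self, X, y):
        # The MAP estimate becomes the new mean; the Hessian diagonal updates the precision.
        w_map = minimize(self._neg_log_posterior, self.m, args=(X, y), method="L-BFGS-B").x
        p = 1.0 / (1.0 + np.exp(-(X @ w_map)))
        self.m = w_map
        self.q = self.q + np.sum(X ** 2 * (p * (1.0 - p))[:, None], axis=0)

    def sample_weights(self, rng):
        # Thompson sampling draw from the Gaussian approximation N(m, diag(1/q)).
        return rng.normal(self.m, 1.0 / np.sqrt(self.q))
```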

Markov Chain and Markov Decision Process on Graphs

It is a cool idea that we can formulate the data of many problems as graphs. It is even cooler that we can improve graph-based algorithms with Reinforcement Learning (RL). In this post, I am going to give an overview of several related ideas. Warm up on graphs – GCN Let’s first warm up with some context on graphs. …
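To make the GCN warm-up concrete, here is a minimal NumPy sketch of one graph convolution layer using the standard propagation rule H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W); the random 5-node graph in the usage example is purely illustrative.

```python
# Minimal sketch of a single GCN layer with symmetric normalization and self-loops.
import numpy as np

def gcn_layer(A, H, W):
    """A: (n, n) adjacency matrix, H: (n, d_in) node features, W: (d_in, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # symmetric normalization D^{-1/2}
    H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)             # ReLU

# Tiny usage example on a random undirected 5-node graph with 3-dim node features.
rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.4).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)
H = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 4))
print(gcn_layer(A, H, W).shape)                # (5, 4)
```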

Leetcode 695. Max Area of Island

You are given an m x n binary matrix grid. An island is a group of 1's (representing land) connected 4-directionally (horizontal or vertical). You may assume all four edges of the grid are surrounded by water. The area of an island is the number of cells with a value 1 in the island. Return the maximum area of an island in grid. If there is no island, return 0. …
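One way to solve it is a DFS flood fill that sinks each visited land cell and returns the size of its connected component; a short Python sketch:

```python
# DFS flood-fill sketch for Leetcode 695. Visited land cells are set to 0 in place
# so each cell is counted at most once.
from typing import List

class Solution:
    def maxAreaOfIsland(self, grid: List[List[int]]) -> int:
        m, n = len(grid), len(grid[0])

        def dfs(r: int, c: int) -> int:
            # Out of bounds or water contributes nothing.
            if r < 0 or r >= m or c < 0 or c >= n or grid[r][c] == 0:
                return 0
            grid[r][c] = 0  # mark as visited
            return 1 + dfs(r + 1, c) + dfs(r - 1, c) + dfs(r, c + 1) + dfs(r, c - 1)

        return max((dfs(r, c) for r in range(m) for c in range(n)), default=0)
```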

Data Parallelism and Model Parallelism

In this post, we review the concepts of data parallelism, model parallelism, and more in between. We will illustrate the ideas using SOTA ML system designs. Data Parallelism Data parallelism means that there are multiple training workers fed with different parts of the full data, while the model parameters are hosted in a central place. There …
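To make the definition concrete, here is a toy NumPy sketch of data parallelism: the batch is sharded across workers, each worker computes a gradient on a replica of the same model, and the averaged gradient updates the central parameters. Real systems replace the `np.mean` with an all-reduce or a parameter server; the linear model and all numbers here are illustrative.

```python
# Toy sketch of data parallelism via gradient averaging over data shards.
import numpy as np

def worker_gradient(w, X_shard, y_shard):
    """Per-worker gradient of mean squared error for a linear model y ≈ X @ w."""
    pred = X_shard @ w
    return 2.0 * X_shard.T @ (pred - y_shard) / len(y_shard)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1024, 8)), rng.normal(size=1024)
w = np.zeros(8)                                  # parameters "hosted in a central place"
n_workers, lr = 4, 0.1

for step in range(100):
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [worker_gradient(w, Xs, ys) for Xs, ys in shards]  # parallel in a real system
    w -= lr * np.mean(grads, axis=0)             # aggregate and apply one central update
```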

Recent advances in Neural Architecture Search

It has been some time since I got in touch with neural architecture search (NAS) during my PhD, when I tried to get ideas for solving a combinatorial optimization problem for collectible card games’ deck recommendation. My memory of NAS mainly stays with one of the most classic NAS papers, “Neural architecture search with reinforcement …

Recent advances in Batch RL

I’ll introduce some recent papers advancing batch RL. The first paper is Critic Regularized Regression [1]. It starts from a general form of the actor-critic policy gradient objective, $\mathbb{E}_{(s,a)\sim\mathcal{B}}\left[f(Q_\theta, \pi, s, a)\,\log \pi(a|s)\right]$, where $Q_\theta$ is a learned critic function. For a behavior cloning method, $f = 1$. However, we can do much more than that choice. The CRR paper tested the first two …
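To make the weighting concrete, here is a small sketch of a CRR-style weighted behavior-cloning loss, covering $f = 1$ (plain behavior cloning) as well as the binary-indicator and exponential-advantage choices studied in [1]. The function signature, the clipping constant, and the way the advantage estimate is produced are my own assumptions, not the paper's exact implementation.

```python
# Sketch of a CRR-style loss: log-likelihood of the logged action, weighted by f.
import torch

def crr_loss(log_pi_a, advantage, mode="binary", beta=1.0):
    """log_pi_a: log pi(a|s) for logged (s, a); advantage: estimates of Q(s, a) - V(s)."""
    if mode == "bc":          # behavior cloning: f = 1
        f = torch.ones_like(advantage)
    elif mode == "binary":    # f = 1[A(s, a) > 0]
        f = (advantage > 0).float()
    elif mode == "exp":       # f = exp(A(s, a) / beta), clipped for numerical stability
        f = torch.clamp(torch.exp(advantage / beta), max=20.0)
    else:
        raise ValueError(mode)
    # The weight is treated as a constant so gradients only flow through log pi(a|s).
    return -(f.detach() * log_pi_a).mean()
```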