Tools needed to facilitate long-term value optimization

[This post is inspired by one of my previous posts [4], in which I organized my thoughts on how to build a validation tool for RL problems. Now, I am trying to organize my thoughts on building tools to facilitate long-term value optimization.] There is one important topic in applied RL: long-term user value optimization. Usually, this means …

Some SOTA Model-based RL

Model-based RL has always intrigued me more than model-free RL, because the former converts RL problems into supervised learning problems, which can readily employ SOTA deep learning techniques. In this post, I am introducing several of the latest developments in model-based RL. I categorize them into planning-based and non-planning-based.  Planning This is one I am reviewing …
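As a minimal sketch (not from the post itself) of how model-based RL reduces to supervised learning plus planning: a one-step dynamics model is fit by plain regression, and a simple random-shooting planner uses it to pick actions. The network sizes, the planner, and the assumed `reward_fn` are all illustrative choices.

```python
# Illustrative sketch: model-based RL = supervised learning of a dynamics model + planning.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One-step model predicting s_{t+1} from (s_t, a_t), trained by supervised regression."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def train_model(model, states, actions, next_states, epochs=100, lr=1e-3):
    """The supervised-learning step: regress next_states from (states, actions)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((model(states, actions) - next_states) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

def plan_random_shooting(model, state, reward_fn, action_dim, horizon=10, n_candidates=256):
    """Simplest possible planner: sample random action sequences, roll them out through
    the learned model, and return the first action of the best-scoring sequence.
    reward_fn(states, actions) is assumed known here."""
    s = torch.as_tensor(state, dtype=torch.float32).repeat(n_candidates, 1)
    actions = torch.rand(n_candidates, horizon, action_dim) * 2 - 1
    total_reward = torch.zeros(n_candidates)
    for t in range(horizon):
        a = actions[:, t]
        s = model(s, a)
        total_reward += reward_fn(s, a)
    return actions[total_reward.argmax(), 0]
```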

Laplacian Approximation and Bayesian Logistic Regression

Recently, I have been studying a basic contextual bandit algorithm called the Logistic Regression Bandit, which is also known as Bayesian Logistic Regression. It all starts with the paper “An Empirical Evaluation of Thompson Sampling” [1]. In Algorithm 3 of the paper, the authors give the pseudocode for how to train the Logistic Regression Bandit, assuming its …
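For concreteness, here is a small sketch of the diagonal-Gaussian Laplace approximation in the spirit of Algorithm 3 of [1]: keep a per-weight mean and precision, fit the MAP weights, and bump the precisions with the diagonal of the Hessian. The class and variable names (`m`, `q`, `lam`) and the use of `scipy.optimize.minimize` are my own choices, not the paper's.

```python
# Sketch of Laplace-approximated Bayesian logistic regression with a diagonal posterior.
import numpy as np
from scipy.optimize import minimize

class BayesianLogisticRegression:
    def __init__(self, dim, lam=1.0):
        self.m = np.zeros(dim)        # posterior mean of the weights
        self.q = lam * np.ones(dim)   # posterior precision (diagonal); prior = lam * I

    def _neg_log_posterior(self, w, X, y):
        # Gaussian prior around the current mean plus logistic log-likelihood; y in {-1, +1}.
        return (0.5 * np.sum(self.q * (w - self.m) ** 2)
                + np.sum(np.log(1.0 + np.exp(-y * (X @ w)))))

    def fit(self, X, y):
        # The MAP estimate becomes the new mean; the Hessian diagonal updates the precision.
        w_map = minimize(self._neg_log_posterior, self.m, args=(X, y), method="L-BFGS-B").x
        p = 1.0 / (1.0 + np.exp(-(X @ w_map)))
        self.m = w_map
        self.q = self.q + np.sum(X ** 2 * (p * (1.0 - p))[:, None], axis=0)

    def sample_weights(self, rng):
        # Thompson sampling draw from the Gaussian approximation N(m, diag(1/q)).
        return rng.normal(self.m, 1.0 / np.sqrt(self.q))
```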

Markov Chain and Markov Decision Process on Graphs

It is a cool idea that we can formulate the data of many problems as graphs. It is even cooler that we can improve graph-based algorithms with Reinforcement Learning (RL). In this post, I am going to give an overview of several related ideas. Warm up on graphs – GCN Let’s first warm up with some context on graphs. …
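To make the GCN warm-up concrete, here is a minimal NumPy sketch of one graph convolution layer using the standard propagation rule H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W); the random 5-node graph in the usage example is purely illustrative.

```python
# Minimal sketch of a single GCN layer with symmetric normalization and self-loops.
import numpy as np

def gcn_layer(A, H, W):
    """A: (n, n) adjacency matrix, H: (n, d_in) node features, W: (d_in, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # symmetric normalization D^{-1/2}
    H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)             # ReLU

# Tiny usage example on a random undirected 5-node graph with 3-dim node features.
rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.4).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)
H = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 4))
print(gcn_layer(A, H, W).shape)                # (5, 4)
```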

Leetcode 695. Max Area of Island

You are given an m x n binary matrix grid. An island is a group of 1's (representing land) connected 4-directionally (horizontal or vertical). You may assume all four edges of the grid are surrounded by water. The area of an island is the number of cells with a value 1 in the island. Return the maximum area of an island in grid. If there is no island, return 0. …
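One way to solve it is a DFS flood fill that sinks each visited land cell and returns the size of its connected component; a short Python sketch:

```python
# DFS flood-fill sketch for Leetcode 695. Visited land cells are set to 0 in place
# so each cell is counted at most once.
from typing import List

class Solution:
    def maxAreaOfIsland(self, grid: List[List[int]]) -> int:
        m, n = len(grid), len(grid[0])

        def dfs(r: int, c: int) -> int:
            # Out of bounds or water contributes nothing.
            if r < 0 or r >= m or c < 0 or c >= n or grid[r][c] == 0:
                return 0
            grid[r][c] = 0  # mark as visited
            return 1 + dfs(r + 1, c) + dfs(r - 1, c) + dfs(r, c + 1) + dfs(r, c - 1)

        return max((dfs(r, c) for r in range(m) for c in range(n)), default=0)
```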

Data Parallelism and Model Parallelism

In this post, we review the concepts of data parallelism, model parallelism, and more in between. We will illustrate the ideas using SOTA ML system designs. Data Parallelism Data parallelism means that there are multiple training workers fed with different parts of the full data, while the model parameters are hosted in a central place. There …
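To make the definition concrete, here is a toy NumPy sketch of data parallelism: the batch is sharded across workers, each worker computes a gradient on a replica of the same model, and the averaged gradient updates the central parameters. Real systems replace the `np.mean` with an all-reduce or a parameter server; the linear model and all numbers here are illustrative.

```python
# Toy sketch of data parallelism via gradient averaging over data shards.
import numpy as np

def worker_gradient(w, X_shard, y_shard):
    """Per-worker gradient of mean squared error for a linear model y ≈ X @ w."""
    pred = X_shard @ w
    return 2.0 * X_shard.T @ (pred - y_shard) / len(y_shard)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1024, 8)), rng.normal(size=1024)
w = np.zeros(8)                                  # parameters "hosted in a central place"
n_workers, lr = 4, 0.1

for step in range(100):
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [worker_gradient(w, Xs, ys) for Xs, ys in shards]  # parallel in a real system
    w -= lr * np.mean(grads, axis=0)             # aggregate and apply one central update
```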

Recent advances in Neural Architecture Search

It has been some time since I got in touch with neural architecture search (NAS) during my PhD, when I tried to get ideas for solving a combinatorial optimization problem for collectible card games’ deck recommendation. My memory of NAS mainly stays with one of the most classic NAS papers, “Neural architecture search with reinforcement …

Recent advances in Batch RL

I’ll introduce some recent papers advancing batch RL. The first paper is Critic Regularized Regression [1]. It starts from a general form of the actor-critic policy gradient objective, $\mathbb{E}_{(s,a)\sim\mathcal{B}}\left[f(Q_\theta, \pi, s, a)\,\log \pi(a|s)\right]$, where $Q_\theta$ is a learned critic function. For a behavior cloning method, $f = 1$. However, we can do much more than that choice. The CRR paper tested the first two …
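To make the weighting concrete, here is a small sketch of a CRR-style weighted behavior-cloning loss, covering $f = 1$ (plain behavior cloning) as well as the binary-indicator and exponential-advantage choices studied in [1]. The function signature, the clipping constant, and the way the advantage estimate is produced are my own assumptions, not the paper's exact implementation.

```python
# Sketch of a CRR-style loss: log-likelihood of the logged action, weighted by f.
import torch

def crr_loss(log_pi_a, advantage, mode="binary", beta=1.0):
    """log_pi_a: log pi(a|s) for logged (s, a); advantage: estimates of Q(s, a) - V(s)."""
    if mode == "bc":          # behavior cloning: f = 1
        f = torch.ones_like(advantage)
    elif mode == "binary":    # f = 1[A(s, a) > 0]
        f = (advantage > 0).float()
    elif mode == "exp":       # f = exp(A(s, a) / beta), clipped for numerical stability
        f = torch.clamp(torch.exp(advantage / beta), max=20.0)
    else:
        raise ValueError(mode)
    # The weight is treated as a constant so gradients only flow through log pi(a|s).
    return -(f.detach() * log_pi_a).mean()
```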