Recent advances in Batch RL

I’ll introduce some recent papers advancing batch RL. The first paper is Critic Regularized Regression [1]. It starts from a general form of the actor-critic policy objective, $\arg\max_\pi \mathbb{E}_{(s,a)\sim\mathcal{B}}\left[f(Q_\theta, \pi, s, a)\,\log \pi(a|s)\right]$, where $Q_\theta$ is a learned critic function. For a behavior cloning method, $f = 1$. However, we can do much more than that choice: for example, $f = \mathbb{1}\left[\hat{A}(s,a) > 0\right]$ or $f = \exp\left(\hat{A}(s,a)/\beta\right)$, where $\hat{A}$ is an estimated advantage. The CRR paper tested the first two …
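
To make the weighting concrete, here is a minimal PyTorch sketch of how the three choices of $f$ could be applied, assuming hypothetical `q_net` and `policy` objects with `sample` and `log_prob` methods (these interfaces are my assumptions, not the paper's code):

```python
import torch

def crr_policy_loss(q_net, policy, states, actions, beta=1.0, mode="exp", n_action_samples=4):
    """Sketch of a CRR-style weighted log-likelihood loss (hypothetical interfaces).

    The advantage is estimated as Q(s, a) minus a Monte-Carlo value estimate
    V(s) ~= mean over a' of Q(s, a'), with a' sampled from the current policy.
    """
    with torch.no_grad():
        q_sa = q_net(states, actions)
        # Estimate V(s) by averaging Q over actions sampled from the policy.
        sampled = [q_net(states, policy.sample(states)) for _ in range(n_action_samples)]
        v_s = torch.stack(sampled).mean(dim=0)
        adv = q_sa - v_s
        if mode == "exp":
            w = torch.exp(adv / beta)          # f = exp(A / beta)
        elif mode == "binary":
            w = (adv > 0).float()              # f = 1[A > 0]
        else:
            w = torch.ones_like(adv)           # f = 1 recovers behavior cloning
    log_prob = policy.log_prob(states, actions)
    return -(w * log_prob).mean()
```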

Some classical methodologies in applied products

I am reading two papers that use very classical methodologies for optimizing metrics in real-world applications. The first is constrained optimization for ranking, from The NodeHopper: Enabling Low Latency Ranking with Constraints via a Fast Dual Solver. The paper performs per-slate constrained optimization: $\max_\sigma \sum_i r_i\, w_{\sigma(i)}$ subject to $\sum_i c_{ik}\, w_{\sigma(i)} \ge b_k$ for each constraint $k$. Here, $r_i$ is item $i$'s primary metric value, $\sigma(i)$ is item $i$'s position after …
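
As a rough illustration of the dual idea behind such per-slate problems (my own sketch, not the paper's NodeHopper algorithm): for fixed dual variables, the Lagrangian decomposes per item, so the best ranking simply sorts items by a dual-adjusted score. All names below are hypothetical:

```python
import numpy as np

def rank_with_dual(primary, constraint_costs, lam):
    """Rank items by a Lagrangian-adjusted score.

    primary:          (N,) per-item primary metric values (r_i)
    constraint_costs: (N, K) per-item costs for K constraints
    lam:              (K,) dual variables for the constraints
    For fixed duals, sorting by r_i - lam . c_i maximizes the Lagrangian.
    """
    adjusted = primary - constraint_costs @ lam
    return np.argsort(-adjusted)  # best items first

# Toy usage: 5 items, 1 constraint, dual variable lambda = 0.5.
order = rank_with_dual(np.array([3.0, 1.0, 2.5, 0.5, 2.0]),
                       np.array([[1.0], [0.1], [2.0], [0.0], [0.5]]),
                       np.array([0.5]))
```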

Reward/return decomposition

In reinforcement learning (RL), it is common that a task only reveals rewards sparsely, e.g., at the end of an episode. This prevents RL algorithms from learning efficiently, especially when the task horizon is long. There has been some research on how to redistribute sparse rewards to earlier steps. One simple, interesting work is …
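
As a toy illustration of the general idea (my own sketch, not any specific paper's method), the snippet below redistributes a single terminal reward uniformly across the episode while preserving the total return:

```python
import numpy as np

def redistribute_terminal_reward(rewards):
    """Spread an episode's total reward uniformly over all steps.

    rewards: (T,) array, typically zero everywhere except the last step.
    Returns a (T,) array with the same sum, so the episode return (and hence
    the optimal policy in the undiscounted case) is unchanged.
    """
    rewards = np.asarray(rewards, dtype=float)
    return np.full_like(rewards, rewards.sum() / len(rewards))

# Toy episode: reward only at the end.
dense = redistribute_terminal_reward([0, 0, 0, 0, 10.0])  # -> [2, 2, 2, 2, 2]
```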

Self-Supervised Learning Tricks

I am reading some self-supervised learning papers. Some of them have interesting tricks to create self-supervised learning signals. This post is dedicated to those tricks. The first paper I read is SwAV (Swapping Assignments between multiple Views of the same image) [1]. The high-level idea is that we create $K$ clusters with cluster centers $c_1, \ldots, c_K$. These …
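
A rough PyTorch sketch of the swapped prediction idea follows. Note that the actual paper obtains the target codes with a Sinkhorn-Knopp equal-partitioning step; as a simplification I substitute plain softmax codes here, so treat this only as a shape-level illustration:

```python
import torch
import torch.nn.functional as F

def swapped_prediction_loss(z1, z2, prototypes, temp=0.1):
    """Sketch of a SwAV-style swapped prediction loss.

    z1, z2:     (B, D) L2-normalized embeddings of two views of the same images
    prototypes: (K, D) trainable cluster centers c_1..c_K (also normalized)
    """
    scores1 = z1 @ prototypes.t() / temp   # (B, K) similarities to centers
    scores2 = z2 @ prototypes.t() / temp
    with torch.no_grad():
        q1 = F.softmax(scores1, dim=1)     # stand-in for Sinkhorn-Knopp codes
        q2 = F.softmax(scores2, dim=1)
    # Swap: predict view 2's code from view 1's scores, and vice versa.
    loss = -(q2 * F.log_softmax(scores1, dim=1)).sum(dim=1).mean() \
           -(q1 * F.log_softmax(scores2, dim=1)).sum(dim=1).mean()
    return loss
```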

PyTorch Lightning template

Back in the old days, I studied how to implement highly efficient PyTorch pipelines for multi-GPU training [1]. DistributedDataParallel is the way to go, but it is cumbersome that we need boilerplate for spawning workers and constructing data readers. Now, PyTorch Lightning offers a clean API for setting up multi-GPU training easily. Here is a template …
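
A minimal sketch of what such a template might look like; the model and data are placeholders, and the exact Trainer flags vary across Lightning versions:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    """Minimal LightningModule; the network here is a placeholder."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(28 * 28, 10)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.net(x.flatten(1)), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Lightning handles process spawning and data sharding for DDP internally:
# trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp", max_epochs=10)
# trainer.fit(LitClassifier(), train_dataloaders=my_dataloader)  # my_dataloader: yours
```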

Precision Recall Curve vs. ROC curve

While the ROC (receiver operating characteristic) curve is ubiquitous in model reporting, the precision-recall curve is less often reported. However, the latter is especially useful when we have imbalanced data. Let’s review the pertinent concepts.
True Positive = TP = you predict positive and the actual label is positive
False Positive = FP = you predict positive but …
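
A quick scikit-learn sketch with synthetic imbalanced data (the data generation below is made up for illustration) shows how the two curves are computed side by side:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve, auc

# Toy imbalanced data: ~5% positives, with noisy scores.
rng = np.random.default_rng(0)
y_true = (rng.random(2000) < 0.05).astype(int)
y_score = 0.3 * y_true + rng.random(2000)  # positives score higher on average

fpr, tpr, _ = roc_curve(y_true, y_score)
prec, rec, _ = precision_recall_curve(y_true, y_score)
print("ROC AUC:", auc(fpr, tpr))
print("PR  AUC:", auc(rec, prec))  # typically far less flattering when classes are imbalanced
```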

Deep Learning-based Sorting

Here, I am talking about a few techniques for using deep neural networks to accomplish sorting/ranking tasks.
Reinforcement Learning – policy gradient paradigm
Using policy gradient to solve combinatorial optimization problems such as the Traveling Salesman Problem is not new. Ranking K out of N candidates is also a combinatorial optimization problem and thus can be solved …
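
As a sketch of the policy gradient paradigm for ranking, the snippet below samples K of N items sequentially (a Plackett-Luce-style factorization) and accumulates the log-probability that REINFORCE needs; the scoring network and the reward signal are assumed to exist elsewhere:

```python
import torch

def sample_top_k(scores, k):
    """Sample an ordered top-k list and its log-probability (Plackett-Luce style).

    scores: (N,) unnormalized item scores, e.g., from a scoring network.
    The returned log_prob is differentiable w.r.t. scores, as REINFORCE needs.
    """
    mask = torch.zeros_like(scores, dtype=torch.bool)
    chosen, log_prob = [], torch.tensor(0.0)
    for _ in range(k):
        probs = torch.softmax(scores.masked_fill(mask, float("-inf")), dim=0)
        idx = torch.multinomial(probs, 1).item()
        log_prob = log_prob + torch.log(probs[idx])
        chosen.append(idx)
        mask[idx] = True  # sample without replacement
    return chosen, log_prob

# REINFORCE update (sketch): loss = -(reward - baseline) * log_prob,
# where reward is slate-level feedback observed after showing the ranking.
```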

GAN (Generative Adversarial Network)

Here, I am taking some notes while following the GAN online course (https://www.deeplearning.ai/generative-adversarial-networks-specialization/). The first thing I want to point out is that one should be very careful about the computation graph during the training of GANs. To maximize efficiency in one iteration, we can call the generator only once, using the generator output …
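
A sketch of that single-generator-call pattern in PyTorch is below; `G`, `D`, the optimizers, and the batch tensors are assumed to be defined elsewhere:

```python
import torch

def gan_step(G, D, opt_g, opt_d, real, noise, bce=torch.nn.BCEWithLogitsLoss()):
    """One GAN training iteration that calls the generator only once."""
    fake = G(noise)  # single generator forward pass, reused below

    # Discriminator update: detach() blocks gradients from flowing into G,
    # so D's backward pass does not touch G's part of the graph.
    opt_d.zero_grad()
    d_real = D(real)
    d_fake = D(fake.detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    d_loss.backward()
    opt_d.step()

    # Generator update: reuse the same `fake`, now letting gradients reach G.
    opt_g.zero_grad()
    g_out = D(fake)
    g_loss = bce(g_out, torch.ones_like(g_out))
    g_loss.backward()
    opt_g.step()
```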

Projected Gradient Descent

I am reading “Determinantal point processes for machine learning”, which uses projected gradient descent in Eqn. 212. More broadly, such problems have this general form: $\min_{x \in \Delta} \|x - y\|_2^2$, where we want to map from $y \in \mathbb{R}^n$ to $x$ on the simplex $\Delta = \{x : \sum_i x_i = 1,\; x_i \ge 0\}$. Since we often encounter problems with the sum-to-1 constraint, I think it is worth listing the solution in …
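
For reference, here is a standard O(n log n) sort-and-threshold implementation of the Euclidean projection onto the probability simplex (the classic algorithm, e.g., Duchi et al. 2008, rather than anything specific to the DPP book):

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection of y onto the probability simplex.

    Solves min_x ||x - y||^2 s.t. sum(x) = 1, x >= 0, via sort-and-threshold.
    """
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                      # sort descending
    css = np.cumsum(u)
    # Largest index rho with u_rho + (1 - cumsum) / (rho + 1) > 0 (0-indexed).
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(y) + 1) > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)        # shared threshold
    return np.maximum(y - tau, 0.0)

print(project_to_simplex([0.5, 1.2, -0.3]))  # -> [0.15, 0.85, 0.], sums to 1
```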