Recent advances in Neural Architecture Search

It has been some time since I last touched neural architecture search (NAS) during my PhD, when I tried to draw ideas from it for solving a combinatorial optimization problem: deck recommendation for collectible card games. My memory of NAS mainly rests on one of the most classic NAS papers, “Neural architecture search with reinforcement …

Recent advances in Batch RL

I’ll introduce some recent papers advancing batch RL. The first paper is Critic Regularized Regression [1]. It starts from a general form of the actor-critic policy gradient objective, $\mathbb{E}_{(s,a)\sim\mathcal{B}}\left[ f(Q_\theta, \pi, s, a) \log \pi(a|s) \right]$, where $Q_\theta$ is a learned critic function. For a behavior cloning method, $f = 1$. However, we can do much more than that choice: the CRR paper tested the first two …
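To make the weighting concrete, here is a minimal PyTorch sketch of this weighted-behavior-cloning view for discrete actions; the tensor shapes, the `mode` switch, and the clipping constant are my own illustrative choices, not the paper's exact implementation:

```python
import torch

def crr_actor_loss(logits, q_values, actions, beta=1.0, mode="exp"):
    """CRR-style weighted behavior cloning for discrete actions.

    logits:   [B, A] policy logits
    q_values: [B, A] learned critic Q(s, .)
    actions:  [B]    actions taken in the offline dataset
    """
    log_pi = torch.log_softmax(logits, dim=-1)
    # advantage A(s, a) = Q(s, a) - E_{a'~pi}[Q(s, a')]
    v = (log_pi.exp() * q_values).sum(dim=-1)
    adv = q_values.gather(1, actions.unsqueeze(1)).squeeze(1) - v
    if mode == "bc":          # f = 1 recovers plain behavior cloning
        f = torch.ones_like(adv)
    elif mode == "binary":    # f = 1[A(s, a) > 0]
        f = (adv > 0).float()
    else:                     # f = exp(A(s, a) / beta), clipped for stability
        f = (adv / beta).exp().clamp(max=20.0)
    log_pi_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(f.detach() * log_pi_a).mean()
```

Setting `mode="bc"` recovers plain behavior cloning, while the advantage-based weights keep only actions the critic rates above the policy's average.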

Some classical methodologies in applied products

I am reading two papers which use very classical methodologies for optimizing metrics in real-world applications. The first is constrained optimization for ranking, from The NodeHopper: Enabling Low Latency Ranking with Constraints via a Fast Dual Solver. The paper performs per-slate constrained optimization. Here, the objective is built from each item’s primary metric value and each item’s position after …
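The paper's contribution is a fast dual solver, but just to make the problem shape concrete, here is a hypothetical toy per-slate formulation as a linear program with scipy; all names (`primary`, `secondary`, `pos_weights`, `budget`) are illustrative, and a real serving system would solve the dual at low latency instead:

```python
import numpy as np
from scipy.optimize import linprog

def rank_slate(primary, secondary, pos_weights, budget):
    """Toy per-slate constrained ranking as an LP (illustrative only).

    primary[i], secondary[i]: item i's primary / secondary metric values
    pos_weights[k]:           attention weight of slate position k
    budget:                   cap on secondary-weighted exposure
    Decision variable x[i, k] ~ "item i is placed at position k".
    """
    n = len(primary)
    # maximize primary-weighted exposure <=> minimize its negation
    c = -np.outer(primary, pos_weights).ravel()
    # one inequality: total secondary-weighted exposure stays under budget
    A_ub = np.outer(secondary, pos_weights).ravel()[None, :]
    # assignment constraints: each item and each position used exactly once
    A_eq, b_eq = [], []
    for i in range(n):
        row = np.zeros((n, n)); row[i, :] = 1
        A_eq.append(row.ravel()); b_eq.append(1.0)
    for k in range(n):
        row = np.zeros((n, n)); row[:, k] = 1
        A_eq.append(row.ravel()); b_eq.append(1.0)
    res = linprog(c, A_ub=A_ub, b_ub=[budget], A_eq=np.array(A_eq),
                  b_eq=b_eq, bounds=[(0, 1)] * (n * n), method="highs")
    return res.x.reshape(n, n)  # (possibly fractional) assignment matrix

# usage: 3 items, position weights decaying down the slate
print(rank_slate(np.array([1.0, 0.8, 0.5]), np.array([0.9, 0.1, 0.2]),
                 np.array([1.0, 0.6, 0.3]), budget=0.8))
```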

Reward/return decomposition

In reinforcement learning (RL), it is common that a task reveals rewards only sparsely, e.g., at the end of an episode. This prevents RL algorithms from learning efficiently, especially when the task horizon is long. There has been some research on how to distribute sparse rewards to earlier steps. One simple, interesting line of research is …
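As a baseline for what "distributing rewards to earlier steps" can mean, here is a toy sketch of uniform reward redistribution: spread the episodic return evenly across the steps. This is my own illustration of the simplest possible scheme, not necessarily the method the post goes on to cover:

```python
import numpy as np

def redistribute_uniform(rewards):
    # Spread the episode's total return evenly over all steps so that
    # early actions receive an immediate (dense) learning signal.
    rewards = np.asarray(rewards, dtype=float)
    return np.full_like(rewards, rewards.sum() / len(rewards))

# a 5-step episode rewarded only at the end
print(redistribute_uniform([0, 0, 0, 0, 10.0]))  # -> [2. 2. 2. 2. 2.]
```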

Self-Supervised Learning Tricks

I am reading some self-supervised learning papers. Some of them use interesting tricks to create self-supervised learning signals. This post is dedicated to those tricks. The first paper I read is SwAV (Swapping Assignments between multiple Views of the same image) [1]. The high-level idea is that we create clusters with cluster centers $c_1, \ldots, c_K$. These …
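Here is a simplified PyTorch sketch of the swapped-prediction idea, with a few Sinkhorn iterations producing balanced soft cluster assignments; the hyperparameters and shapes are illustrative rather than SwAV's exact training code:

```python
import torch
import torch.nn.functional as F

def sinkhorn(scores, eps=0.05, n_iters=3):
    # Soft assignments Q that are roughly balanced across the K
    # prototypes (Sinkhorn-Knopp row/column normalizations).
    Q = torch.exp(scores / eps).t()               # [K, B]
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # normalize rows
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # normalize columns
    return (Q * B).t()                            # [B, K]

def swav_loss(z1, z2, prototypes, temp=0.1):
    # z1, z2: L2-normalized features of two views of the same images [B, D]
    # prototypes: [D, K] trainable cluster centers
    s1, s2 = z1 @ prototypes, z2 @ prototypes
    q1, q2 = sinkhorn(s1.detach()), sinkhorn(s2.detach())
    # swapped prediction: predict view 2's code from view 1's features
    p1 = F.log_softmax(s1 / temp, dim=1)
    p2 = F.log_softmax(s2 / temp, dim=1)
    return -0.5 * ((q2 * p1).sum(1).mean() + (q1 * p2).sum(1).mean())
```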

Precision Recall Curve vs. ROC curve

While the ROC (receiver operating characteristic) curve is ubiquitous in model reporting, the precision-recall curve is reported less often. However, the latter is especially useful when we have imbalanced data. Let’s review the pertinent concepts. True Positive = TP = you predict positive and the actual label is positive. False Positive = FP = you predict positive but …
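A quick sklearn experiment on synthetic imbalanced data shows the gap between the two views: the ROC AUC can look respectable while average precision (the PR summary) exposes the weak precision. The data and scorer below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
# synthetic, highly imbalanced data: ~1% positives
y = (rng.random(100_000) < 0.01).astype(int)
# a mediocre scorer: positives score ~1 std higher on average
scores = rng.normal(loc=y.astype(float), scale=1.0)

print("ROC AUC:", roc_auc_score(y, scores))                      # looks decent
print("Average precision:", average_precision_score(y, scores))  # much lower
```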

GAN (Generative Adversarial Network)

Here, I am taking some notes while following the GAN online course (https://www.deeplearning.ai/generative-adversarial-networks-specialization/). The first thing I want to point out is that one should be very careful about the computation graph during GAN training. To maximize efficiency, in one iteration we can call the generator only once, using the generator output …
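Here is a self-contained toy PyTorch iteration illustrating the trick: call the generator once, detach its output for the discriminator update so the generator's part of the graph is neither updated nor freed, then reuse the same output for the generator update. The tiny networks and sizes are mine, not the course's:

```python
import torch
import torch.nn as nn

z_dim, x_dim, batch_size = 16, 32, 8
gen = nn.Sequential(nn.Linear(z_dim, x_dim))     # toy generator
disc = nn.Sequential(nn.Linear(x_dim, 1))        # toy discriminator
gen_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
real = torch.randn(batch_size, x_dim)            # stand-in real batch

# call the generator only ONCE in this iteration
fake = gen(torch.randn(batch_size, z_dim))

# discriminator step: detach `fake` so this backward pass neither
# updates the generator nor frees the generator's part of the graph
disc_opt.zero_grad()
d_loss = criterion(disc(fake.detach()), torch.zeros(batch_size, 1)) \
       + criterion(disc(real), torch.ones(batch_size, 1))
d_loss.backward()
disc_opt.step()

# generator step: reuse the same `fake`; its graph is still intact
gen_opt.zero_grad()
g_loss = criterion(disc(fake), torch.ones(batch_size, 1))
g_loss.backward()
gen_opt.step()
```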

Projected Gradient Descent

I am reading “Determinantal point processes for machine learning”, which uses projected gradient descent in Eqn. 212. More broadly, such problems have the general form $\min_{x \in \Delta} \| x - y \|_2^2$, where we want to map from an arbitrary point $y$ to a point $x$ on the simplex $\Delta$. Since we often encounter problems with a sum-to-1 constraint, I think it is worth listing the solution in …
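For reference, here is a NumPy sketch of the standard sort-based Euclidean projection onto the probability simplex (in the spirit of Duchi et al.'s l1-ball projection paper); this is one common solution, not necessarily the exact derivation the book presents:

```python
import numpy as np

def project_to_simplex(y):
    # Euclidean projection of y onto {x : x >= 0, sum(x) = 1}.
    u = np.sort(y)[::-1]                  # sort descending
    css = np.cumsum(u)
    # largest index rho with u[rho] + (1 - css[rho]) / (rho + 1) > 0
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(y) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)    # shared shift
    return np.maximum(y + theta, 0)

print(project_to_simplex(np.array([0.6, 0.7, -0.2])))  # [0.45 0.55 0.], sums to 1
```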

Many ways towards recommendation diversity

Diversity of recommendations keeps users engaged and prevents boredom [1]. In this post, I will introduce several machine learning-based methods for achieving diverse recommendations. The literature in this post is mostly retrieved from the overview paper “Recent Advances in Diversified Recommendation” [6]. Determinantal Point Process: let’s first review what the determinant of a matrix …
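To preview why determinants encode diversity, here is a toy greedy MAP selection for a DPP: similar items shrink $\det(L_S)$, so near-duplicates get skipped. The quality-times-similarity kernel construction below is a common convention, used purely for illustration:

```python
import numpy as np

def greedy_dpp(L, k):
    # Greedily add the item that most increases log det(L_S):
    # high-quality items raise it, items similar to already-selected
    # ones lower it, so the selection trades off quality and diversity.
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(len(L)):
            if i in selected:
                continue
            S = selected + [i]
            gain = np.linalg.slogdet(L[np.ix_(S, S)])[1]
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

# toy kernel: L = diag(q) @ similarity @ diag(q), with item qualities q
q = np.array([1.0, 0.9, 0.8])
sim = np.array([[1.0, 0.95, 0.1],
                [0.95, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
L = np.diag(q) @ sim @ np.diag(q)
print(greedy_dpp(L, 2))  # -> [0, 2]: the near-duplicate item 1 is avoided
```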

Focal loss for classification and regression

I haven’t learnt any new loss function for a long time. Today I am going to learn a new one, focal loss, which was introduced in 2018 [1]. Let’s start from a typical classification task. For a data point $(x, y)$, where $x$ is the feature vector and $y$ is a binary label, a model predicts $p = P(y=1|x)$. Then …
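Here is a minimal PyTorch version of the binary focal loss as I understand it from [1]; the `alpha` and `gamma` defaults follow common practice, and the example values are made up:

```python
import torch

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # p: predicted P(y=1|x) in (0, 1); y: binary labels in {0, 1}.
    # p_t is the probability assigned to the true class; the
    # (1 - p_t)^gamma factor down-weights easy, well-classified examples.
    p_t = torch.where(y == 1, p, 1 - p)
    alpha_t = torch.where(y == 1, torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()

# an easy example (p_t = 0.95) contributes far less than a hard one (p_t = 0.3)
p = torch.tensor([0.95, 0.30]); y = torch.tensor([1, 1])
print(focal_loss(p, y))
```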