Notes on Ray paper

I am reading Robert Nishihara’s thesis on Ray [1] while encountering several concepts I am not familiar with. So this post takes some notes on those new concepts. Ring AllReduce Ring AllReduce is a technique to communicate with multiple computation nodes for aggregating results. It is a primitive to many distributed training systems. In the …

Understand Fourier Transform of Images

We’ve covered Fourier Transform in [1] and [2] while we use only examples of 1D. In this post we are going to see what 2D Fourier Transform looks like. First, we look at a 2D image with one direction sinusoid waves (left) and its Fourier Transform (right). I added coordinates to help you understand the …

Revisit Gaussian kernel

This post is mainly about Gaussian kernel (aka radial basis function kernel), a commonly used technique in machine learning. We’ve touched the concept of Gaussian kernel in [1] and [2] but with less details.  Definition of Gaussian Kernel The intuitive understanding of a kernel is that it maps points in a data space, where those …

Optimization with discrete random variables

In this post, I’m going to talk about common techniques that enable us to optimize a loss function w.r.t. discrete random variables. I may go off on a tangent on various models (e.g., variational auto-encoder and reinforcement learning) because these techniques come from many different areas so please bear with me. We start from variational …

Control Variate

In this post, let’s talk about another variance reduction method called “control variate”. This is yet another variance reduction method besides importance sampling [1]. We start from a simple example: we want to estimate , the expectation of function applied on a random variable with sampled from . Analytically, . You can also think of …

Personalized Re-ranking

In the industry there is a trend to add a re-ranker at the final stage of a recommendation system. The re-ranker ranks the items that have already been filtered out from an enormous candidate set, aiming to provide the finest level of personalized ordering before the items are ultimately delivered to the user. In this …

Practical considerations of off-policy policy gradient

I’d like to talk more about policy gradient [1], which I touched upon in 2017. In common online tutorials, policy gradient theorem takes a lot of spaces to prove that the gradient of the policy in the direction to improve accumulated returns is: where is the accumulated return beginning from step from real samples.  Note …

Constrained RL / Multi-Objective RL

Learning a policy that can optimize multiple types of rewards or satisfy different constraints is a much desired feature in the industry. In real products, we often care about not only single one metric but several that interplay with each other. For example, we want to derive a policy to recommend news feeds which expects …

Hash table revisited

I came across how Facebook implements Hash table from this post: https://engineering.fb.com/developer-tools/f14/. It introduces several techniques making modern hash tables more efficient. The first technique is called chunking, which reduces the time for resolving hash collision. The idea is to map keys to a chunk (a block of slots) rather than a single slot then …