New Model Architectures

There have been many advances in model architectures in the AI domain. Let me give an overview of them in this post. Linear Compression Embedding (LCE) [1] simply uses a matrix to project one embedding matrix to another. Pyramid networks, inception network, DHEN, LCE. Perceiver and Perceiver IO: Perceiver-based architectures [5,6] solve …
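As a rough illustration of the LCE idea in the excerpt above (one matrix projecting an embedding matrix to another), here is a minimal sketch in plain Python. The shapes are my assumption, since the excerpt's formula did not survive: X holds n embeddings of dimension d, and a weight matrix W of shape (m, n) compresses them to m embeddings. The name lce_project and the fixed values of W are hypothetical; in a real model W would be learned.

```python
def lce_project(W, X):
    """Return the matrix product W @ X using nested lists.

    Assumed shapes (not from the original post):
      W: (m, n) projection matrix
      X: (n, d) input embedding matrix
    Result: (m, d) compressed embedding matrix.
    """
    n = len(X)
    if any(len(row) != n for row in W):
        raise ValueError("each row of W must have one weight per row of X")
    d = len(X[0])
    return [
        [sum(W[i][k] * X[k][j] for k in range(n)) for j in range(d)]
        for i in range(len(W))
    ]


# Four input embeddings of dimension 3.
X = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [3.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
]
# A fixed (2, 4) matrix that averages embeddings pairwise; illustrative only.
W = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
Y = lce_project(W, X)
print(Y)  # two output embeddings of dimension 3
```

The point of the sketch is only that the compression acts along the "number of embeddings" axis while leaving the embedding dimension untouched; other conventions (for example, projecting the embedding dimension instead) are equally possible.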

Simulation on the ads supply problem

I have come to appreciate the importance of simulating any practical problem before deploying an RL policy. If you cannot implement a reasonable simulator on your own, you do not clearly understand your environment and your model. To me, it is then a pure gamble if we just train an RL policy offline without testing in …

GFlowNet

GFlowNet is a recent technique developed for solving combinatorial optimization problems [1]. I’ve prepared a series of deep dive slides for it (see GFlowNet deep dive). In this post, I just list a few more references.
References
[1] https://yoshuabengio.org/2022/03/05/generative-flow-networks/
[2] https://towardsdatascience.com/the-what-why-and-how-of-generative-flow-networks-4fb3cd309af0
[3] https://neurips.cc/media/neurips-2021/Slides/26729.pdf
[4] https://www.youtube.com/watch?v=7W69-ffTs48
[5] https://milayb.notion.site/GFlowNet-Tutorial-919dcf0a0f0c4e978916a2f509938b00#afe03e54d6db43468f8dee3a3350f98a
[6] http://folinoid.com/w/gflownet/

How does the Metropolis-Hastings algorithm work?

I learned a little about Markov Chain Monte Carlo (MCMC) algorithms during my PhD, but I did not record my thoughts back then. In this post, I revisit the core concepts of MCMC, focusing on the Metropolis-Hastings (MH) algorithm. What is the motivation of MCMC? Suppose you have observed some data …
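To make the MH idea concrete, here is a minimal sketch of a sampler, not the code from the post itself. It assumes a symmetric Gaussian random-walk proposal (so the Hastings correction cancels and this is the plain Metropolis special case) and a standard normal target; the function names, step size, burn-in length, and seed are all hypothetical choices of mine.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Draw samples from a 1-D density known only up to a constant.

    log_target: function returning the log of the unnormalized density.
    Uses a symmetric Gaussian random-walk proposal (an assumption of this
    sketch), so the acceptance ratio reduces to a ratio of target densities.
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for t in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # 1 - random() lies in (0, 1], so the log is always defined.
        log_u = math.log(1.0 - rng.random())
        # Accept with probability min(1, p(proposal) / p(x)).
        if log_u < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        if t >= burn_in:
            samples.append(x)
    return samples


def log_std_normal(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


samples = metropolis_hastings(log_std_normal, x0=3.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

Working in log space avoids underflow when the target density is tiny, and only the ratio of densities is ever needed, which is why the normalizing constant can be dropped.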

Tools needed to facilitate long-term value optimization

[This post is inspired by one of my previous posts [4], which organizes my thoughts on how to build a validation tool for RL problems. Here, I try to organize my thoughts on building tools to facilitate long-term value optimization.] There is one important topic in applied RL: long-term user value optimization. Usually, this means …

Some SOTA Model-based RL

Model-based RL has always intrigued me more than model-free RL, because the former converts RL problems into supervised learning problems, which can always employ SOTA deep learning techniques. In this post, I introduce several recent developments in model-based RL. I categorize them into planning-based and non-planning-based. Planning This is one I am reviewing …

Laplace Approximation and Bayesian Logistic Regression

Recently, I have been studying a basic contextual bandit algorithm called Logistic Regression Bandit, which is also known as Bayesian Logistic Regression. It all starts from the paper “An Empirical Evaluation of Thompson Sampling” [1]. In Algorithm 3 of the paper, the authors give the pseudo-code for how to train the Logistic Regression Bandit, assuming its …

Markov Chain and Markov Decision Process on Graphs

It is a cool idea that we can represent the data of many problems as graphs. It is even cooler that we can improve graph-based algorithms with Reinforcement Learning (RL). In this post, I give an overview of several related ideas. Warm up on graphs – GCN Let’s first warm up with some context on graphs. …

Leetcode 695. Max Area of Island

You are given an m x n binary matrix grid. An island is a group of 1's (representing land) connected 4-directionally (horizontal or vertical). You may assume all four edges of the grid are surrounded by water. The area of an island is the number of cells with a value 1 in the island. Return the maximum area of an island in grid. If there is no island, return 0. …
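One common way to approach the problem statement above is a flood fill over the grid. The sketch below uses an iterative depth-first search with an explicit stack; it is one possible solution rather than the one from the post, and the function name max_area_of_island is my own choice.

```python
from typing import List


def max_area_of_island(grid: List[List[int]]) -> int:
    """Return the size of the largest 4-directionally connected group of 1's."""
    if not grid or not grid[0]:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill this island with an explicit stack (iterative DFS).
            seen[r][c] = True
            stack = [(r, c)]
            area = 0
            while stack:
                i, j = stack.pop()
                area += 1
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (
                        0 <= ni < rows
                        and 0 <= nj < cols
                        and grid[ni][nj] == 1
                        and not seen[ni][nj]
                    ):
                        # Mark on push so each cell enters the stack once.
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            best = max(best, area)
    return best


example = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(max_area_of_island(example))  # → 3
```

Each cell is visited at most once, so the running time is O(m * n). The explicit stack avoids Python's recursion limit on large islands, and the separate seen array leaves the input grid unmodified.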