Category Archives: Algorithm

Seven years ago I posted a tutorial about recommendation systems. Now it is 2022 and there have been many more advancements. This post overviews several of the latest ideas. CTR models: Google's RecSys 2022 paper [1] introduces many practical details of their CTR models. First, there are three effective ways to reduce training cost: applying bottleneck layers …
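The excerpt is truncated, but the bottleneck trick it mentions is easy to illustrate. Below is a minimal PyTorch sketch, assuming an MLP-style CTR tower; the layer sizes are illustrative and not taken from the paper. Each wide dense layer is factored through a narrow bottleneck, a low-rank factorization that shrinks the matrix multiplications.

```python
import torch
import torch.nn as nn

class BottleneckMLP(nn.Module):
    """A CTR tower where each wide dense layer is factored through a
    narrow bottleneck, cutting the parameter count of the projections."""
    def __init__(self, in_dim=512, hidden_dim=1024, bottleneck_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            # in_dim -> hidden_dim, factored as in_dim -> bottleneck -> hidden_dim
            nn.Linear(in_dim, bottleneck_dim),
            nn.Linear(bottleneck_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, bottleneck_dim),
            nn.Linear(bottleneck_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # CTR logit
        )

    def forward(self, x):
        return self.net(x)

model = BottleneckMLP()
# Far fewer parameters than the plain 512 -> 1024 -> 1024 -> 1 tower.
print(sum(p.numel() for p in model.parameters()))
```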
New Model Architectures
There have been many advancements in new model architectures in the AI domain. Let me overview these advancements in this post. Linear Compression Embedding: LCE [1] simply uses a matrix to project one embedding matrix to another: $\tilde{E} = PE$, where $P \in \mathbb{R}^{m \times n}$ compresses the original embedding matrix $E \in \mathbb{R}^{n \times d}$ into $m$ rows. Topics covered include pyramid networks, the Inception network, DHEN, and LCE. Perceiver and Perceiver IO: Perceiver-based architectures [5,6] solve …
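To make the LCE projection concrete, here is a minimal PyTorch sketch; the sizes $n$, $d$, and $m$ are hypothetical. The appeal is that the only new parameters are the $m \times n$ entries of $P$.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: compress an n-row embedding table down to m rows.
n, d, m = 10000, 64, 500
E = nn.Parameter(torch.randn(n, d))          # original embedding matrix, (n, d)
P = nn.Parameter(torch.randn(m, n) * 0.01)   # learned linear compression, (m, n)

E_tilde = P @ E                              # compressed embeddings, (m, d)
print(E_tilde.shape)                         # torch.Size([500, 64])
```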
Simulation on the ads supply problem
I have started to feel the importance of simulating any practical problem before deploying an RL policy. If you cannot implement a reasonable simulator on your own, you do not really understand your environment or your model. To me, it is then a pure gamble if we just train an RL policy offline without testing in …
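For concreteness, here is a toy sketch of what such a hand-rolled simulator could look like for an ads-supply-style problem. The state, dynamics, and reward below are purely illustrative assumptions, not any real setup.

```python
import numpy as np

class AdsSupplySim:
    """Toy simulator: the action is the ad load (fraction of slots shown as
    ads); more ads yield more short-term revenue but depress the user's
    engagement, which is the long-term cost."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.engagement = 1.0                        # latent user engagement
        return np.array([self.engagement])

    def step(self, ad_load):
        revenue = ad_load * self.engagement          # short-term reward
        self.engagement *= 1.0 - 0.5 * ad_load       # ad fatigue dynamics
        self.engagement += 0.05 * self.rng.random()  # organic recovery
        done = self.engagement < 0.1                 # user churns
        return np.array([self.engagement]), revenue, done

sim = AdsSupplySim()
state = sim.reset()
for _ in range(10):
    state, reward, done = sim.step(ad_load=0.3)
    if done:
        break
```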
GFlowNet
GFlowNet is a recently proposed technique for sampling composite objects, such as solutions to combinatorial optimization problems, with probability proportional to a reward [1]. I've prepared a series of deep-dive slides for it (see GFlowNet deep dive). In this post, I just list a few more references.

References
[1] https://yoshuabengio.org/2022/03/05/generative-flow-networks/
[2] https://towardsdatascience.com/the-what-why-and-how-of-generative-flow-networks-4fb3cd309af0
[3] https://neurips.cc/media/neurips-2021/Slides/26729.pdf
[4] https://www.youtube.com/watch?v=7W69-ffTs48
[5] https://milayb.notion.site/GFlowNet-Tutorial-919dcf0a0f0c4e978916a2f509938b00#afe03e54d6db43468f8dee3a3350f98a
[6] http://folinoid.com/w/gflownet/
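As a taste of what training looks like, here is a sketch of the trajectory balance loss (Malkin et al., 2022), one common GFlowNet training objective; the inputs are assumed to come from a single sampled trajectory.

```python
import torch

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Trajectory balance loss: for one sampled trajectory, push the forward
    flow Z * prod P_F to match R(x) * prod P_B, in log space.
      log_Z      : scalar tensor, learned log partition function
      log_pf     : (T,) forward log-probabilities along the trajectory
      log_pb     : (T,) backward log-probabilities along the trajectory
      log_reward : scalar tensor, log R(x) of the terminal object
    """
    return (log_Z + log_pf.sum() - log_reward - log_pb.sum()) ** 2
```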
How does Metropolis-Hastings algorithm work?
I learned a little about the Markov Chain Monte Carlo (MCMC) algorithm during my PhD, but I did not record my thoughts back then. In this post, I revisit the core concept of MCMC, particularly focusing on illustrating the Metropolis-Hastings (MH) algorithm. What is the motivation of MCMC? Suppose you have observed some data. …
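Since the excerpt is cut off, here is a self-contained NumPy sketch of the MH algorithm with a symmetric Gaussian random-walk proposal, sampling from an unnormalized 1-D target; the target density is an arbitrary example.

```python
import numpy as np

def metropolis_hastings(log_target, n_samples=10000, step=1.0, x0=0.0, seed=0):
    """Sample from an unnormalized density via a random-walk proposal."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + step * rng.normal()      # symmetric proposal q(x'|x)
        # Acceptance ratio; the q terms cancel because q is symmetric.
        log_alpha = log_target(x_new) - log_target(x)
        if np.log(rng.random()) < log_alpha:
            x = x_new                        # accept
        samples.append(x)                    # on rejection, repeat current x
    return np.array(samples)

# Example target: an unnormalized mixture of two Gaussians at -2 and +2.
log_target = lambda x: np.logaddexp(-0.5 * (x + 2) ** 2, -0.5 * (x - 2) ** 2)
samples = metropolis_hastings(log_target)
print(samples.mean(), samples.std())
```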
Tools needed to facilitate long-term value optimization
[This post is inspired by one of my previous posts [4], which organizes my thoughts on how to build a validation tool for RL problems. Now, I am trying to organize my thoughts on building tools to facilitate long-term value optimization.] There is one important topic in applied RL: long-term user value optimization. Usually, this means …
Some SOTA Model-based RL
Model-based RL has always intrigued me more than model-free RL, because the former converts RL problems into supervised learning problems, which can always employ SOTA deep learning techniques. In this post, I introduce several of the latest developments in model-based RL. I categorize them into planning-based and non-planning-based. Planning: The first one I am reviewing …
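To make the planning flavor concrete, here is a minimal sketch of random-shooting planning on top of a learned dynamics model; `dynamics_model` and `reward_fn` are hypothetical stand-ins for functions fit by ordinary supervised learning on logged transitions.

```python
import numpy as np

def random_shooting_plan(dynamics_model, reward_fn, state,
                         horizon=10, n_candidates=100, action_dim=1, rng=None):
    """Return the first action of the best random action sequence, where
    'best' is judged by rolling out the learned model, not the real world."""
    rng = rng or np.random.default_rng()
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:
            s = dynamics_model(s, a)   # model rollout step
            total += reward_fn(s, a)
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action

# Toy 1-D example: the learned model says actions shift the state;
# the reward prefers states near zero.
dyn = lambda s, a: s + a
rew = lambda s, a: -np.abs(s).sum()
print(random_shooting_plan(dyn, rew, state=np.array([3.0])))
```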
Laplacian Approximation and Bayesian Logistic Regression
Recently, I have been studying a basic contextual bandit algorithm called the Logistic Regression Bandit, which is also known as Bayesian Logistic Regression. It all starts with the paper "An Empirical Evaluation of Thompson Sampling" [1]. In Algorithm 3 of the paper, the authors give the pseudo-code for how to train the Logistic Regression Bandit, assuming its …
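Below is a condensed sketch of the diagonal-Gaussian Laplace update in the spirit of Algorithm 3 of [1]: find the posterior mode, then add the Hessian's diagonal to the per-weight precisions. The variable names and the L-BFGS optimizer are my choices, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_update(m, q, X, y):
    """One Bayesian logistic regression update with a diagonal Laplace
    approximation.
    m, q : per-weight Gaussian posterior means and precisions
    X    : (n, d) contexts;  y : (n,) labels in {-1, +1}
    """
    def neg_log_posterior(w):
        margins = y * (X @ w)
        return 0.5 * np.sum(q * (w - m) ** 2) + np.sum(np.logaddexp(0.0, -margins))

    def grad(w):
        margins = y * (X @ w)
        return q * (w - m) - X.T @ (y / (1.0 + np.exp(margins)))

    w = minimize(neg_log_posterior, m, jac=grad, method="L-BFGS-B").x
    p = 1.0 / (1.0 + np.exp(-(X @ w)))      # predicted probabilities at the mode
    q_new = q + (X ** 2).T @ (p * (1 - p))  # curvature updates the precisions
    return w, q_new

# Toy usage; Thompson sampling would then draw w ~ N(m, 1/q) per decision.
d = 5
m, q = np.zeros(d), np.ones(d)              # prior: independent N(0, 1) weights
X = np.random.randn(20, d)
y = np.where(np.random.randn(20) > 0, 1.0, -1.0)
m, q = laplace_update(m, q, X, y)
```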
Markov Chain and Markov Decision Process on Graphs
It is a cool idea that we can formulate the data of many problems as graphs. It is even cooler that we can improve graph-based algorithms with Reinforcement Learning (RL). In this post, I am going to overview several related ideas. Warm up on graphs – GCN: Let's first warm up with some context on graphs. …
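As a companion to the warm-up, here is a minimal NumPy sketch of one GCN layer using Kipf and Welling's normalized propagation rule; the toy graph and feature sizes are arbitrary.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^{-1/2} as a vector
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # aggregate, transform, ReLU

# Toy graph: 3 nodes in a path, 2-dim features, 4 hidden units.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
W = np.random.randn(2, 4)
print(gcn_layer(A, H, W).shape)  # (3, 4)
```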
Normalizing Flows
An update before we dive into today's topic: I have not updated this blog for about two months, which is considered a long time :). This is because I have picked up more tech-lead work to set up the team's planning. I sincerely hope that our team will steer toward a good direction …