Category Archives: Algorithm

I find it a fascinating perspective to view LLMs as compressors. Today, we are going to introduce the basic idea behind it. We first explain, in layman's terms, what compression does. Compression can be seen as representing a stream of bits with a shorter stream of bits. It is based on assumption …
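To make the bit-stream idea concrete, here is a minimal sketch (the toy distribution below is made up for illustration): an ideal code assigns a symbol with model probability p roughly -log2(p) bits, so a model that predicts the stream well yields a shorter encoding.

```python
import math

# Toy source distribution over 4 symbols (illustrative, not from the post).
p_model = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}

# A near-optimal code assigns a symbol of probability p about -log2(p) bits,
# so the expected length per symbol equals the entropy of the distribution.
expected_bits = sum(p * -math.log2(p) for p in p_model.values())
print(f"expected bits per symbol: {expected_bits:.3f}")  # ~1.357, vs. 2 bits uncoded
```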
More details in DPO
In this post, we dig into more details of Direct Preference Optimization (DPO) [1], a popular method used in RLHF. First, we start from the standard RLHF objective typically used in the PPO literature, which is equation 3 in the DPO paper [1]. Typically, we have input prompts $x$ and an LLM's responses $y$. The objective …
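For reference, the RLHF objective referred to here, and the DPO loss that it leads to, are standardly written as:

```latex
% RLHF objective (equation 3 in the DPO paper):
\max_{\pi_\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(y \mid x)}\big[r_\phi(x, y)\big]
\;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\big[\pi_\theta(y \mid x)\,\|\,\pi_{\mathrm{ref}}(y \mid x)\big]

% DPO loss over preference pairs (y_w preferred to y_l):
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
-\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\Big[\log \sigma\Big(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\Big)\Big]
```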
Causal Inference 102
In my blog, I have covered several pieces of information about causal inference:

- Causal Inference: we talked about (a) two-stage regression for estimating the causal effect between X and Y even when there is a confounder between them (see the sketch below), and (b) causal invariant prediction.
- Tools needed to build an RL debugging tool: we talked about 3 main …
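As a quick refresher on the two-stage regression idea, here is a minimal instrumental-variable sketch on synthetic data (variable names and coefficients are all made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic data: U confounds both X and Y; Z is an instrument that
# affects Y only through X. The true causal effect of X on Y is 2.0.
U = rng.normal(size=n)
Z = rng.normal(size=n)
X = 1.0 * Z + 1.0 * U + rng.normal(size=n)
Y = 2.0 * X + 3.0 * U + rng.normal(size=n)

# Stage 1: regress X on Z; the fitted X_hat is purged of the confounder U.
X_hat = Z * (np.dot(Z, X) / np.dot(Z, Z))

# Stage 2: regress Y on X_hat; the slope recovers the causal effect.
beta = np.dot(X_hat, Y) / np.dot(X_hat, X_hat)
print(f"2SLS estimate: {beta:.2f}")                     # ~2.0
print(f"naive OLS: {np.dot(X, Y) / np.dot(X, X):.2f}")  # ~3.0, biased upward by U
```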
Reinforcement Learning in LLMs
In this post, we overview reinforcement learning techniques used in LLMs, as well as alternative techniques that are often compared with RL techniques. PPO: the PPO-based approach is the most famous RL approach. A detailed derivation of PPO and its implementation tricks are introduced thoroughly in [2]; in particular, we want to call out their recommended implementation tricks (the clipped objective is sketched below). SLiC-HF: SLiC-HF …
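For context, the clipped surrogate objective at the core of PPO can be sketched as follows (a minimal PyTorch version; the function and tensor names are illustrative, not taken from [2]):

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO policy loss: -E[min(r * A, clip(r, 1-eps, 1+eps) * A)],
    where r is the probability ratio between the new and old policies."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with made-up numbers:
loss = ppo_clipped_loss(
    logp_new=torch.tensor([-1.0, -0.5]),
    logp_old=torch.tensor([-1.2, -0.4]),
    advantages=torch.tensor([0.5, -0.3]),
)
```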
Llama code anatomy
This is the first time I have read the Llama 2 code. Many things are still similar to the original Transformer code, but there are also some new things. I am documenting some findings. Where is the Llama 2 code? Modeling (training) code is hosted here: https://github.com/facebookresearch/llama/blob/main/llama/model.py Inference code is hosted here: https://github.com/facebookresearch/llama/blob/main/llama/generation.py Annotations: there are two online annotations …
Improve reasoning for LLMs
LLMs became the hottest topic of 2023, a year when I did not have much time to cover related topics. Let's dive deep into this topic at the beginning of 2024. Prompts: using few-shot prompts to hint LLMs at how to solve problems is the simplest way to improve their reasoning. When you first come across …
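As an illustration, a few-shot prompt might look like the toy example below (the worked examples are made up):

```python
# A toy few-shot prompt: two worked examples, then the question to be solved.
# The examples hint at both the answer format and the reasoning style.
few_shot_prompt = """\
Q: Tom has 3 apples and buys 2 more. How many apples does he have?
A: 3 + 2 = 5. The answer is 5.

Q: A shelf holds 4 rows of 6 books. How many books is that?
A: 4 * 6 = 24. The answer is 24.

Q: Sara reads 12 pages a day for 5 days. How many pages does she read?
A:"""

print(few_shot_prompt)  # sent to the LLM verbatim as the prompt
```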
Diffusion models
Diffusion models are popular these days. This blog [1] summarizes the comparison between diffusion models and other generative models. Before we go into the technical details, I want to use my own words to summarize my understanding of diffusion models. Diffusion models have two subprocesses: a forward process and a backward process. The forward process is non-learnable …
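To illustrate why the forward process is non-learnable, here is a minimal DDPM-style sketch: with a fixed noise schedule, x_t can be sampled from x_0 in closed form (the schedule values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed linear variance schedule (illustrative DDPM-style values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def forward_diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.
    Nothing here is learned; only the reverse (denoising) process is."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

x0 = rng.normal(size=(4,))              # a toy "clean" sample
x_mid = forward_diffuse(x0, T // 2)     # partially noised
x_end = forward_diffuse(x0, T - 1)      # nearly pure noise
```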
Mode collapse is real for generative models
I am very curious to see whether generative models like GANs and VAEs can fit multi-modal data. [1] has an overview of different generative models, mentioning that the VAE has a clear probabilistic objective function and is more efficient. [2] showed that diffusion models (score-based generative models) can fit multi-modal distributions better than VAEs and …
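A simple way to probe mode collapse is to target an explicitly bimodal distribution and check whether a model's samples cover both modes. Here is a toy data generator plus a crude diagnostic (the mixture parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bimodal(n, mu=4.0):
    """Draw from an equal-weight mixture of N(-mu, 1) and N(+mu, 1)."""
    modes = rng.choice([-mu, mu], size=n)
    return modes + rng.normal(size=n)

data = sample_bimodal(10_000)

# A mode-collapsed model would put nearly all its mass near one mode;
# comparing the sample fractions on each side is a crude diagnostic.
frac_right = (data > 0).mean()
print(f"fraction of samples in the right mode: {frac_right:.2f}")  # ~0.50
```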
Causal Inference in Recommendation Systems
We have briefly touched on some concepts of causal inference in [1, 2]. This post introduces some more specific works that apply causal inference in recommendation systems. Some of these works require background on backdoor and frontdoor adjustment, so we will introduce that first. Backdoor and frontdoor adjustment: suppose we have a causal graph like …
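For reference, writing Z for a set of variables that blocks every backdoor path from X to Y, and M for a mediator that carries all of X's effect on Y, the two adjustments are:

```latex
% Backdoor adjustment:
P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)

% Frontdoor adjustment:
P(y \mid do(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
```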
GATO and related AGI research
Policy generalist: DeepMind has recently published a work named Gato. I find it interesting because Gato learns a single multi-modal, multi-task policy for many tasks, such as robot arm manipulation, playing Atari, and image captioning. I don't think the original paper [2] includes every implementation detail, but I'll try my best to summarize what I understand. …