Minimal examples of HuggingFace LLM training

I’m sharing a minimal example of training an LLM using HuggingFace’s libraries (trl/transformers/evaluate/datasets, etc.). The example is mainly borrowed from https://wandb.ai/capecape/alpaca_ft/reports/How-to-Fine-tune-an-LLM-Part-3-The-HuggingFace-Trainer--Vmlldzo1OTEyNjMy and its GitHub repo https://github.com/tcapelle/llm_recipes/blob/main/scripts/train_hf.py. Here is the full file: Now let’s examine the code in more detail: First, we initialize a Weights & Biases project (wandb.init(…)), which is used for logging intermediate training/evaluation …
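
To make the pattern concrete, here is a minimal, self-contained sketch of the wandb + Trainer setup, assuming gpt2 and a tiny dummy dataset as stand-ins for the post’s actual model and data:

```python
# Minimal sketch of the wandb + Trainer pattern; gpt2 and the toy dataset
# are placeholders, not the actual model/data from the post.
import wandb
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# Initialize a Weights & Biases run; the Trainer logs into it.
wandb.init(project="llm-training-demo")

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A tiny dummy text dataset, tokenized for causal-LM training.
ds = Dataset.from_dict({"text": ["hello world, this is a test."] * 64})
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=128),
            batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    logging_steps=5,
    report_to="wandb",  # forward training/eval logs to the wandb run
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    # mlm=False -> plain next-token (causal LM) objective; collator also pads
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```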

Causal Inference 102

In my blog, I have covered several topics in causal inference:

- Causal Inference: we talked about (a) two-stage regression for estimating the causal effect between X and Y even when there is a confounder between them (see the sketch after this list), and (b) causal invariant prediction.
- Tools needed to build an RL debugging tool: we talked about 3 main …
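
To make the two-stage regression idea concrete, here is a toy sketch with synthetic data (my own illustration, not from the original posts): an unobserved confounder U biases the naive regression of Y on X, while two-stage regression through an instrument Z recovers the true effect.

```python
# Two-stage (instrumental-variable) regression on synthetic data.
# U confounds X and Y; Z is an instrument: it affects X but not Y directly.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
U = rng.normal(size=n)                        # unobserved confounder
Z = rng.normal(size=n)                        # instrument
X = 1.5 * Z + U + rng.normal(size=n)
Y = 2.0 * X + 3.0 * U + rng.normal(size=n)    # true causal effect of X is 2.0

# Naive OLS slope of Y on X is biased upward by the confounder U.
naive = np.cov(X, Y)[0, 1] / np.var(X)

# Stage 1: regress X on Z to get fitted X-hat.
# Stage 2: regress Y on X-hat; this is the IV / two-stage estimate.
x_hat = Z * (np.cov(Z, X)[0, 1] / np.var(Z))
two_stage = np.cov(x_hat, Y)[0, 1] / np.var(x_hat)

print(f"naive OLS: {naive:.2f}")        # noticeably above 2.0
print(f"two-stage: {two_stage:.2f}")    # close to the true effect 2.0
```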

Reinforcement Learning in LLMs

In this post, we give an overview of Reinforcement Learning techniques used in LLMs, as well as alternative techniques that are often compared with RL. PPO: The PPO-based approach is the most famous RL approach. A detailed derivation of PPO and its implementation tricks is introduced thoroughly in [2]. In particular, we want to call out their recommended implementation tricks. SLiC-HF: SLiC-HF …
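
For reference, the clipped surrogate objective that PPO maximizes, in its standard form (from the original PPO paper, not specific to [2]):

$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}$$

where $\hat{A}_t$ is the estimated advantage and the clip keeps the policy ratio within $[1-\epsilon,\, 1+\epsilon]$, preventing destructively large policy updates.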

Llama code anatomy

This is the first time I have read the Llama2 code. Many things are still similar to the original Transformer code, but there are also some new things; I am documenting some findings here. Where is the Llama2 code? Modeling (training) code is hosted here: https://github.com/facebookresearch/llama/blob/main/llama/model.py Inference code is hosted here: https://github.com/facebookresearch/llama/blob/main/llama/generation.py Annotations: There are two online annotations …
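
As one example of the “new things”, here is a sketch of RMSNorm, which Llama2 uses in place of LayerNorm (written in the spirit of model.py, not a verbatim copy):

```python
# RMSNorm sketch: normalize by the root-mean-square of the features.
# Unlike LayerNorm, there is no mean-centering and no bias term.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain only

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # rsqrt of the mean of squares over the last (feature) dimension
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```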

Dollar cost average on TQQQ vs QQQ [Real Data]

(Please cross-reference my previous post for simulation-based results: https://czxttkl.com/2023/01/15/dollar-cost-average-on-tqqq-vs-qqq/) In this post, we use real data (from April 2021 to January 2024) to show that even after a bear market (in 2022), DCA on TQQQ is still more profitable than DCA on QQQ. UPRO is also more profitable than SPY, but the margin is not that …
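
The DCA comparison itself is a simple calculation. Here is a sketch with hypothetical prices (real data would come from a market-data source; these numbers are made up purely to show the arithmetic):

```python
# DCA return: invest a fixed amount at each price, then compare the final
# portfolio value to the total amount invested.
def dca_multiple(prices, monthly_amount=100.0):
    """Return final value / total invested for a fixed-amount DCA schedule."""
    shares = sum(monthly_amount / p for p in prices)   # shares bought each period
    invested = monthly_amount * len(prices)
    return shares * prices[-1] / invested

# Hypothetical monthly closes just for illustration, not real quotes.
qqq  = [300, 320, 280, 260, 300, 350, 400]
tqqq = [60, 75, 45, 33, 52, 85, 130]
print(f"QQQ  DCA multiple: {dca_multiple(qqq):.2f}")
print(f"TQQQ DCA multiple: {dca_multiple(tqqq):.2f}")
```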

Diffusion models

Diffusion models are popular these days. This blog [1] summarizes the comparison between diffusion models and other generative models. Before we go into the technical details, I want to use my own words to summarize my understanding of diffusion models. Diffusion models have two subprocesses: a forward process and a backward process. The forward process is non-learnable …
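
For reference, the standard DDPM form of the non-learnable forward process, which adds Gaussian noise according to a fixed schedule $\beta_t$ (generic formulation, not specific to [1]):

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big), \qquad q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\big),$$

with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$; the closed form for $q(x_t \mid x_0)$ lets us sample any noise level directly during training.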

Mode collapse is real for generative models

I am very curious to see whether generative models like GANs and VAEs can fit multi-modal data. [1] gives an overview of different generative models, mentioning that the VAE has a clear probabilistic objective function and is more efficient. [2] showed that diffusion models (score-based generative models) can fit multi-modal distributions better than VAEs and …
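
As a sketch of the kind of multi-modal target such an experiment would use (my own toy setup, not the post’s exact data):

```python
# A simple two-mode target distribution for probing mode collapse:
# a mixture of two well-separated 1-D Gaussians.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
modes = rng.choice([-3.0, 3.0], size=n)      # pick a mode uniformly
data = modes + 0.5 * rng.normal(size=n)      # add within-mode noise

# A mode-collapsed model would concentrate its samples near only one
# cluster; the sign split is a crude diagnostic (~0.5 for the true data).
print((data > 0).mean())
```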

Causal Inference in Recommendation Systems

We have briefly touched on some concepts of causal inference in [1, 2]. This post introduces some more specific works that apply causal inference to recommendation systems. Some of these works require background on backdoor and frontdoor adjustments, so we will introduce them first. Backdoor and frontdoor adjustment: Suppose we have a causal graph like …
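
For reference, the standard adjustment formulas from Pearl’s framework (generic forms, stated before the post’s specific graph). If a set $Z$ satisfies the backdoor criterion relative to $(X, Y)$:

$$P(Y \mid do(X=x)) = \sum_{z} P(Y \mid X=x, Z=z)\,P(Z=z)$$

And if a mediator $M$ satisfies the frontdoor criterion:

$$P(Y \mid do(X=x)) = \sum_{m} P(M=m \mid X=x) \sum_{x'} P(Y \mid M=m, X=x')\,P(X=x')$$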

Dollar cost average on TQQQ vs QQQ [Simulation]

This post runs a simple simulation of using the Dollar-Cost-Average (DCA) strategy to invest in QQQ vs. TQQQ, its 3x-leveraged counterpart. In the simulation, QQQ plunges 34% after 20 rounds. One round is a small up-and-down cycle: the index first moves up 1%, then down 3%, repeating until the index is 34% down from the top. After reaching …
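
A minimal sketch of that simulated price path, assuming TQQQ simply applies 3x to each daily move (leverage reset each step; fees and other real-world frictions ignored):

```python
# Simulate the up-1%, down-3% rounds until QQQ is 34% below the top.
qqq, tqqq = [100.0], [100.0]
while qqq[-1] > 100.0 * (1 - 0.34):          # stop once 34% below the top
    for move in (0.01, -0.03):               # one up-then-down round
        qqq.append(qqq[-1] * (1 + move))
        tqqq.append(tqqq[-1] * (1 + 3 * move))  # 3x each daily move

print(len(qqq) // 2, "rounds")               # roughly 20 rounds
print(f"QQQ: {qqq[-1]:.1f}  TQQQ: {tqqq[-1]:.1f}")
```

Each round multiplies QQQ by 1.01 × 0.97 ≈ 0.98, so about 20 rounds reach the 34% drawdown, matching the setup described above.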