Viewing LLMs as compressors is a fascinating perspective. Today, we introduce the basic idea behind it. We first use layman's terms to explain what compression does. Compression can be seen as representing a stream of bits with a shorter stream of bits. It is based on the assumption …
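The prediction↔compression link the excerpt opens with can be made concrete with a tiny sketch (the function name is mine, not from the post): by Shannon's source coding theorem, a symbol a model assigns probability p costs about -log2(p) bits under an optimal code, so a better predictor means a shorter encoding.

```python
import math

def code_length_bits(p: float) -> float:
    """Optimal code length in bits for a symbol the model assigns
    probability p, per Shannon's source coding theorem: -log2(p)."""
    return -math.log2(p)

# A stronger predictor assigns higher probability to what actually occurs,
# so the same data compresses to fewer bits -- the sense in which a strong
# LLM (a strong next-token predictor) acts as a strong compressor.
print(code_length_bits(0.5))   # a fair coin flip costs exactly 1 bit
print(code_length_bits(0.25))  # a 1-in-4 symbol costs 2 bits
```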
Author Archives: czxttkl
TQQQ/UPRO + volatility
In the past, we have tested TQQQ/UPRO on both simulated and real data. Today, I encountered an interesting video about using volatility indicators to decide when to hold leveraged ETFs. Here, I am just recording its link and main result. We may come back and add more discussion in the future. …
More details in DPO
In this post, we dig into more details of Direct Preference Optimization [1], a popular method used in RLHF. First, we start from the standard RLHF objective typically used in the PPO literature, which is equation 3 in the DPO paper [1]. Typically, we have input prompts x and an LLM’s responses y. The objective …
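As a sketch of where the excerpt is headed, here is the DPO loss from [1] for a single preference pair, written in plain Python with illustrative argument names (the full post derives it from the RLHF objective; this only shows the final form):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss from [1]: -log sigmoid(beta * margin), where the
    margin is the policy-vs-reference log-prob gap on the chosen response
    minus the same gap on the rejected response."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)), computed stably with log1p
    return math.log1p(math.exp(-margin))
```

When the policy does not yet prefer the chosen response over the rejected one (zero margin), the loss is log 2; as the preference margin grows, the loss decays toward zero.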
Minimal examples of HuggingFace LLM training
I’m sharing a minimal example of training an LLM using HuggingFace’s libraries (trl/transformers/evaluate/datasets, etc.). The example is mainly borrowed from https://wandb.ai/capecape/alpaca_ft/reports/How-to-Fine-tune-an-LLM-Part-3-The-HuggingFace-Trainer–Vmlldzo1OTEyNjMy and its GitHub repo https://github.com/tcapelle/llm_recipes/blob/main/scripts/train_hf.py. Here is the full file: Now let’s examine the code in more detail: First, we initialize a Weights & Biases project (wandb.init(…)), which is used for logging intermediate training/evaluation …
Causal Inference 102
In my blog, I have covered several pieces of information about causal inference. Causal Inference: we talked about (a) two-stage regression for estimating the causal effect between X and Y even when there is a confounder between them, and (b) causal invariant prediction. Tools needed to build an RL debugging tool: we talked about 3 main …
Reinforcement Learning in LLMs
In this post, we overview Reinforcement Learning techniques used in LLMs and alternative techniques that are often compared with them. PPO The PPO-based approach is the most famous RL approach. The detailed derivation of PPO and its implementation tricks are introduced thoroughly in [2]. In particular, we want to call out their recommended implementation tricks: SLiC-HF SLiC-HF …
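For context on the PPO objective the excerpt refers to, here is a hedged one-sample sketch of the clipped surrogate (the epsilon value is the commonly cited default, not necessarily what [2] recommends):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for a single sample:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where ratio = pi_new(a|s) / pi_old(a|s) and A is the advantage."""
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# With a positive advantage, the objective stops rewarding ratio growth
# beyond 1 + eps, keeping each policy update conservative.
```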
Llama code anatomy
This is the first time I have read the Llama 2 code. Many things are still similar to the original Transformer code, but there are also some new things. I am documenting some findings. Where is the Llama 2 code? Modeling (training) code is hosted here: https://github.com/facebookresearch/llama/blob/main/llama/model.py Inference code is hosted here: https://github.com/facebookresearch/llama/blob/main/llama/generation.py Annotations There are two online annotations …
Improve reasoning for LLMs
LLMs became the hottest topic of 2023, a year when I did not have much time to cover related topics. Let’s deep dive into this topic at the beginning of 2024. Prompts Using few-shot prompts to hint LLMs at how to solve problems is the simplest way to improve reasoning in LLMs. When you first come across …
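A minimal illustration of the few-shot prompting idea the excerpt opens with (the task and examples are made up for illustration, not from the post):

```python
# Two worked examples followed by the new question "hint" the model
# toward the desired answer format and reasoning pattern.
few_shot_prompt = (
    "Q: What is 12 + 7?\n"
    "A: 19\n\n"
    "Q: What is 30 + 5?\n"
    "A: 35\n\n"
    "Q: What is 21 + 4?\n"
    "A:"
)
```

The prompt ends right after "A:" so the model's continuation is the answer to the final, unanswered question.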
Dollar cost average on TQQQ vs QQQ [Real Data]
(Please cross-reference my previous post for simulation-based results: https://czxttkl.com/2023/01/15/dollar-cost-average-on-tqqq-vs-qqq/) In this post, we use real data (from April 2021 to January 2024) to show that even after a bear market (in 2022), DCA on TQQQ is still more profitable than on QQQ. UPRO is also more profitable than SPY, but the margin is not that …
Diffusion models
Diffusion models are popular these days. This blog [1] summarizes the comparison between diffusion models and other generative models: Before we go into the technical details, I want to use my own words to summarize my understanding of diffusion models. Diffusion models have two subprocesses: a forward process and a backward process. The forward process is non-learnable …
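The non-learnable forward process the excerpt mentions can be sketched for a scalar signal (a hedged toy, using the standard DDPM-style noising step; the schedule value is illustrative):

```python
import math
import random

def forward_step(x_prev, beta_t, rng):
    """One non-learnable forward-diffusion step:
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise,
    i.e. x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t) for a scalar."""
    noise = rng.gauss(0.0, 1.0)
    return math.sqrt(1.0 - beta_t) * x_prev + math.sqrt(beta_t) * noise

# Repeated steps gradually shrink the signal and inject Gaussian noise,
# driving x toward a standard normal; the learnable backward process is
# trained to undo this noising step by step.
rng = random.Random(0)
x = 1.0
for _ in range(1000):
    x = forward_step(x, 0.02, rng)
```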