Minimal examples of HuggingFace LLM training

I’m sharing a minimal example of training an LLM using HuggingFace’s libraries (trl/transformers/evaluate/datasets/etc.). The example is mainly borrowed from https://wandb.ai/capecape/alpaca_ft/reports/How-to-Fine-tune-an-LLM-Part-3-The-HuggingFace-Trainer--Vmlldzo1OTEyNjMy and its GitHub repo https://github.com/tcapelle/llm_recipes/blob/main/scripts/train_hf.py. Here is the full file: Now let’s examine the code in more detail: First, we initialize a Weights & Biases project (wandb.init(…)), which is used for logging intermediate training/evaluation …
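
Below is a minimal sketch, not the exact train_hf.py script, of the ingredients the post walks through: a wandb.init(…) call for logging, a tokenized dataset, and HuggingFace’s Trainer driving the fine-tuning loop. The model name ("gpt2"), the project name, and the tatsu-lab/alpaca dataset slice are placeholder assumptions chosen for illustration.

import wandb
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start a W&B run; metrics logged by the Trainer are streamed here.
wandb.init(project="llm-finetune-demo")  # hypothetical project name

model_name = "gpt2"  # placeholder; the post fine-tunes a larger LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any small instruction-tuning dataset works for the sketch.
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    logging_steps=10,
    report_to="wandb",  # send train/eval metrics to the run opened above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # Causal LM collator: pads batches and copies input_ids into labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
wandb.finish()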

Dollar cost average on TQQQ vs QQQ [Real Data]

(Please cross-reference my previous post for simulation-based results: https://czxttkl.com/2023/01/15/dollar-cost-average-on-tqqq-vs-qqq/) In this post, we use real data (from April 2021 to January 2024) to show that even after a bear market (in 2022), DCA on TQQQ is still more profitable than DCA on QQQ. UPRO is also more profitable than SPY, but the margin is not that …

Dollar cost average on TQQQ vs QQQ [Simulation]

This post runs a simple simulation of using the Dollar-Cost-Average (DCA) strategy to invest in QQQ vs. TQQQ, its 3x-leveraged ETF. In the simulation, QQQ plunges 34% over 20 rounds. One round is a small up-and-down cycle: the index first moves up 1% and then down 3%, repeating until it is 34% below the top. After reaching …
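
A rough sketch of the drawdown leg described above, assuming a $100 starting price and a fixed $100 contribution per step: QQQ alternates +1%/-3% each round, TQQQ moves 3x each step’s return, and DCA buys a fixed dollar amount at every step. The recovery leg and the post’s exact parameters are omitted here.

def simulate_dca(step_returns, contribution=100.0):
    shares, price = 0.0, 100.0
    for r in step_returns:
        price *= 1.0 + r
        shares += contribution / price   # buy a fixed dollar amount each step
    return shares * price                # ending market value of the position

qqq_returns = []
for _ in range(20):                      # 20 up-and-down rounds ~= 34% drawdown
    qqq_returns += [0.01, -0.03]

tqqq_returns = [3 * r for r in qqq_returns]   # 3x-leveraged per-step moves

print("QQQ  DCA value:", round(simulate_dca(qqq_returns), 2))
print("TQQQ DCA value:", round(simulate_dca(tqqq_returns), 2))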

PyTorch Lightning template

Back in the old days, I studied how to implement highly efficient PyTorch pipelines for multi-GPU training [1]. DistributedDataParallel is the way to go, but it is cumbersome because we need boilerplate for spawning workers and constructing data readers. Now, PyTorch Lightning offers a clean API for setting up multi-GPU training easily. Here is a template …
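
As a rough sketch (not the post’s actual template), the shape of such a Lightning setup looks like this; the toy linear model, random data, and the two-GPU assumption are placeholders.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

if __name__ == "__main__":
    data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    loader = DataLoader(data, batch_size=64, num_workers=2)
    # strategy="ddp" replaces the hand-written worker-spawning boilerplate;
    # assumes a machine with 2 GPUs.
    trainer = pl.Trainer(max_epochs=1, accelerator="gpu", devices=2, strategy="ddp")
    trainer.fit(LitModel(), loader)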

Analyze DistributedDataParallel (DDP)’s behavior

DistributedDataParallel implements data parallelism at the module level and can run across different machines. One process runs on each device, holding one copy of the module. Each process loads its own data, which does not overlap with other processes’ data. At the initialization phase, all copies are synchronized to ensure they start from …
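
A bare-bones sketch of that setup (all names and sizes are illustrative, and the gloo backend plus a toy linear model are used so it also runs on CPU-only machines): each spawned process holds its own model replica, reads a non-overlapping shard of the data via DistributedSampler, and gradients are averaged across processes during backward.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(16, 1))          # replicas synced at construction
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    data = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
    sampler = DistributedSampler(data, num_replicas=world_size, rank=rank)
    loader = DataLoader(data, batch_size=32, sampler=sampler)  # non-overlapping shard

    for x, y in loader:
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()                          # gradients all-reduced across ranks
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)        # one process per "device"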

Test with torch.multiprocessing and DataLoader

As we know, PyTorch’s DataLoader is a great tool for speeding up data loading. Through my experience with DataLoader, I consolidated my understanding of Python multiprocessing. Here is a didactic code snippet:

from torch.utils.data import DataLoader, Dataset
import torch
import time
import datetime
import torch.multiprocessing as mp

num_batches = 110
print("File init")

class DataClass: …
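
Since the DataClass body is truncated above, here is a hedged stand-in (not the post’s actual class) that illustrates the same idea: a Dataset whose __getitem__ sleeps to mimic slow loading, read once with num_workers=0 and once with several worker processes so the speedup from multiprocessing is visible.

import time
import torch
from torch.utils.data import DataLoader, Dataset

class SlowDataset(Dataset):
    def __len__(self):
        return 110                      # arbitrary size for the sketch

    def __getitem__(self, idx):
        time.sleep(0.01)                # pretend each sample is slow to load
        return torch.tensor([idx], dtype=torch.float32)

if __name__ == "__main__":
    for workers in (0, 4):
        loader = DataLoader(SlowDataset(), batch_size=10, num_workers=workers)
        start = time.time()
        for _ in loader:                # iterate all batches, discarding them
            pass
        print(f"num_workers={workers}: {time.time() - start:.2f}s")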