Viewing LLMs as compressors is a fascinating perspective. Today, we introduce the basic idea behind it. We first use layman's terms to explain what compression does. Compression can be seen as representing a stream of bits with a shorter stream of bits. It is based on the assumption …
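The prediction↔compression link the excerpt opens with can be made concrete with a tiny sketch (the function name is mine, not from the post): by Shannon's source coding theorem, a symbol a model assigns probability p costs about -log2(p) bits under an optimal code, so a better predictor means a shorter encoding.

```python
import math

def code_length_bits(p: float) -> float:
    """Optimal code length in bits for a symbol the model assigns
    probability p, per Shannon's source coding theorem: -log2(p)."""
    return -math.log2(p)

# A stronger predictor assigns higher probability to what actually occurs,
# so the same data compresses to fewer bits -- the sense in which a strong
# LLM (a strong next-token predictor) acts as a strong compressor.
print(code_length_bits(0.5))   # a fair coin flip costs exactly 1 bit
print(code_length_bits(0.25))  # a 1-in-4 symbol costs 2 bits
```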
Author Archives: czxttkl
TQQQ/UPRO + volatility
In the past, we have tested TQQQ/UPRO on both simulated and real data. Today, I encountered an interesting video about using volatility indicators to decide when to hold leveraged ETFs. Here, I am just recording its link and main result. We may come back and add more discussion in the future. …
More details in DPO
In this post, we dig into more details of Direct Preference Optimization [1], a popular method used in RLHF. First, we start from the standard RLHF objective typically used in the PPO literature, which is equation 3 in the DPO paper [1]. Typically, we have input prompts x and an LLM’s responses y. The objective …
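As a sketch of where the excerpt is headed, here is the DPO loss from [1] for a single preference pair, written in plain Python with illustrative argument names (the full post derives it from the RLHF objective; this only shows the final form):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss from [1]: -log sigmoid(beta * margin), where the
    margin is the policy-vs-reference log-prob gap on the chosen response
    minus the same gap on the rejected response."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)), computed stably with log1p
    return math.log1p(math.exp(-margin))
```

When the policy does not yet prefer the chosen response over the rejected one (zero margin), the loss is log 2; as the preference margin grows, the loss decays toward zero.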
Minimal examples of HuggingFace LLM training
I’m sharing a minimal example of training an LLM using HuggingFace’s libraries (trl/transformers/evaluate/datasets, etc.). The example is mainly borrowed from https://wandb.ai/capecape/alpaca_ft/reports/How-to-Fine-tune-an-LLM-Part-3-The-HuggingFace-Trainer–Vmlldzo1OTEyNjMy and its GitHub repo https://github.com/tcapelle/llm_recipes/blob/main/scripts/train_hf.py. Here is the full file: Now let’s examine the code in more detail: First, we initialize a Weights & Biases project (wandb.init(…)), which is used for logging intermediate training/evaluation …
Causal Inference 102
In my blog, I have covered several pieces of information about causal inference. Causal Inference: we talked about (a) two-stage regression for estimating the causal effect between X and Y even when there is a confounder between them, and (b) causal invariant prediction. Tools needed to build an RL debugging tool: we talked about 3 main …
Reinforcement Learning in LLMs
In this post, we overview Reinforcement Learning techniques used in LLMs and alternative techniques that are often compared with them. PPO The PPO-based approach is the most famous RL approach. The detailed derivation of PPO and its implementation tricks are introduced thoroughly in [2]. In particular, we want to call out their recommended implementation tricks: SLiC-HF SLiC-HF …
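For context on the PPO objective the excerpt refers to, here is a hedged one-sample sketch of the clipped surrogate (the epsilon value is the commonly cited default, not necessarily what [2] recommends):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for a single sample:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where ratio = pi_new(a|s) / pi_old(a|s) and A is the advantage."""
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# With a positive advantage, the objective stops rewarding ratio growth
# beyond 1 + eps, keeping each policy update conservative.
```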
Llama code anatomy
This is the first time I have read the Llama 2 code. Many things are still similar to the original Transformer code, but there are also some new things. I am documenting some findings. Where is the Llama 2 code? Modeling (training) code is hosted here: https://github.com/facebookresearch/llama/blob/main/llama/model.py Inference code is hosted here: https://github.com/facebookresearch/llama/blob/main/llama/generation.py Annotations There are two online annotations …
Improve reasoning for LLMs
LLMs became the hottest topic of 2023, a year when I did not have much time to cover related topics. Let’s deep dive into this topic at the beginning of 2024. Prompts Using few-shot prompts to hint LLMs at how to solve problems is the simplest way to improve reasoning in LLMs. When you first come across …
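A minimal illustration of the few-shot prompting idea the excerpt opens with (the task and examples are made up for illustration, not from the post):

```python
# Two worked examples followed by the new question "hint" the model
# toward the desired answer format and reasoning pattern.
few_shot_prompt = (
    "Q: What is 12 + 7?\n"
    "A: 19\n\n"
    "Q: What is 30 + 5?\n"
    "A: 35\n\n"
    "Q: What is 21 + 4?\n"
    "A:"
)
```

The prompt ends right after "A:" so the model's continuation is the answer to the final, unanswered question.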
Dollar cost average on TQQQ vs QQQ [Real Data]
(Please cross-reference my previous post for simulation-based results: https://czxttkl.com/2023/01/15/dollar-cost-average-on-tqqq-vs-qqq/) In this post, we use real data (from April 2021 to January 2024) to show that even after a bear market (in 2022), DCA on TQQQ is still more profitable than on QQQ. UPRO is also more profitable than SPY, but the margin is not that …
Diffusion models
Diffusion models are popular these days. This blog [1] summarizes the comparison between diffusion models and other generative models: Before we go into the technical details, I want to use my own words to summarize my understanding of diffusion models. Diffusion models have two subprocesses: a forward process and a backward process. The forward process is non-learnable …
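The non-learnable forward process the excerpt mentions can be sketched for a scalar signal (a hedged toy, using the standard DDPM-style noising step; the schedule value is illustrative):

```python
import math
import random

def forward_step(x_prev, beta_t, rng):
    """One non-learnable forward-diffusion step:
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise,
    i.e. x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t) for a scalar."""
    noise = rng.gauss(0.0, 1.0)
    return math.sqrt(1.0 - beta_t) * x_prev + math.sqrt(beta_t) * noise

# Repeated steps gradually shrink the signal and inject Gaussian noise,
# driving x toward a standard normal; the learnable backward process is
# trained to undo this noising step by step.
rng = random.Random(0)
x = 1.0
for _ in range(1000):
    x = forward_step(x, 0.02, rng)
```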