Test with torch.multiprocessing and DataLoader

PyTorch’s DataLoader is a great tool for speeding up data loading. Through my experiments with DataLoader, I consolidated my understanding of Python multiprocessing. Here is a didactic code snippet:

from torch.utils.data import DataLoader, Dataset
import torch
import time
import datetime
import torch.multiprocessing as mp

num_batches = 110
print("File init")

class DataClass: …
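A minimal sketch (my own construction, not the post’s snippet) of the idea DataLoader workers exploit: spreading slow per-item loading across processes. Here `load_item` stands in for `Dataset.__getitem__`, and the pool size plays the role of `num_workers`:

```python
# Toy illustration of worker-based data loading with multiprocessing.
from multiprocessing import Pool
import time

def load_item(idx):
    # Stand-in for Dataset.__getitem__: pretend each item is slow to load.
    time.sleep(0.01)
    return idx * 2

if __name__ == "__main__":
    indices = list(range(8))
    # Like DataLoader(num_workers=4): four processes load items in parallel,
    # and pool.map preserves the original index order.
    with Pool(processes=4) as pool:
        batch = pool.map(load_item, indices)
    print(batch)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

The `__main__` guard matters: with the spawn start method (the default on macOS and Windows), worker processes re-import the module, and unguarded pool creation would recurse.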

TRPO, PPO, Graph NN + RL

Writing this post to share my notes on Trust Region Policy Optimization [2], Proximal Policy Optimization [3], and some recent works leveraging graph neural networks for RL problems. We start from the objective of TRPO. The expected return of a policy is . The return of another policy can be expressed as and a relative …
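For reference, the two quantities the excerpt’s dropped equations refer to are standard in the TRPO paper [2] (my transcription, under the usual discounted-MDP setup):

```latex
% Expected (discounted) return of a policy \pi:
\eta(\pi) = \mathbb{E}_{s_0, a_0, \ldots \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t)\right]
% Return of another policy \tilde{\pi}, expressed relative to \pi
% through the advantage function A_\pi of the original policy:
\eta(\tilde{\pi}) = \eta(\pi)
  + \mathbb{E}_{s_0, a_0, \ldots \sim \tilde{\pi}}\!\left[\sum_{t=0}^{\infty} \gamma^{t} A_{\pi}(s_t, a_t)\right]
```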

Notes on “Recommending What Video to Watch Next: A Multitask Ranking System”

Share some thoughts on this paper: Recommending What Video to Watch Next: A Multitask Ranking System [1]. The main contribution of this work has two parts: (1) a network architecture that learns on multiple objectives; (2) handling position bias in the same model. The first contribution is achieved by “soft” shared layers, so each objective …
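The “soft” sharing idea can be sketched in a few lines of plain Python (a toy construction of mine, loosely in the spirit of the paper’s gated mixture-of-experts layers, not its actual code): each task blends the same shared expert outputs with its own gate weights, instead of all tasks consuming one hard-shared representation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of gate logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def task_representation(expert_outputs, gate_logits):
    """Blend shared expert outputs with per-task softmax gate weights."""
    w = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [sum(w[i] * expert_outputs[i][d] for i in range(len(w)))
            for d in range(dim)]

experts = [[1.0, 0.0], [0.0, 1.0]]                       # two shared experts
click_repr = task_representation(experts, [2.0, 0.0])    # click task's gate
rating_repr = task_representation(experts, [0.0, 2.0])   # rating task's gate
```

Each task ends up with a different mixture of the same experts, which is the “soft” part: sharing is learned per objective rather than imposed.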

Convergence of Q-learning and SARSA

Here, I am listing some classic proofs regarding the convergence of Q-learning and SARSA in finite MDPs (by definition, in a finite Markov Decision Process the sets of states, actions, and rewards are finite [1]). The very first Q-learning convergence proof comes from [4]. The proof is based on a very useful theorem: Note that this theorem is general enough to be …
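For reference, the classic result this excerpt builds toward can be stated compactly (my transcription of the standard Watkins-style formulation, which may be what [4] proves):

```latex
% Tabular Q-learning update with step size \alpha_t:
Q_{t+1}(s_t, a_t) = (1 - \alpha_t)\, Q_t(s_t, a_t)
  + \alpha_t \left[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \right]
% Q_t \to Q^* with probability 1 provided every (s, a) pair is visited
% infinitely often and the step sizes satisfy the Robbins--Monro conditions:
\sum_t \alpha_t = \infty, \qquad \sum_t \alpha_t^2 < \infty
```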

Cross entropy with logits

I keep forgetting the exact formulation of `binary_cross_entropy_with_logits` in PyTorch, so I am writing it down for future reference. The function binary_cross_entropy_with_logits takes two kinds of inputs: (1) the logits, i.e., the values right before the probability transformation (sigmoid) layer, whose range is (-infinity, +infinity); (2) the targets, whose values are binary. binary_cross_entropy_with_logits calculates the following loss (i.e., negative …
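A pure-Python sketch (my own, not PyTorch source) of what the function computes: sigmoid plus binary cross entropy fused into one numerically stable expression. The naive and stable forms below agree for moderate logits; the stable one also survives extreme logits where the sigmoid saturates.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logit, target):
    """Naive form: -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit)."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def bce_with_logits_stable(logit, target):
    """Equivalent stable form: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return max(logit, 0) - logit * target + math.log1p(math.exp(-abs(logit)))

print(bce_with_logits(2.0, 1.0))         # ~0.1269
print(bce_with_logits_stable(2.0, 1.0))  # ~0.1269, same value
```

The stable rewrite is why the fused function exists at all: computing `sigmoid` first and taking `log` afterwards underflows for large negative logits.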

mujoco only works with gcc8

pip install mujoco-py only builds with gcc8. On Mac, use ll /usr/local/Cellar/gcc* to list all the gcc versions you have installed. Uninstall them and install only gcc@8. Another time, I saw the following error when running pip install mujoco-py: This error is suspected to be due to a corrupted gcc@8. I solved it by using …

Notes for “Defensive Quantization: When Efficiency Meets Robustness”

I have been reading “Defensive Quantization: When Efficiency Meets Robustness” recently. Neural network quantization is a brand-new topic to me, so I am writing down some notes as I learn. The first introduction I read is [1], from which I learned that the term “quantization” generally refers to reducing the memory usage of model weights by …
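A hedged sketch of the basic operation the term refers to, uniform (linear) quantization of weights to k bits (my own toy code, not from the paper): each float weight is snapped to one of 2^k evenly spaced levels spanning the weight range, so only the level index needs storing.

```python
def quantize(weights, num_bits=8):
    """Map each float weight to one of 2**num_bits evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits - 1               # number of intervals
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Round each weight to its nearest level, then map back to a float.
    return [round((w - lo) / scale) * scale + lo for w in weights]

weights = [-1.0, -0.3, 0.2, 0.7, 1.0]
print(quantize(weights, num_bits=2))  # only 4 distinct values possible
```

With 2 bits, the five weights collapse onto at most four levels across [-1, 1]; the memory saving comes from storing 2-bit indices plus one (lo, scale) pair instead of full floats.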

Gradient and Natural Gradient, Fisher Information Matrix and Hessian

Here I am writing down some notes summarizing my understanding of natural gradient. There are many online materials covering similar topics; I am not adding anything new, just making a personal summary. Assume we have a model with model parameter . We have training data . Then, the Hessian of the log likelihood, , is:   …
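The identity behind the post’s title, stated in standard form (my transcription, with expectations taken under the model distribution): the Fisher information matrix equals the negative expected Hessian of the log likelihood, and the natural gradient preconditions the plain gradient by its inverse.

```latex
% Fisher information matrix of p(x \mid \theta):
F(\theta) = \mathbb{E}_{x}\!\left[ \nabla_\theta \log p(x \mid \theta)\,
            \nabla_\theta \log p(x \mid \theta)^{\top} \right]
          = -\,\mathbb{E}_{x}\!\left[ \nabla_\theta^{2} \log p(x \mid \theta) \right]
% Natural gradient of a loss L:
\tilde{\nabla}_\theta L = F(\theta)^{-1}\, \nabla_\theta L
```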