EmbeddingBag in PyTorch is a useful feature for consuming sparse ids and producing embeddings. Here is a minimal example. There are embeddings for 4 ids, each of dimension 3. We have two data points: the first has three ids (0, 1, 2) and the second has a single id (3). This is reflected in input …
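A minimal sketch of the setup described above (my own illustration; the post's actual snippet is truncated): 4 embeddings of dimension 3, with the two data points packed into a flat id tensor plus offsets.

```python
import torch
import torch.nn as nn

# 4 ids, each with a 3-dimensional embedding; "mean" pools each bag's embeddings
embedding_bag = nn.EmbeddingBag(num_embeddings=4, embedding_dim=3, mode="mean")

ids = torch.tensor([0, 1, 2, 3], dtype=torch.long)  # flattened ids of both points
offsets = torch.tensor([0, 3], dtype=torch.long)    # point 1 = ids[0:3], point 2 = ids[3:]

output = embedding_bag(ids, offsets)
print(output.shape)  # torch.Size([2, 3]): one pooled embedding per data point
```

The offsets tensor marks where each data point's ids begin, so variable-length id lists need no padding.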
Author Archives: czxttkl
Test with torch.multiprocessing and DataLoader
As we know, PyTorch's DataLoader is a great tool for speeding up data loading. Through my experience experimenting with DataLoader, I consolidated my understanding of Python multiprocessing. Here is a didactic code snippet: from torch.utils.data import DataLoader, Dataset import torch import time import datetime import torch.multiprocessing as mp num_batches = 110 print("File init") class DataClass: …
Indexing data on GPU
This corresponds to a question I asked on the PyTorch forum. When we want to use indexing to extract data that is already on the GPU, should the indexing arrays also be on the GPU? The answer is yes. Here is the evidence: I also created some other examples to show that if you are generating indexing arrays …
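A small sketch of the comparison (my own illustration, not the post's benchmark; it falls back to CPU when CUDA is unavailable, in which case the two cases coincide):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
data = torch.randn(10000, 128, device=device)

# Index array on the same device as the data: no implicit transfer per call
idx_same_device = torch.randint(0, 10000, (512,), device=device)
# Index array on CPU: works, but triggers a host-to-device copy when data is on GPU
idx_cpu = idx_same_device.cpu()

out1 = data[idx_same_device]
out2 = data[idx_cpu]
print(out1.shape)  # torch.Size([512, 128]); result lives on data's device either way
```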
TRPO, PPO, Graph NN + RL
Writing this post to share my notes on Trust Region Policy Optimization [2], Proximal Policy Optimization [3], and some recent works leveraging graph neural networks on RL problems. We start from the objective of TRPO. The expected return of a policy \(\pi\) is \(\eta(\pi) = \mathbb{E}_{s_0, a_0, \dots}\big[\sum_{t=0}^{\infty} \gamma^t r(s_t)\big]\). The return of another policy \(\tilde{\pi}\) can be expressed as \(\eta(\tilde{\pi}) = \eta(\pi) + \mathbb{E}_{\tilde{\pi}}\big[\sum_{t=0}^{\infty} \gamma^t A_\pi(s_t, a_t)\big]\) and a relative …
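Since the excerpt cuts off before the derivation, here is a minimal sketch of PPO's clipped surrogate loss [3] (my own illustration, with made-up numbers): the probability ratio is clipped to \([1-\epsilon, 1+\epsilon]\) and the pessimistic minimum is taken.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # Negative because we minimize the loss but maximize the surrogate
    return -torch.min(unclipped, clipped).mean()

loss = ppo_clip_loss(
    log_probs_new=torch.tensor([-0.5, -1.0]),
    log_probs_old=torch.tensor([-0.6, -0.9]),
    advantages=torch.tensor([1.0, -1.0]),
)
print(loss.item())  # approx -0.1002 for these toy inputs
```

The clipping removes the incentive to move the ratio far outside \([1-\epsilon, 1+\epsilon]\), which is PPO's cheap stand-in for TRPO's KL trust-region constraint.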
Notes on “Recommending What Video to Watch Next: A Multitask Ranking System”
Share some thoughts on this paper: Recommending What Video to Watch Next: A Multitask Ranking System [1]. The main contribution of this work has two parts: (1) a network architecture that learns multiple objectives; (2) handling position bias in the same model. The first contribution is achieved by "soft" shared layers. So each objective …
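A hypothetical sketch of what "soft" shared layers look like in the Mixture-of-Experts style the paper builds on: experts are shared across objectives, and each objective has its own softmax gate mixing expert outputs. All sizes and names here are made up for illustration.

```python
import torch
import torch.nn as nn

class SoftSharedLayers(nn.Module):
    def __init__(self, in_dim=16, expert_dim=8, n_experts=3, n_tasks=2):
        super().__init__()
        # Experts are shared by all tasks ("soft" sharing)
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, expert_dim) for _ in range(n_experts)]
        )
        # Each task has its own gate and prediction head
        self.gates = nn.ModuleList(
            [nn.Linear(in_dim, n_experts) for _ in range(n_tasks)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(expert_dim, 1) for _ in range(n_tasks)]
        )

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        outputs = []
        for gate, head in zip(self.gates, self.heads):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)  # (B, E, 1)
            mixed = (w * expert_out).sum(dim=1)               # (B, D)
            outputs.append(head(mixed))
        return outputs  # one prediction per objective

model = SoftSharedLayers()
preds = model(torch.randn(4, 16))
print(len(preds), preds[0].shape)  # 2 torch.Size([4, 1])
```

Compared with hard parameter sharing, each objective can weight the experts differently, which reduces negative transfer between conflicting objectives.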
Convergence of Q-learning and SARSA
Here, I am listing some classic proofs regarding the convergence of Q-learning and SARSA in finite MDPs (by definition, in a finite Markov Decision Process the sets of states, actions, and rewards are finite [1]). The very first Q-learning convergence proof comes from [4]. The proof is based on a very useful theorem: Note that this theorem is general enough to be …
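The stochastic-approximation theorem referenced above can be sketched as follows; the exact conditions vary slightly between sources, so treat this as a paraphrase rather than the post's statement:

```latex
% A random iterative process
%   \Delta_{t+1}(x) = (1 - \alpha_t(x))\,\Delta_t(x) + \alpha_t(x)\,F_t(x)
% converges to zero with probability 1 under:
\begin{align*}
&\text{(1)}\ \ 0 \le \alpha_t(x) \le 1, \quad \sum_t \alpha_t(x) = \infty, \quad \sum_t \alpha_t^2(x) < \infty; \\
&\text{(2)}\ \ \big\| \mathbb{E}\!\left[F_t(x) \mid \mathcal{F}_t\right] \big\|_\infty \le \gamma \, \|\Delta_t\|_\infty \ \ \text{for some } \gamma < 1; \\
&\text{(3)}\ \ \operatorname{Var}\!\left[F_t(x) \mid \mathcal{F}_t\right] \le C \,\big(1 + \|\Delta_t\|_\infty\big)^2 \ \ \text{for some constant } C.
\end{align*}
```

For Q-learning, \(\Delta_t\) is taken to be \(Q_t - Q^*\), and condition (2) follows from the \(\gamma\)-contraction property of the Bellman optimality operator.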
Cross entropy with logits
I keep forgetting the exact formulation of `binary_cross_entropy_with_logits` in PyTorch, so I am writing it down for future reference. The function `binary_cross_entropy_with_logits` takes two kinds of inputs: (1) the value right before the probability transformation (sigmoid) layer, whose range is (-infinity, +infinity); (2) the target, whose values are binary. `binary_cross_entropy_with_logits` calculates the following loss (i.e., negative …
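A quick check of the formulation (toy logits and targets of my choosing): the built-in result matches applying sigmoid and then the negative log likelihood by hand.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([1.5, -0.3, 0.0])  # raw logits, range (-inf, +inf)
y = torch.tensor([1.0, 0.0, 1.0])   # binary targets

builtin = F.binary_cross_entropy_with_logits(z, y)

# Manual: mean of -[ y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z)) ]
p = torch.sigmoid(z)
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

print(torch.allclose(builtin, manual))  # True
```

The built-in version is preferred over `sigmoid` + `binary_cross_entropy` because it uses the log-sum-exp trick internally for numerical stability with large-magnitude logits.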
mujoco only works with gcc8
`pip install mujoco-py` would only build with gcc8. On Mac, use `ll /usr/local/Cellar/gcc*` to find all gcc versions you have installed. Uninstall them and install only gcc@8. Another time I saw the following error when using `pip install mujoco-py`: This error is suspected to be due to a corrupted gcc@8. I solved this by using …
Notes for “Defensive Quantization: When Efficiency Meets Robustness”
I have been reading "Defensive Quantization: When Efficiency Meets Robustness" recently. Neural network quantization is a brand-new topic to me, so I am writing some notes down as I learn. The first introduction I read is [1], from which I learned that the term "quantization" generally refers to reducing the memory usage of model weights by …
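A toy sketch of plain uniform quantization (my own illustration of the general idea, not the paper's defensive scheme): float weights are mapped to 8-bit integers and back, losing at most half a quantization step per weight.

```python
import torch

def quantize_dequantize(w, num_bits=8):
    # Affine (asymmetric) uniform quantization over [w.min(), w.max()]
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = qmin - w.min() / scale
    q = torch.clamp((w / scale + zero_point).round(), qmin, qmax)
    # Dequantize back to floats to measure the information lost
    return (q - zero_point) * scale

w = torch.randn(1000)
w_hat = quantize_dequantize(w, num_bits=8)
max_err = (w - w_hat).abs().max().item()
print(max_err)  # small: bounded by half a quantization step
```

Storing `q` as uint8 instead of `w` as float32 is where the 4x memory reduction comes from; the dequantized `w_hat` is only computed to inspect the rounding error.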
Gradient and Natural Gradient, Fisher Information Matrix and Hessian
Here I am writing down some notes summarizing my understanding of natural gradient. There are many online materials covering similar topics; I am not adding anything new, just making a personal summary. Assume we have a model with model parameter \(\theta\). We have training data \(X = \{x_1, \dots, x_N\}\). Then, the Hessian of the log likelihood, \(\nabla^2_\theta \log p(X \mid \theta)\), is: …
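The standard identities behind these notes, written out (well-known results, not a reconstruction of the post's derivation):

```latex
% Fisher information as the expected outer product of the score,
% and its relation to the expected Hessian of the log likelihood:
\begin{align*}
F &= \mathbb{E}_{x \sim p(x \mid \theta)}\!\left[ \nabla_\theta \log p(x \mid \theta)\, \nabla_\theta \log p(x \mid \theta)^{\top} \right] \\
  &= -\,\mathbb{E}_{x \sim p(x \mid \theta)}\!\left[ \nabla_\theta^2 \log p(x \mid \theta) \right]
\end{align*}
% The natural gradient preconditions the ordinary gradient by F^{-1}:
\[
\tilde{\nabla}_\theta \mathcal{L} \;=\; F^{-1} \nabla_\theta \mathcal{L}
\]
```

The second equality is why the Fisher matrix can stand in for the Hessian: it captures curvature of the log likelihood in expectation while staying positive semi-definite.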