Learning a policy that can optimize multiple types of rewards or satisfy different constraints is a much-desired feature in industry. In real products, we often care about not just a single metric but several that interact with each other. For example, we want to derive a policy to recommend news feeds which expects …
Monthly Archives: January 2020
Hash table revisited
I came across how Facebook implements its hash table in this post: https://engineering.fb.com/developer-tools/f14/. It introduces several techniques that make modern hash tables more efficient. The first technique is called chunking, which reduces the time for resolving hash collisions. The idea is to map keys to a chunk (a block of slots) rather than a single slot then …
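To make the chunking idea concrete, here is a toy sketch in plain Python. It is not Facebook's implementation: the class name `ChunkedHashTable`, the chunk size, and the rule of moving on to the next chunk when one is full are all assumptions made for illustration, and deletion is left out to keep the lookup rule simple.

```python
class ChunkedHashTable:
    """Toy table where the hash selects a chunk (a block of slots), not one slot.

    Hypothetical illustration only; sizes and the overflow rule are assumptions.
    """

    def __init__(self, num_chunks=8, slots_per_chunk=4):
        self.num_chunks = num_chunks
        self.slots_per_chunk = slots_per_chunk
        # Each chunk holds up to slots_per_chunk (key, value) pairs.
        self.chunks = [[] for _ in range(num_chunks)]

    def _probe(self, key):
        # Visit chunks starting from the key's home chunk, wrapping around.
        home = hash(key) % self.num_chunks
        for step in range(self.num_chunks):
            yield self.chunks[(home + step) % self.num_chunks]

    def put(self, key, value):
        for chunk in self._probe(key):
            for i, (k, _) in enumerate(chunk):
                if k == key:
                    chunk[i] = (key, value)  # overwrite an existing key
                    return
            if len(chunk) < self.slots_per_chunk:
                # Colliding keys share the chunk instead of probing elsewhere.
                chunk.append((key, value))
                return
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        for chunk in self._probe(key):
            for k, v in chunk:
                if k == key:
                    return v
            if len(chunk) < self.slots_per_chunk:
                # Without deletions, a non-full chunk ends the search.
                return default
        return default


table = ChunkedHashTable(num_chunks=2, slots_per_chunk=2)
for key, value in [(0, "a"), (2, "b"), (4, "c")]:
    table.put(key, value)  # all three keys hash to chunk 0; key 4 overflows
```

With these settings, keys 0 and 2 collide but are resolved inside the same chunk, and only the third colliding key has to move on to the next chunk.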
EmbeddingBag from PyTorch
EmbeddingBag in PyTorch is a useful feature for consuming sparse ids and producing embeddings. Here is a minimal example. There are embeddings for 4 ids, each with 3 dimensions. We have two data points: the first has three ids (0, 1, 2) and the second has one id (3). This is reflected in input …
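A minimal sketch of the setup described above might look like the following. The variable names and the choice of `mode="sum"` are assumptions (the excerpt does not say which reduction is used, and PyTorch's default is `"mean"`). The flat tensor of ids plus an `offsets` tensor is how `nn.EmbeddingBag` marks where each data point starts.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 4 ids, each with a 3-dimensional embedding. mode="sum" is an assumption.
bag = nn.EmbeddingBag(num_embeddings=4, embedding_dim=3, mode="sum")

# All ids of all data points, concatenated into one 1-D tensor.
ids = torch.tensor([0, 1, 2, 3])
# Start position of each data point in `ids`:
# point 0 uses ids[0:3] = (0, 1, 2), point 1 uses ids[3:] = (3).
offsets = torch.tensor([0, 3])

out = bag(ids, offsets)  # one pooled embedding per data point
print(out.shape)
```

Each output row is the sum of the embedding rows for that data point's ids, so the second row is simply the embedding of id 3.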
Test with torch.multiprocessing and DataLoader
As we know, PyTorch’s DataLoader is a great tool for speeding up data loading. Experimenting with DataLoader helped me consolidate my understanding of Python multiprocessing. Here is a didactic code snippet:
from torch.utils.data import DataLoader, Dataset
import torch
import time
import datetime
import torch.multiprocessing as mp
num_batches = 110
print("File init")
class DataClass: …
Indexing data on GPU
This post corresponds to a question I asked on the PyTorch forum. When we want to use indexing to extract data that is already on the GPU, should the indexing arrays be on the GPU as well? The answer is yes. Here is the evidence: I also created some other examples to show that if you are generating indexing arrays …
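One way to try this comparison yourself is sketched below. It is not the benchmark from the forum post: the tensor sizes, the helper name `timed_index`, and the repeat count are assumptions, and the script falls back to the CPU when CUDA is unavailable (in which case both index tensors live on the same device and the timings are not meaningful).

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Data already resident on the target device (sizes are arbitrary).
data = torch.randn(100_000, 64, device=device)

# The same indices, once kept on the CPU and once moved to the device.
idx_cpu = torch.randint(0, data.shape[0], (10_000,))
idx_dev = idx_cpu.to(device)


def timed_index(index, repeats=20):
    """Return the gathered rows and the mean seconds per indexing call."""
    if device.type == "cuda":
        torch.cuda.synchronize()  # finish pending GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        out = data[index]
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the indexing kernels to finish
    return out, (time.perf_counter() - start) / repeats


out_cpu_idx, t_cpu_idx = timed_index(idx_cpu)
out_dev_idx, t_dev_idx = timed_index(idx_dev)
print(f"index on CPU:    {t_cpu_idx * 1e3:.3f} ms per call")
print(f"index on device: {t_dev_idx * 1e3:.3f} ms per call")
```

Both calls return the same rows; the difference is that a CPU-resident index has to be copied to the GPU on every call before the gather can run.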