Monthly Archives: January 2020
Hash table revisited
EmbeddingBag from PyTorch
EmbeddingBag in PyTorch is a useful feature for consuming sparse ids and producing embeddings. Here is a minimal example. There are embeddings for 4 ids, each with 3 dimensions. We have two data points: the first has three ids (0, 1, 2) and the second has a single id (3). This is reflected in input …
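A minimal sketch of the setup described above, not necessarily the code from the original post. It assumes `mode="sum"` (the default for `nn.EmbeddingBag` is `"mean"`), and the variable names `bag`, `ids`, and `offsets` are chosen here for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 4 ids in the vocabulary, each mapped to a 3-dimensional embedding.
# mode="sum" is an assumption for this sketch; the default is "mean".
bag = nn.EmbeddingBag(num_embeddings=4, embedding_dim=3, mode="sum")

# All ids of all data points, flattened into one 1-D tensor.
ids = torch.tensor([0, 1, 2, 3], dtype=torch.long)
# offsets[i] is where data point i starts in `ids`:
# point 0 uses ids[0:3] = (0, 1, 2), point 1 uses ids[3:] = (3).
offsets = torch.tensor([0, 3], dtype=torch.long)

out = bag(ids, offsets)  # shape: (2 data points, 3 dimensions)

# The same result computed by hand from the embedding table.
expected = torch.stack([bag.weight[:3].sum(dim=0), bag.weight[3]])
print(out.shape)
print(torch.allclose(out, expected))
```

With `mode="sum"`, each output row is the sum of the embeddings of the ids in that bag, so the second row is just the embedding of id 3.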
Test with torch.multiprocessing and DataLoader
As we know, PyTorch’s DataLoader is a great tool for speeding up data loading. While experimenting with DataLoader, I consolidated my understanding of Python multiprocessing. Here is a didactic code snippet:
from torch.utils.data import DataLoader, Dataset
import torch
import time
import datetime
import torch.multiprocessing as mp
num_batches = 110
print("File init")
class DataClass: …
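For readers who want something runnable, here is a separate, self-contained sketch of how DataLoader hands work to worker processes. It is not the post's snippet: `SquaresDataset` and `load_all` are hypothetical names introduced for illustration. Each sample records the id of the worker that produced it, using `torch.utils.data.get_worker_info()`.

```python
import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is (i, i * i, id of the loading worker)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        # get_worker_info() returns None when loading happens in the main process.
        info = get_worker_info()
        worker_id = -1 if info is None else info.id
        return idx, idx * idx, worker_id


def load_all(num_workers, size=8, batch_size=4):
    """Iterate over the whole dataset and return (index, square, worker_id) rows."""
    loader = DataLoader(SquaresDataset(size), batch_size=batch_size,
                        num_workers=num_workers)
    rows = []
    for idx, sq, wid in loader:  # each is a tensor of length batch_size
        rows.extend(zip(idx.tolist(), sq.tolist(), wid.tolist()))
    return rows


if __name__ == "__main__":
    # The guard matters on platforms that start workers with "spawn":
    # each worker re-imports this file, and unguarded code would run again.
    for row in load_all(num_workers=2):
        print(row)
```

With `num_workers=0` every row reports worker id -1, since the main process loads the data itself. With two workers, whole batches are assigned to workers, so all samples in a batch share one worker id.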
Indexing data on GPU
This corresponds to a question I asked on the PyTorch forum. When we want to use indexing to extract data that is already on the GPU, should the indexing arrays be on the GPU as well? The answer is yes. Here is the evidence: I also created some other examples to show that if you are generating indexing arrays …
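The comparison can be sketched roughly as follows. This is an illustrative timing harness, not the benchmark from the forum thread: the tensor sizes, the `timed_index` helper, and the repeat count are all assumptions. It falls back to the CPU when no GPU is available, in which case both timings measure the same thing.

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Data that already lives on the target device (sizes are arbitrary).
data = torch.randn(100_000, 64, device=device)

# The same indices, once on the CPU and once on the data's device.
idx_cpu = torch.randint(0, data.shape[0], (10_000,))
idx_dev = idx_cpu.to(device)


def timed_index(index, repeats=20):
    """Hypothetical helper: index `data` repeatedly, return (result, seconds)."""
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure earlier GPU work has finished
    start = time.perf_counter()
    for _ in range(repeats):
        out = data[index]
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the indexing kernels to finish
    return out, time.perf_counter() - start


out_cpu_idx, t_cpu_idx = timed_index(idx_cpu)
out_dev_idx, t_dev_idx = timed_index(idx_dev)

print(f"index on CPU:         {t_cpu_idx:.4f} s")
print(f"index on data device: {t_dev_idx:.4f} s")
print(torch.equal(out_cpu_idx, out_dev_idx))
```

Both variants return the same values. The difference is that a CPU index has to be copied to the GPU on every call, which is the overhead the post is measuring.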