Many ways towards recommendation diversity

Diversity of recommendations keeps users engaged and prevents boredom [1]. In this post, I will introduce several machine learning-based methods to achieve diverse recommendations. The literature in this post is mostly retrieved from the overview paper “Recent Advances in Diversified Recommendation” [6]. Determinantal Point Process: Let’s first review what the determinant of a matrix is …

Someone just saved this website: WordPress backup and Crayon Syntax Highlighter

It turned out that my old syntax highlighter, “Crayon Syntax Highlighter”, does not work with newer PHP versions (7.4+), and the plugin is no longer officially maintained. Luckily, someone updated it and provided a zip file: https://www.axew3.com/w3/forum/?coding=dmlld3RvcGljLnBocD9mPTEwJnQ9MTU1NQ==. I also backed up the updated plugin on my Dropbox: https://www.dropbox.com/s/e0u8jb2oqfagv9c/crayon-syntax-highlighter.zip?dl=0 BTW, I also back up the whole …

Correct font for subroutines in LaTeX

Today I learned that LaTeX supports many different typefaces: https://www.overleaf.com/learn/latex/font_typefaces and http://www.personal.ceu.hu/tex/typeface.htm I’ve seen papers like this one (https://arxiv.org/pdf/2006.15704.pdf) use a different font for subroutines (such as function names or APIs). One example is their reference to the “CrossEntropyLoss” function: Now, I try different typefaces myself to see which one it uses: I think …

Focal loss for classification and regression

I haven’t learned a new loss function in a long time. Today I am going to learn one, focal loss, which was introduced in 2018 [1]. Let’s start from a typical classification task. For a data point $(x, y)$, where $x$ is the feature vector and $y$ is a binary label, a model predicts the probability $p$ that $y = 1$. Then …

Analyze DistributedDataParallel (DDP)’s behavior

DistributedDataParallel implements data parallelism at the module level which can run across different machines. There is one process running on each device where one copy of the module is held. Each process loads its own data which is non-overlapping with other processes’. At the initialization phase, all copies are synchronized to ensure they start from …
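The non-overlapping data loading described above can be illustrated with a minimal sketch. The function `shard_indices` and the round-robin assignment are assumptions made for illustration, not PyTorch's actual code; in practice a distributed sampler usually handles this, including details such as shuffling and padding that are omitted here.

```python
def shard_indices(dataset_size, world_size, rank):
    """Hypothetical helper: sample indices assigned to one process.

    Samples are dealt out round-robin, so the shards of different
    ranks never overlap and together cover the whole dataset.
    """
    return list(range(rank, dataset_size, world_size))


# Example: 10 samples split across 4 processes (ranks 0..3).
shards = [shard_indices(10, 4, rank) for rank in range(4)]
for rank, shard in enumerate(shards):
    print(f"rank {rank}: {shard}")
```

Each process would then compute gradients only on its own shard, which is what makes the parallelism "data" parallelism.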

Tools needed to build an RL debugging tool

I’ve always had a dream to build a debugging tool that automatically analyzes an observational dataset and tells me whether it is suitable for batch-RL algorithms like DQN/DDPG. As we know, we cannot directly control or even know how an observational dataset is collected. An incorrect data collection procedure or a wrong dataset would …

Noise-Contrastive Estimation

I encountered the noise-contrastive estimation (NCE) technique in several papers recently. So it is a good time to visit this topic. NCE is a commonly used technique under the word2vec umbrella [2]. Previously I talked about several other ways to generate word embeddings in [1] but skipped introducing NCE. In this post, I will introduce …

Control variate using Taylor expansion

We talked about control variates in [4]: when evaluating $\mathbb{E}[f(x)]$ by Monte Carlo samples, we can instead evaluate $\mathbb{E}[f(x) - h(x) + \theta]$ with $\theta = \mathbb{E}[h(x)]$ in order to reduce variance. The requirement for a control variate to work is that $h(x)$ is correlated with $f(x)$ and the mean of $h(x)$ is known. In this post we will walk through a classic example of using control variate, …
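As a rough illustration of the idea (not the example from the post itself), the sketch below estimates the mean of exp(U) for U uniform on [0, 1], once with plain Monte Carlo and once with a control variate. The choice of target function, the control variate 1 + x (the first-order Taylor expansion of exp(x) around 0), and the helper name `estimates` are all assumptions made for this sketch.

```python
import math
import random
import statistics


def estimates(n_samples, rng):
    """Return (plain, control-variate) estimates of E[exp(U)], U ~ Uniform(0, 1)."""
    xs = [rng.random() for _ in range(n_samples)]
    f_vals = [math.exp(x) for x in xs]
    # Assumed control variate: h(x) = 1 + x, whose mean under
    # Uniform(0, 1) is known exactly: E[1 + U] = 1.5.
    h_mean = 1.5
    plain = statistics.fmean(f_vals)
    # Subtract the correlated term and add back its known mean.
    adjusted = [f - (1.0 + x) + h_mean for f, x in zip(f_vals, xs)]
    return plain, statistics.fmean(adjusted)


rng = random.Random(0)
runs = [estimates(1000, rng) for _ in range(200)]
plain_var = statistics.pvariance([p for p, _ in runs])
cv_var = statistics.pvariance([c for _, c in runs])
cv_mean = statistics.fmean([c for _, c in runs])
print(f"variance of plain estimator:           {plain_var:.2e}")
print(f"variance of control-variate estimator: {cv_var:.2e}")
```

Both estimators target the same value, e - 1, but the control-variate version fluctuates less across repeated runs because 1 + x tracks exp(x) closely on [0, 1].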