Category Archives: Algorithm
Control Variate
Personalized Re-ranking
Practical considerations of off-policy policy gradient
Constrained RL / Multi-Objective RL
Hash table revisited
TRPO, PPO, Graph NN + RL
Notes on “Recommending What Video to Watch Next: A Multitask Ranking System”
Convergence of Q-learning and SARSA
Cross entropy with logits
I keep forgetting the exact formulation of `binary_cross_entropy_with_logits` in PyTorch, so I am writing it down for future reference. The function `binary_cross_entropy_with_logits` takes two kinds of inputs: (1) the logits, i.e., the values right before the probability transformation (sigmoid, not softmax, in the binary case) layer, whose range is (-infinity, +infinity); (2) the targets, whose values are binary. `binary_cross_entropy_with_logits` calculates the following loss (i.e., negative …
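
As a quick reminder of what the excerpt describes, here is a minimal pure-Python sketch of the per-element loss, assuming the default mean reduction and no `weight` or `pos_weight`. The helper names (`sigmoid`, `bce_with_logits_naive`, `bce_with_logits_stable`) are made up for illustration and are not PyTorch APIs; the stable variant uses the usual `max(x, 0) - x*y + log(1 + exp(-|x|))` rewrite, which is algebraically the same loss.

```python
import math

def sigmoid(x):
    # Logistic function: maps a logit in (-inf, +inf) to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits_naive(logits, targets):
    # Direct form: -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))],
    # averaged over all elements (assumed to mirror reduction='mean').
    losses = [
        -(y * math.log(sigmoid(x)) + (1.0 - y) * math.log(1.0 - sigmoid(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)

def bce_with_logits_stable(logits, targets):
    # Same loss rewritten to avoid overflow for large |x|:
    # max(x, 0) - x * y + log(1 + exp(-|x|))
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)

# Hypothetical example values, chosen only for illustration.
logits = [-2.0, 0.0, 3.5]
targets = [0.0, 1.0, 1.0]

naive = bce_with_logits_naive(logits, targets)
stable = bce_with_logits_stable(logits, targets)
print(naive, stable)  # the two forms should agree up to floating-point error
```

With PyTorch installed, the same numbers can be compared against `torch.nn.functional.binary_cross_entropy_with_logits(torch.tensor(logits), torch.tensor(targets))`, which should match under the assumptions stated above.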