Algorithm – Page 5 – czxttkl

Revisit Gaussian kernel

Optimization with discrete random variables

Control Variate

Personalized Re-ranking

Practical considerations of off-policy policy gradient

Constrained RL / Multi-Objective RL

Hash table revisited

TRPO, PPO, Graph NN + RL

Notes on “Recommending What Video to Watch Next: A Multitask Ranking System”

Convergence of Q-learning and SARSA