I am writing this post to share my notes on Trust Region Policy Optimization [2], Proximal Policy Optimization [3], and some recent work applying graph neural networks to RL problems. We start from the objective of TRPO. The expected return of a policy $\pi$ is $\eta(\pi) = \mathbb{E}_{s_0, a_0, \ldots}\left[\sum_{t=0}^{\infty} \gamma^t r(s_t)\right]$. The return of another policy $\tilde{\pi}$ can be expressed as $\eta(\pi)$ and a relative …
Monthly Archives: October 2019
Notes on “Recommending What Video to Watch Next: A Multitask Ranking System”
Sharing some thoughts on this paper: Recommending What Video to Watch Next: A Multitask Ranking System [1]. The main contribution of this work has two parts: (1) a network architecture that learns multiple objectives; (2) handling position bias within the same model. The first contribution is achieved by “soft” shared layers. So each objective …