Recent advances in Batch RL

I'll introduce some recent papers advancing batch RL. The first paper is Critic Regularized Regression [1]. It starts from a general form of the actor-critic policy gradient objective, $\arg\max_\pi \mathbb{E}_{(s,a) \sim \mathcal{B}}\left[ f(Q_\theta, \pi, s, a) \log \pi(a|s) \right]$, where $Q_\theta$ is a learned critic function and $\mathcal{B}$ is the batch dataset. For a behavior cloning method, $f = 1$. However, we can do much more than that choice of $f$. The CRR paper tested the first two …
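To make the weighting idea concrete, here is a minimal Python sketch, not the paper's implementation. It treats the objective as a weighted log-likelihood over logged (state, action) pairs. The function names, the precomputed `advantages` input (an estimate of how much better a logged action is than the policy's average, derived from the critic), and the `max_weight` cap are all assumptions made for illustration. The binary and exponential weightings are the two advantage-based forms described in the CRR paper.

```python
import math


def weighted_log_likelihood_loss(log_probs, weights):
    """Negative weighted log-likelihood: -(1/N) * sum_i w_i * log pi(a_i | s_i)."""
    if len(log_probs) != len(weights):
        raise ValueError("log_probs and weights must have the same length")
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


def behavior_cloning_weights(advantages):
    """Behavior cloning: every logged action gets weight 1, the critic is ignored."""
    return [1.0 for _ in advantages]


def binary_weights(advantages):
    """Binary weighting: imitate only actions with a positive advantage estimate."""
    return [1.0 if a > 0 else 0.0 for a in advantages]


def exp_weights(advantages, beta=1.0, max_weight=20.0):
    """Exponential weighting exp(A / beta), capped at a hypothetical max_weight
    to keep the loss numerically stable (the cap value is an assumption)."""
    return [min(math.exp(a / beta), max_weight) for a in advantages]


# Two logged actions: the policy assigns them probabilities 0.5 and 0.25,
# and the (hypothetical) advantage estimates are +1 and -1.
log_probs = [math.log(0.5), math.log(0.25)]
advantages = [1.0, -1.0]

bc_loss = weighted_log_likelihood_loss(log_probs, behavior_cloning_weights(advantages))
binary_loss = weighted_log_likelihood_loss(log_probs, binary_weights(advantages))
exp_loss = weighted_log_likelihood_loss(log_probs, exp_weights(advantages))
```

With weight 1 everywhere, the loss is plain behavior cloning. The advantage-based weights filter or down-weight logged actions that the critic judges to be worse than average, so the policy imitates only the better part of the batch.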

Some classical methodologies in applied products

I am reading two papers that use very classical methodologies to optimize metrics in real-world applications. The first covers constrained optimization for ranking: The NodeHopper: Enabling Low Latency Ranking with Constraints via a Fast Dual Solver. The paper performs per-slate constrained optimization. Its formulation is written in terms of each item's primary metric value and each item's position after …