Tools needed to facilitate long-term value optimization

[This post is inspired by one of my previous posts [4] which combs my thoughts on how to build a validation tool for RL problems. Now, I am trying to comb my thoughts for building tools to facilitate long-term value optimization.] There is one important topic in applied RL: long-term user value optimization. Usually, this means …