I’ve been reading Prof. Sergey Levine‘s paper on Guided Policy Search (GPS) [2]. However, I do not understand about it but want to have a record of my questions so maybe in the future I could look back and solve. Based on my understanding, traditional policy search (e.g., REINFORCE) maximizes the likelihood ratio of rewards. This …