State-Action-Reward-State-Action (SARSA) and Q-learning are two reinforcement learning algorithms. The differences between the two methods are discussed in:

https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/
http://stackoverflow.com/questions/6848828/reinforcement-learning-differences-between-qlearning-and-sarsatd
http://stats.stackexchange.com/questions/184657/difference-between-off-policy-and-on-policy-learning

Let’s explain why Q-learning is called off-policy learning and SARSA is called on-policy learning. Suppose that at state $latex s_t$, a method takes action $latex a_t$, which lands it in a new state …