Convergence of Q-learning and SARSA

Here I list some classic proofs of the convergence of Q-learning and SARSA in finite MDPs (by definition, in a finite Markov Decision Process the sets of states, actions, and rewards are finite [1]). The first Q-learning convergence proof comes from [4]. The proof is based on a very useful theorem: Note that this theorem is general enough to be …