This post curates some distributed RL research. We start from A3C [2], which we briefly covered a few years ago [3]. The core idea is that a central global network is periodically synchronized with a set of workers. Each worker copies the network parameters from the global network, runs its copy in the environment and collects experiences, …
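To make the worker side of this loop concrete, here is a minimal sketch of only the part described so far: a worker pulls a copy of the global parameters and then rolls out in an environment to collect experiences. Everything named here (`GlobalNetwork`, `ToyEnv`, `Worker`, the two-weight "network", the 10% exploration rate) is a hypothetical stand-in for illustration, not the A3C implementation from [2].

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class GlobalNetwork:
    # Hypothetical stand-in for the central network: a plain dict of weights.
    params: dict = field(default_factory=lambda: {"w": 0.5, "b": 0.0})


class ToyEnv:
    """Hypothetical environment: the state is a step counter and the
    episode ends after `horizon` steps. Action 1 earns reward 1.0."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return float(self.t)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return float(self.t), reward, done


class Worker:
    def __init__(self, global_net, seed):
        self.global_net = global_net
        self.params = {}
        self.rng = random.Random(seed)

    def sync(self):
        # Deep copy so the local parameters never alias the global ones.
        self.params = copy.deepcopy(self.global_net.params)

    def act(self, state):
        # Assumed toy policy: occasional random action, otherwise threshold
        # a linear score computed from the local parameter copy.
        if self.rng.random() < 0.1:
            return self.rng.choice([0, 1])
        score = self.params["w"] * state + self.params["b"]
        return 1 if score > 0 else 0

    def collect(self, env, max_steps):
        # Step 1: copy the parameters from the global network.
        self.sync()
        # Step 2: run through the environment and record each transition.
        experiences = []
        state = env.reset()
        for _ in range(max_steps):
            action = self.act(state)
            next_state, reward, done = env.step(action)
            experiences.append((state, action, reward, next_state, done))
            state = next_state
            if done:
                break
        return experiences


global_net = GlobalNetwork()
worker = Worker(global_net, seed=0)
batch = worker.collect(ToyEnv(horizon=5), max_steps=10)
```

The sketch deliberately stops once the experiences are collected. It runs a single worker sequentially, whereas A3C runs several such workers in parallel, each with its own environment instance.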