Reward/return decomposition

In reinforcement learning (RL), it is common that a task reveals rewards only sparsely, e.g., at the end of an episode. This prevents RL algorithms from learning efficiently, especially when the task horizon is long. There has been some research on how to redistribute such sparse rewards to earlier steps. One simple but interesting line of work is …
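To make the idea concrete, here is a minimal sketch of reward redistribution in NumPy: an episode's terminal reward is spread back over earlier steps according to per-step credit weights (uniform by default). This is only a generic illustration, not any particular paper's method; the function name and weighting scheme are my own assumptions.

```python
import numpy as np

def redistribute_terminal_reward(rewards, weights=None):
    """Spread an episode's total (sparse) reward back over earlier steps.

    rewards: per-step rewards, typically all zero except the final step.
    weights: optional per-step credit weights; uniform if not given.
    Returns dense proxy rewards that sum to the original episode return.
    """
    total = float(np.sum(rewards))
    T = len(rewards)
    weights = np.ones(T) if weights is None else np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return total * weights

# Example: a 5-step episode rewarded only at the last step.
sparse = [0.0, 0.0, 0.0, 0.0, 1.0]
print(redistribute_terminal_reward(sparse))  # [0.2 0.2 0.2 0.2 0.2]
```

Learned variants replace the uniform weights with per-step contributions predicted by a model, but the bookkeeping stays the same: the redistributed rewards preserve the episode return.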

Self-Supervised Learning Tricks

I am reading some self-supervised learning papers. Some of them have interesting tricks for creating self-supervised learning signals. This post is dedicated to those tricks. The first paper I read is SwAV (Swapping Assignments between multiple Views of the same image) [1]. The high-level idea is that we create clusters with cluster centers. These …
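A toy sketch of the swapped-assignment idea, assuming L2-normalized features for two views of the same images and a small set of cluster centers (prototypes): each view's cluster assignment is predicted from the other view. Note that SwAV computes the target codes with the Sinkhorn-Knopp algorithm; here a plain softmax stands in for the codes to keep the sketch short, so this is a simplification rather than the paper's exact procedure.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1):
    """Toy swapped-assignment loss for two views of the same images.

    z1, z2: L2-normalized features, shape (batch, dim), one row per image view.
    prototypes: cluster centers, shape (num_clusters, dim).
    Softmax stands in for the Sinkhorn-Knopp codes used in the actual paper.
    """
    scores1 = z1 @ prototypes.T / temperature
    scores2 = z2 @ prototypes.T / temperature
    p1, p2 = softmax(scores1), softmax(scores2)   # predicted assignment distributions
    q1, q2 = p1, p2                               # stand-in "codes" (treated as targets)
    # Predict view 2's code from view 1 and vice versa (the "swap").
    loss = -np.mean(np.sum(q2 * np.log(p1 + 1e-9), axis=1)
                    + np.sum(q1 * np.log(p2 + 1e-9), axis=1))
    return loss

rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8)); z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 = rng.normal(size=(4, 8)); z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
C = rng.normal(size=(3, 8)); C /= np.linalg.norm(C, axis=1, keepdims=True)
print(swapped_prediction_loss(z1, z2, C))
```

In the real method the prototypes are trainable parameters and the codes are computed without gradients, so the swapped cross-entropy is what drives both the encoder and the prototypes.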