There are several online posts [1][2] that illustrate the idea behind the Transformer, the model introduced in the paper “Attention Is All You Need” [4]. Based on [1] and [2], I am sharing a short tutorial on implementing the Transformer [3]. In this tutorial, the task is “copy-paste”, i.e., to let a Transformer learn to output the …