Here I am writing down some notes summarizing my understanding of natural gradient. There are many online materials covering similar topics; I am not adding anything new, just writing a personal summary. Assume we have a model with model parameter $\theta$. We have training data $x$. Then, the Hessian of the log likelihood, $\log p(x|\theta)$, is: …
Continue reading “Gradient and Natural Gradient, Fisher Information Matrix and Hessian”
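The title pairs the Fisher information matrix with the Hessian. A standard identity connecting the two, under the usual regularity conditions and with expectations taken over $x \sim p(x|\theta)$ (the model's own distribution, not the training data), is $F(\theta) = \mathbb{E}\left[\nabla_\theta \log p(x|\theta)\, \nabla_\theta \log p(x|\theta)^\top\right] = -\mathbb{E}\left[\nabla^2_\theta \log p(x|\theta)\right]$. The sketch below checks this numerically. It assumes a 1D Gaussian model with $\theta = (\mu, \log\sigma)$, which is an illustrative choice and not the model used in the post. The helper names `score`, `hessian` and `expect` are hypothetical. Expectations are computed with Gauss-Hermite quadrature so the result is deterministic.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Illustrative model (an assumption): x ~ N(mu, sigma^2), theta = (mu, s), sigma = exp(s).

def score(x, mu, s):
    """Gradient of log N(x | mu, exp(s)^2) with respect to theta = (mu, s)."""
    var = np.exp(2 * s)
    d = x - mu
    return np.array([d / var, -1.0 + d ** 2 / var])

def hessian(x, mu, s):
    """Hessian of log N(x | mu, exp(s)^2) with respect to theta = (mu, s)."""
    var = np.exp(2 * s)
    d = x - mu
    return np.array([[-1.0 / var, -2.0 * d / var],
                     [-2.0 * d / var, -2.0 * d ** 2 / var]])

def expect(f, mu, s, n=20):
    """E_{x ~ N(mu, exp(s)^2)}[f(x)] via probabilists' Gauss-Hermite quadrature."""
    z, w = hermegauss(n)            # nodes/weights for the weight function exp(-z^2 / 2)
    w = w / np.sqrt(2 * np.pi)      # normalize so the weights sum to 1
    sigma = np.exp(s)
    return sum(wi * f(mu + sigma * zi) for zi, wi in zip(z, w))

mu, s = 0.5, np.log(2.0)            # sigma = 2

# Fisher information as the expected outer product of the score.
fisher = expect(lambda x: np.outer(score(x, mu, s), score(x, mu, s)), mu, s)
# Negative expected Hessian of the log likelihood.
neg_exp_hessian = -expect(lambda x: hessian(x, mu, s), mu, s)

print(fisher)
print(neg_exp_hessian)
```

For this model both matrices should agree with the analytic value $\mathrm{diag}(1/\sigma^2,\, 2)$. Note that the identity holds only in expectation: for a single sample $x$, the negative Hessian and the score outer product generally differ.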