Gradient and Natural Gradient, Fisher Information Matrix and Hessian

Here I am writing down some notes summarizing my understanding of the natural gradient. There are many online materials covering similar topics. I am not adding anything new, just writing a personal summary. Assume we have a model with a set of model parameters, and that we have training data. Then the Hessian of the log likelihood is: …

Stochastic Variational Inference

Introduction: In this post, we introduce stochastic variational inference, a machine learning technique that is widely used to estimate the posterior distribution in Bayesian models. Suppose that, in a Bayesian model, the model parameters are denoted by a vector and the observations by a second symbol. According to Bayes' theorem, the posterior distribution of the parameters can be …