Bayesian linear regression

Ordinary least squares (OLS) linear regression produces a point estimate of the weight vector $latex w$ that minimizes the sum of squared residuals: $latex \hat{w} = \arg\min_w (y - Xw)^T (y - Xw)$. If we assume normality of the errors, $latex \epsilon \sim \mathcal{N}(0, \sigma^2)$, with a fixed point estimate of $latex \sigma^2$, we can also analyze confidence intervals and future predictions (see the discussion at the end of [2]). Instead of point estimates, Bayesian linear …
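As a rough illustration of what "point estimate" means for OLS, here is a minimal sketch for the simplest case of one feature plus an intercept. The toy data and the function names `ols_fit` and `residual_variance` are made up for this example and are not taken from the post; the variance estimate uses the usual n - 2 degrees of freedom for this two-parameter model.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS solution for a single feature plus intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_variance(xs, ys, intercept, slope):
    """Point estimate of the error variance (n - 2 degrees of freedom)."""
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return rss / (len(xs) - 2)


# Hypothetical toy data, roughly following y = 1 + 2x plus noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

intercept, slope = ols_fit(xs, ys)
sigma2 = residual_variance(xs, ys, intercept, slope)
print(intercept, slope, sigma2)
```

Each quantity here is a single number rather than a distribution, which is the contrast the Bayesian treatment draws.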

Resources about Attention is all you need

There are several online posts [1][2] that illustrate the idea of the Transformer, the model introduced in the paper “Attention Is All You Need” [4]. Based on [1] and [2], I am sharing a short tutorial for implementing a Transformer [3]. In this tutorial, the task is “copy-paste”, i.e., to let a Transformer learn to output the …

Notes on “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor”

I am reading this paper (https://arxiv.org/abs/1801.01290) and want to take down some notes about it. Introduction: Soft Actor-Critic is a special version of the Actor-Critic algorithms. Actor-Critic algorithms are one kind of policy gradient method. Policy gradient methods differ from value-based methods (like Q-learning), where you learn Q-values and then infer the best action to …

Euler’s Formula and Fourier Transform

Euler’s formula states that $latex e^{ix} =\cos{x}+ i \sin{x}$. When $latex x = \pi$, the formula becomes $latex e^{i\pi} = -1$, known as Euler’s identity. An easy derivation of Euler’s formula is given in [3] and [5]. According to the Maclaurin series (a special case of the Taylor expansion $latex f(x)=f(a)+f'(a)(x-a)+\frac{f''(a)}{2!}(x-a)^2+\cdots$ when $latex a=0$), $latex e^x=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\frac{x^4}{4!}+\cdots$ …
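A quick numerical check can make the formula concrete. The sketch below uses only Python's standard `cmath` and `math` modules; the helper `exp_maclaurin`, the test point `x = 0.75`, and the choice of 30 terms are assumptions made for this illustration. It compares exp(ix) with cos x + i sin x, sums the Maclaurin series of the exponential at a complex argument, and evaluates exp(i·π).

```python
import cmath
import math


def exp_maclaurin(z, terms=30):
    """Partial sum of the Maclaurin series 1 + z + z^2/2! + z^3/3! + ..."""
    total = 0
    term = 1  # current term z^n / n!, starting at n = 0
    for n in range(terms):
        total += term
        term = term * z / (n + 1)
    return total


x = 0.75  # arbitrary test point
lhs = cmath.exp(1j * x)                  # e^{ix}
rhs = complex(math.cos(x), math.sin(x))  # cos x + i sin x
series = exp_maclaurin(1j * x)           # series evaluated at ix
identity = cmath.exp(1j * math.pi)       # should be close to -1

print(abs(lhs - rhs), abs(series - rhs), abs(identity + 1))
```

All three printed differences should be at the level of floating-point rounding error, not exactly zero.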

How to conduct grid search

I have always had some doubts about grid search. I am not sure how I should conduct grid search for hyperparameter tuning of a model and report the model’s generalization performance in a scientific paper. There are three possible ways: 1) Split the data into 10 folds. Repeat the following 10 times: pick 9 folds as training data, …