Introduction In this post, we introduce a machine learning technique called stochastic variational inference that is widely used to estimate the posterior distribution of Bayesian models. Suppose that in a Bayesian model, the model parameters are denoted as a vector and the observation is denoted as . According to Bayes' theorem, the posterior distribution of can be …
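As a reminder of the formula the excerpt refers to (written in my own notation, since the original symbols did not survive): for a parameter vector $\theta$ and observation $x$, Bayes' theorem gives the posterior as

```latex
p(\theta \mid x)
  = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
  = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta}
```

The denominator (the evidence) is the integral that is typically intractable, which is what motivates variational approximations in the first place.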
Author Archives: czxttkl
New code highlighter sample
def kljlkljadsklf():
    if klaf:
        pass
Bayesian linear regression
Ordinary least squares (OLS) linear regression has point estimates on the weight vector that fit the formula: . If we assume normality of the errors: with a fixed point estimate on , we can also analyze confidence intervals and future predictions (see the discussion at the end of [2]). Instead of point estimates, Bayesian linear …
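As a quick reference for the OLS point estimate mentioned above, here is a numpy sketch on toy data of my own (the post's actual formulas are truncated in this excerpt). The estimate minimizes the squared residuals, with the closed form $w = (X^T X)^{-1} X^T y$:

```python
import numpy as np

# Toy data: intercept + one feature, known true weights, small Gaussian noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
true_w = np.array([1.0, 2.0])
y = X @ true_w + 0.1 * rng.normal(size=50)

# Numerically stable OLS solve (equivalent to the normal-equations closed form).
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to [1.0, 2.0]
```

The Bayesian version the post goes on to discuss replaces this single point estimate with a full posterior distribution over the weights.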
Make PDFs searchable
I just found a useful library that converts scanned, image-based PDFs into searchable PDFs. The library is named OCRmyPDF and can be found here: https://ocrmypdf.readthedocs.io/en/latest/installation.html#
Counterfactual Policy Evaluation
Evaluating trained RL policies offline is extremely important in real-world production: a trained policy with unexpected behaviors or unsuccessful learning would cause the system to regress online. Therefore, the safe thing to do is to evaluate their performance on the offline training data, based on which we decide whether to deploy. Evaluating policies offline is an ongoing research …
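One classic building block for this kind of offline evaluation is importance sampling, which reweights logged rewards by the ratio of target-policy to behavior-policy probabilities. A toy numpy sketch (my own construction; the excerpt does not show which estimator the post uses):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b = np.array([0.5, 0.5])        # behavior policy over 2 actions (logged data)
pi = np.array([0.8, 0.2])       # target policy we want to evaluate offline
r_mean = np.array([1.0, 0.0])   # true mean reward per action (unknown in practice)

# Simulated logged data: actions from the behavior policy, noisy rewards.
actions = rng.choice(2, size=n, p=b)
rewards = r_mean[actions] + 0.1 * rng.normal(size=n)

# Importance-sampling estimator: E_b[(pi(a)/b(a)) * r] = E_pi[r].
weights = pi[actions] / b[actions]
v_hat = np.mean(weights * rewards)
v_true = float(pi @ r_mean)
print(v_hat)  # close to v_true = 0.8
```

The estimator is unbiased but its variance grows with the mismatch between the two policies, which is one reason offline evaluation remains an active research area.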
Resources about Attention is all you need
There are several online posts [1][2] that illustrate the idea of the Transformer, the model introduced in the paper “Attention Is All You Need” [4]. Based on [1] and [2], I am sharing a short tutorial for implementing the Transformer [3]. In this tutorial, the task is “copy-paste”, i.e., to let a Transformer learn to output the …
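The core operation of the Transformer is scaled dot-product attention. A minimal numpy sketch of that formula (my own illustration; the tutorial's actual code is in [3]):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

# With identity queries/keys, each query attends most to its matching key.
Q = np.eye(3)
K = np.eye(3)
V = np.arange(9.0).reshape(3, 3)
out = attention(Q, K, V)
print(out.shape)  # (3, 3)
```

For the copy-paste task, this is the mechanism that lets each output position look up the matching input position.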
Implementation notes for world model
I’ve recently been implementing the world model [1], which seems to be a promising algorithm for effectively learning controls after first learning the environment. Here I share some implementation notes. Loss of Gaussian Mixture Model The memory model of the world model is a Mixture-Density-Network Recurrent Neural Network (MDN-RNN). It takes the current state and action as inputs, and outputs the …
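The MDN part of the MDN-RNN is trained by minimizing the negative log-likelihood of targets under the predicted Gaussian mixture. A standalone numpy sketch of that loss for the 1-D case, using log-sum-exp for stability (my own illustration; the post's actual network code is not shown in this excerpt):

```python
import numpy as np

def gmm_nll(x, log_pi, mu, log_sigma):
    """x: (N,); log_pi, mu, log_sigma: (N, K) per-sample mixture parameters."""
    sigma = np.exp(log_sigma)
    # log N(x | mu_k, sigma_k) for each mixture component
    log_norm = (-0.5 * ((x[:, None] - mu) / sigma) ** 2
                - log_sigma - 0.5 * np.log(2 * np.pi))
    log_mix = log_pi + log_norm
    m = log_mix.max(axis=1, keepdims=True)           # log-sum-exp trick
    log_lik = m.squeeze(1) + np.log(np.exp(log_mix - m).sum(axis=1))
    return -log_lik.mean()

# Two equal-weight unit-variance components at 0 and 1, evaluated at x=0 and x=1.
x = np.array([0.0, 1.0])
log_pi = np.log(np.full((2, 2), 0.5))
mu = np.array([[0.0, 1.0], [0.0, 1.0]])
log_sigma = np.zeros((2, 2))
print(gmm_nll(x, log_pi, mu, log_sigma))
```

Computing the mixture likelihood in log space like this matters in practice: exponentiating the component densities first and summing tends to underflow during training.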
Notes from Introduction to Calculus and Analysis
Cauchy-Schwarz inequality: $latex (a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2 \leq (a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2+b_2^2 + \cdots + b_n^2)$ When $latex a_1=\sqrt{x}, a_2=\sqrt{y}, b_1=\sqrt{y}, b_2=\sqrt{x}$, then $latex (2\sqrt{xy})^2\leq (x+y)^2$
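A quick numeric spot-check of the special case above, i.e. $(2\sqrt{xy})^2 \leq (x+y)^2$ for nonnegative $x, y$ (my own sanity check on random samples):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
y = rng.uniform(0, 10, size=1000)
lhs = (2 * np.sqrt(x * y)) ** 2   # equals 4xy
rhs = (x + y) ** 2
# Cauchy-Schwarz special case: holds for every sampled pair
print(bool(np.all(lhs <= rhs + 1e-9)))  # True
```

Note this special case is just the AM-GM inequality in disguise: $4xy \leq (x+y)^2$ rearranges to $0 \leq (x-y)^2$.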
My understanding in 401K
Here is my reasoning about 401K. First, I’ll start with two definitions: (1) taxable income, meaning the gross income you receive on which your tax will be calculated; (2) tax deduction, meaning any deduction from your taxable income. A tax deduction lowers your taxable income and thus generally lowers your tax. 401K has three categories: Pre-tax: contribute …
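The deduction mechanics can be illustrated with a tiny calculation (hypothetical numbers and a flat tax rate for illustration only; real tax brackets are progressive):

```python
# How a pre-tax 401K contribution acts as a tax deduction.
gross_income = 100_000
pretax_401k = 10_000
flat_tax_rate = 0.25  # assumed for illustration, not a real bracket

taxable_income = gross_income - pretax_401k  # the contribution is deducted
tax = taxable_income * flat_tax_rate
print(taxable_income, tax)  # 90000 22500.0
```

Without the contribution, the tax would be 25000.0, so the pre-tax contribution defers tax on the contributed amount until withdrawal.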
DPG and DDPG
In this post, I am sharing my understanding of the Deterministic Policy Gradient algorithm (DPG) [1] and its deep-learning version (DDPG) [2]. We introduced the policy gradient theorem in [3, 4]; here, we briefly recap it. The objective function of policy gradient methods is: $latex J(\theta)=\sum\limits_{s \in S} d^\pi(s) V^\pi(s)=\sum\limits_{s \in S} d^\pi(s) \sum\limits_{a \in A} \pi(a|s) Q^\pi(s,a), &s=2$ where …
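The second equality in that objective relies on the identity $V^\pi(s)=\sum_a \pi(a|s) Q^\pi(s,a)$. A tabular numpy check on a made-up 2-state, 2-action MDP (my own toy example, not from the post):

```python
import numpy as np

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                  # R[s, a] expected rewards
              [0.5, 2.0]])
pi = np.array([[0.6, 0.4],                 # pi[s, a] stochastic policy
               [0.3, 0.7]])

# Solve the Bellman equation V = r_pi + gamma * P_pi V for the policy.
r_pi = (pi * R).sum(axis=1)
P_pi = np.einsum('sa,sat->st', pi, P)
V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
Q = R + gamma * P @ V
print(np.allclose(V, (pi * Q).sum(axis=1)))  # True: V(s) = sum_a pi(a|s) Q(s,a)
```

DPG's key move, which the post goes on to cover, is replacing the stochastic policy $\pi(a|s)$ with a deterministic one, so the inner sum over actions disappears.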