Inverse Reinforcement Learning

In my rough understanding, inverse reinforcement learning is a branch of RL research in which people try to perform state-action sequences resembling given tutor sequences. There are two famous works on inverse reinforcement learning. One is Apprenticeship Learning via Inverse Reinforcement Learning [1], and the other is Maximum Margin Planning [2]. Maximum Margin Planning In …

Reinforcement learning overview

Here are some materials I found useful to learn Reinforcement Learning (RL). Let’s first look at Markov Decision Process (MDP), in which you know a transition function $latex T(s,a,s’)$ and a reward function $latex R(s,a,s’)$. In the diagram below, the green state is called “q state”.  Some notations that need to be clarified: Dynamic programming …

Abstract Algebra

I am introducing some basic definitions of abstract algebra, structures like monoid, groups, rings, fields and vector spaces and homomorphism/isomorphism. I find the clear definitions of structures from [1]: Also, the tables below show a clear comparisons between several structures [2,3]:   All these structures are defined with both a set and operation(s). Based on [4], …

When A* algorithm returns optimal solution

Dijkstra algorithm is a well known algorithm for finding exact distance from a source to a destination. In order to improve the path finding speed, A* algorithm combines heuristics and known distances to find the heuristically best path towards a goal. A common A* implementation maintains an open set for discovered yet not evaluated nodes and a closed …

Embedding and Heterogeneous Network Papers

Embedding methods have been widely used in graph, network, NLP and recommendation system. In short, embedding methods vectorize entities under study by mapping them into a shared latent space. Once vectorized representation of entities are learned (through either supervised or unsupervised fashion), a lot of knowledge discovery work can be done: clustering based on entity …

The expected times of tosses until you see first HTH or HTT

The problem comes from a very famous Ted Talk:  You are flipping a fair coin. What is the expected times of tosses you need to see the first “HTH” appears? What is that for the first “HTT” appears? Suppose $latex N_1$ is the random variable which counts the number of flips till we get first …

Why do we use Poisson distribution or Negative Binomial distribution for regression?

Let’s say we have predictor variables (features) denoted as $latex X \in \mathbb{R}^n$ and response variable (label) $latex y$ whose underlying random variable is $latex Y$. If we want to fit an Ordinary Least Square (OLS) regression such that $latex y=WX+\epsilon$ where $latex \epsilon$ is an error term, then we have the following assumptions:  strict exogenous: $latex E(\epsilon|X)=0$ …

Restricted Boltzmann Machine

In this post, I am going to share with you my understanding in Restricted Boltzmann Machine (RBM). Restricted Boltzmann Machine is a stochastic artificial neural network that learns the probability distribution of input. A stochastic artificial neural network means a structure contains a series of units with values between 0 to 1 that depend on …

Understand “Markov Chain Sampling Methods for Dirichlet Process Mixture Models”

In this post I am going to share my understanding of the paper: Markov Chain Sampling Methods for Dirichlet Process Mixture Models. In chapter 2, it introduces the basic concept of Dirichlet Process Mixture Models. In (2.1), we have: $latex y_i | \theta_i \sim F(\theta_i) \newline \theta_i | G \sim G \newline G \sim DP(G_0, \alpha)$ …