In my rough understanding, inverse reinforcement learning is a branch of RL research in which an agent learns to produce state-action sequences that resemble given tutor (demonstration) sequences. There are two famous works on inverse reinforcement learning. One is Apprenticeship Learning via Inverse Reinforcement Learning [1], and the other is Maximum Margin Planning [2]. Maximum Margin Planning In …
Category Archives: Algorithm
Reinforcement learning overview
Here are some materials I found useful for learning Reinforcement Learning (RL). Let’s first look at a Markov Decision Process (MDP), in which you know a transition function $latex T(s,a,s')$ and a reward function $latex R(s,a,s')$. In the diagram below, the green state is called a “q state”. Some notation that needs to be clarified: Dynamic programming …
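As a rough illustration of how a known $latex T(s,a,s')$ and $latex R(s,a,s')$ can be used, here is a minimal value-iteration sketch in Python. Everything in it is an assumption made for the example and not taken from the post: the two-state toy MDP, the discount factor `gamma`, the stopping tolerance `tol`, and the reading of a “q state” as a state-action pair `(s, a)`.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-10):
    """Return (V, Q) for a small MDP with known dynamics.

    T[(s, a)] maps each next state s2 to its probability T(s, a, s2).
    R(s, a, s2) is the reward function. gamma and tol are assumptions.
    """
    V = {s: 0.0 for s in states}
    while True:
        # Value of each (s, a) pair: expected reward plus discounted future value.
        Q = {
            (s, a): sum(p * (R(s, a, s2) + gamma * V[s2])
                        for s2, p in T[(s, a)].items())
            for s in states for a in actions[s]
        }
        # A state with no actions is treated as terminal with value 0.
        new_V = {s: max((Q[(s, a)] for a in actions[s]), default=0.0)
                 for s in states}
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            return new_V, Q
        V = new_V


# Hypothetical toy MDP: in state "a" you can "stay" (reward 1, remain in "a")
# or "quit" (reward 5, move to the terminal state "b").
states = ["a", "b"]
actions = {"a": ["stay", "quit"], "b": []}
T = {("a", "stay"): {"a": 1.0}, ("a", "quit"): {"b": 1.0}}
rewards = {("a", "stay", "a"): 1.0, ("a", "quit", "b"): 5.0}


def R(s, a, s2):
    return rewards[(s, a, s2)]


V, Q = value_iteration(states, actions, T, R)
print(V["a"], Q[("a", "quit")])
```

In this toy problem staying forever is worth $latex 1/(1-0.9)=10$, which beats the one-off reward of 5 for quitting, so the value of state "a" converges to 10.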
Abstract Algebra
In this post I introduce some basic definitions of abstract algebra: structures such as monoids, groups, rings, fields and vector spaces, as well as homomorphisms and isomorphisms. I found clear definitions of these structures in [1]: Also, the tables below show clear comparisons between several structures [2,3]: All these structures are defined by both a set and one or more operations. Based on [4], …
When A* algorithm returns optimal solution
Dijkstra's algorithm is a well-known algorithm for finding the exact shortest distance from a source to a destination. To speed up path finding, the A* algorithm combines a heuristic with known distances to find the heuristically best path towards a goal. A common A* implementation maintains an open set for discovered but not yet evaluated nodes and a closed …
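To make the open set / closed set idea concrete, here is a minimal A* sketch in Python. It is only an illustration under stated assumptions, not the implementation discussed in the post: the function name `a_star`, the adjacency-list `graph` and the heuristic table `h` are all hypothetical. This variant never reopens a node once it is in the closed set, so it is only guaranteed to return an optimal path when the heuristic is consistent; with a heuristic of zero everywhere it behaves like Dijkstra's algorithm.

```python
import heapq


def a_star(graph, heuristic, start, goal):
    """graph: node -> list of (neighbor, edge_cost); heuristic: node -> estimate to goal.

    Returns (cost, path), or None if the goal is unreachable.
    """
    open_heap = [(heuristic[start], start)]  # open set ordered by f = g + h
    g = {start: 0}                           # best known distance from start
    parent = {start: None}
    closed = set()                           # nodes already evaluated
    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry left over from an earlier, worse g
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g[goal], path[::-1]
        closed.add(node)
        for nbr, cost in graph.get(node, []):
            new_g = g[node] + cost
            if nbr not in closed and new_g < g.get(nbr, float("inf")):
                g[nbr] = new_g
                parent[nbr] = node
                heapq.heappush(open_heap, (new_g + heuristic[nbr], nbr))
    return None


# Hypothetical example graph with a consistent heuristic.
graph = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 2)],
    "G": [],
}
h = {"S": 4, "A": 3, "B": 2, "G": 0}
print(a_star(graph, h, "S", "G"))  # (5, ['S', 'A', 'B', 'G'])
```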
Embedding and Heterogeneous Network Papers
Embedding methods have been widely used in graph and network analysis, NLP and recommendation systems. In short, embedding methods vectorize the entities under study by mapping them into a shared latent space. Once vectorized representations of entities are learned (in either a supervised or an unsupervised fashion), a lot of knowledge discovery work can be done: clustering based on entity …
The expected times of tosses until you see first HTH or HTT
The problem comes from a very famous TED Talk: You are flipping a fair coin. What is the expected number of tosses needed to see the first “HTH”? What about the first “HTT”? Suppose $latex N_1$ is the random variable which counts the number of flips till we get first …
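As a quick sanity check of the two answers, here is a small Python sketch. It does not follow the derivation in the post. Instead, `expected_tosses` uses the well-known overlap rule for a fair coin (add $latex 2^k$ for every $latex k$ at which the length-$latex k$ prefix of the pattern equals its length-$latex k$ suffix), and `simulate` is a plain Monte Carlo estimate. Both function names, the number of trials and the seed are assumptions chosen for the example.

```python
import random


def expected_tosses(pattern):
    """Exact expected waiting time for a pattern of fair-coin tosses (overlap rule)."""
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])


def simulate(pattern, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity; trials and seed are arbitrary."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        recent = ""  # the last len(pattern) tosses seen so far
        count = 0
        while recent != pattern:
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            count += 1
        total += count
    return total / trials


for p in ("HTH", "HTT"):
    print(p, expected_tosses(p), round(simulate(p), 2))
```

The exact values are 10 tosses for “HTH” and 8 for “HTT”. The difference comes from the self-overlap of “HTH”: its final H can also serve as the first H of the next attempt, which contributes the extra $latex 2^1$ term.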
Why do we use Poisson distribution or Negative Binomial distribution for regression?
Let’s say we have predictor variables (features) denoted as $latex X \in \mathbb{R}^n$ and a response variable (label) $latex y$ whose underlying random variable is $latex Y$. If we want to fit an Ordinary Least Squares (OLS) regression such that $latex y=WX+\epsilon$ where $latex \epsilon$ is an error term, then we have the following assumptions: strict exogeneity: $latex E(\epsilon|X)=0$ …
Restricted Boltzmann Machine
In this post, I am going to share my understanding of the Restricted Boltzmann Machine (RBM). A Restricted Boltzmann Machine is a stochastic artificial neural network that learns the probability distribution of its input. A stochastic artificial neural network is a structure containing a series of units with values between 0 and 1 that depend on …
Understand “Markov Chain Sampling Methods for Dirichlet Process Mixture Models”
In this post I am going to share my understanding of the paper: Markov Chain Sampling Methods for Dirichlet Process Mixture Models. In chapter 2, it introduces the basic concept of Dirichlet Process Mixture Models. In (2.1), we have: $latex y_i | \theta_i \sim F(\theta_i) \newline \theta_i | G \sim G \newline G \sim DP(G_0, \alpha)$ …
Read SAS output tables
The following tables were generated right after a multiple linear regression with three independent variables was fit in SAS: The linear regression is Gallons_sold ~ price + line_ad + display. I will mainly illustrate how to read the first table. To give you some background, the number of samples is $latex n=406$ and the number …