Embedding and Heterogeneous Network Papers

Embedding methods have been widely used in graphs, networks, NLP, and recommendation systems. In short, embedding methods vectorize the entities under study by mapping them into a shared latent space. Once vectorized representations of entities are learned (in either a supervised or an unsupervised fashion), a lot of knowledge discovery work can be done: clustering based on entity …
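To make "entities in a shared latent space" concrete, here is a minimal sketch of my own (not from the post): once every entity has a vector, similarity search, clustering, and recommendation all reduce to geometry. The entity names and random vectors below are hypothetical stand-ins for learned embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
entities = ["user_a", "user_b", "item_x", "item_y"]  # hypothetical entities
dim = 8
# In practice these vectors come from a trained model; random here for illustration.
emb = {e: rng.normal(size=dim) for e in entities}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Nearest neighbor of "user_a" in the latent space
scores = {e: cosine(emb["user_a"], emb[e]) for e in entities if e != "user_a"}
print(max(scores, key=scores.get))
```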

The expected number of tosses until you first see HTH or HTT

The problem comes from a very famous TED Talk: you are flipping a fair coin. What is the expected number of tosses you need until the first “HTH” appears? What about the first “HTT”? Suppose $latex N_1$ is the random variable which counts the number of flips until we get the first …
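A quick way to sanity-check the answers (the exact values are E[HTH] = 10 and E[HTT] = 8) is a Monte Carlo simulation; the sketch below is my own illustration, not part of the original post.

```python
import random

def flips_until(pattern):
    """Count fair-coin flips until `pattern` (e.g. 'HTH') first appears."""
    window, n = "", 0
    while not window.endswith(pattern):
        window = (window + random.choice("HT"))[-len(pattern):]  # sliding window
        n += 1
    return n

trials = 100_000
for p in ("HTH", "HTT"):
    avg = sum(flips_until(p) for _ in range(trials)) / trials
    print(p, round(avg, 2))  # approaches 10 for HTH and 8 for HTT
```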

Why do we use the Poisson distribution or the Negative Binomial distribution for regression?

Let’s say we have predictor variables (features) denoted as $latex X \in \mathbb{R}^n$ and a response variable (label) $latex y$ whose underlying random variable is $latex Y$. If we want to fit an Ordinary Least Squares (OLS) regression such that $latex y=WX+\epsilon$, where $latex \epsilon$ is an error term, then we have the following assumptions: strict exogeneity: $latex E(\epsilon|X)=0$ …
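To make the contrast concrete, here is a minimal sketch (with made-up data, not from the post) of fitting a Poisson regression with statsmodels: instead of OLS’s additive Gaussian error, the model assumes $latex E(Y|X)=\exp(X\beta)$ with count-valued, Poisson-distributed responses.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))  # intercept + two features
beta = np.array([0.3, 0.8, -0.5])             # true coefficients (made up)
y = rng.poisson(np.exp(X @ beta))             # count-valued response

# Poisson GLM: models log E[Y|X] as linear in X
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)  # should be close to beta
```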

Restricted Boltzmann Machine

In this post, I am going to share with you my understanding of the Restricted Boltzmann Machine (RBM). A Restricted Boltzmann Machine is a stochastic artificial neural network that learns the probability distribution of its input. A stochastic artificial neural network means a structure containing a series of units with values between 0 and 1 that depend on …
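As a concrete (and simplified) illustration of my own, not the post’s code, here is a one-step contrastive divergence (CD-1) update for a binary RBM in numpy; the layer sizes, learning rate, and random training data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)  # visible/hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0):
    global W, b_v, b_h
    # Positive phase: sample hidden units given the data
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(n_hidden) < p_h0).astype(float)
    # Negative phase: reconstruct visible units, then recompute hidden probabilities
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(n_visible) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Gradient approximation: data statistics minus reconstruction statistics
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_v += lr * (v0 - v1)
    b_h += lr * (p_h0 - p_h1)

for _ in range(1000):  # train on random binary vectors, for illustration only
    cd1_update(rng.integers(0, 2, n_visible).astype(float))
```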

Understanding “Markov Chain Sampling Methods for Dirichlet Process Mixture Models”

In this post I am going to share my understanding of the paper: Markov Chain Sampling Methods for Dirichlet Process Mixture Models. Chapter 2 introduces the basic concepts of Dirichlet Process Mixture Models. In (2.1), we have: $latex y_i | \theta_i \sim F(\theta_i) \newline \theta_i | G \sim G \newline G \sim DP(G_0, \alpha)$ …
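One way to build intuition for the $latex G \sim DP(G_0, \alpha)$ prior is the equivalent Chinese Restaurant Process view of how the $latex \theta_i$ cluster. The sketch below (my own illustration, not from the paper) simulates the induced seating, where $latex \alpha$ controls how often a new cluster opens.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0       # concentration parameter of the DP
counts = []       # number of customers at each table (cluster sizes)

for _ in range(100):
    # Existing table j is chosen with probability proportional to counts[j];
    # a new table is opened with probability proportional to alpha.
    probs = np.array(counts + [alpha], dtype=float)
    probs /= probs.sum()
    table = rng.choice(len(probs), p=probs)
    if table == len(counts):
        counts.append(1)      # open a new cluster
    else:
        counts[table] += 1

print(counts)  # one draw of cluster sizes under the CRP
```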

Difference between SARSA and Q-learning

State-Action-Reward-State-Action (SARSA) and Q-learning are two forms of reinforcement learning. The differences between the two methods are discussed in: https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/, http://stackoverflow.com/questions/6848828/reinforcement-learning-differences-between-qlearning-and-sarsatd, and http://stats.stackexchange.com/questions/184657/difference-between-off-policy-and-on-policy-learning. Let’s explain why Q-learning is called off-policy learning and SARSA is called on-policy learning. Suppose at state $latex s_t$, a method takes action $latex a_t$, which results in landing in a new state …
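The on-policy/off-policy distinction is easiest to see in the update rules themselves. Below is a minimal sketch of my own (with a hypothetical state/action encoding): Q-learning bootstraps from $latex \max_{a'} Q(s', a')$ regardless of which action the behavior policy will actually take next, while SARSA bootstraps from $latex Q(s', a')$ for the $latex a'$ it really takes.

```python
import random
from collections import defaultdict

alpha, gamma, eps, n_actions = 0.1, 0.99, 0.1, 4  # illustrative hyperparameters
Q = defaultdict(float)  # Q[(state, action)] -> value

def eps_greedy(s):
    """Behavior policy used by both methods."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(s, a)])

def q_learning_update(s, a, r, s_next):
    # Off-policy: the target uses the greedy action at s_next
    target = r + gamma * max(Q[(s_next, a2)] for a2 in range(n_actions))
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target uses the action actually taken at s_next
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```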

Why is the greedy algorithm for maximum weighted matching a 2-approximation?

This post explains my understanding of a greedy algorithm proposed for the maximum weighted matching problem. The greedy algorithm goes as follows (as listed in the introduction of this paper). It is claimed that the greedy algorithm is a 2-approximation, i.e., greedy result >= 1/2 × optimal result. The document where the greedy algorithm is proposed is …
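Since the excerpt does not reproduce the algorithm listing, here is my own sketch of the usual greedy heuristic for weighted matching (which may differ in detail from the paper’s listing): repeatedly take the heaviest remaining edge whose endpoints are both still unmatched.

```python
def greedy_matching(edges):
    """edges: iterable of (weight, u, v). Returns a matching built greedily."""
    matched, matching = set(), []
    for w, u, v in sorted(edges, reverse=True):  # heaviest edge first
        if u not in matched and v not in matched:
            matching.append((w, u, v))
            matched.update((u, v))
    return matching

# Example: greedy keeps (5, a, b) and (3, c, d) for a total weight of 8
print(greedy_matching([(5, "a", "b"), (4, "b", "c"), (3, "c", "d")]))
```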

Theano LSTM Code Walk Through

In this post, I am going to explain the code (as much as I can) from the Theano LSTM tutorial: http://deeplearning.net/tutorial/lstm.html. You first need to understand LSTMs. Here is a recommended online resource: http://colah.github.io/posts/2015-08-Understanding-LSTMs/, which provides many beautiful figures illustrating LSTMs step by step. The tutorial aims to predict positive/negative sentiment based on movie reviews …
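Before diving into the Theano code, it helps to keep the bare recurrence in mind. The sketch below is my own numpy rendering of one LSTM step, not the tutorial’s code; stacking the four gates into one weight matrix mirrors a common convention but is an assumption here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4d, n_in), U: (4d, d), b: (4d,); gates stacked [i, f, o, g]."""
    d = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:d])        # input gate
    f = sigmoid(z[d:2*d])      # forget gate
    o = sigmoid(z[2*d:3*d])    # output gate
    g = np.tanh(z[3*d:4*d])    # candidate cell state
    c_t = f * c_prev + i * g   # keep part of the old cell, write part of the new
    h_t = o * np.tanh(c_t)     # expose a gated view of the cell as the hidden state
    return h_t, c_t
```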

Overview for Sequential Data Learning

Hidden Markov Model. You should bear clearly in mind the three questions people usually ask about Hidden Markov Models: 1. What is the probability of an observed sequence? 2. What is the most likely series of hidden states given a specific observed sequence? 3. Given a set of observations, what are the values of the state …
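As an illustration of question 1 (my own sketch, with made-up parameters), the forward algorithm computes the probability of an observed sequence by accumulating $latex \alpha_t(j)=\sum_i \alpha_{t-1}(i)\,a_{ij}\,b_j(o_t)$:

```python
import numpy as np

def forward(obs, A, B, pi):
    """P(obs | model). A: transitions, B: emissions, pi: initial distribution."""
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # one step of the recursion above
    return alpha.sum()

A = np.array([[0.7, 0.3], [0.4, 0.6]])    # made-up transition matrix
B = np.array([[0.9, 0.1], [0.2, 0.8]])    # made-up emission matrix (2 symbols)
pi = np.array([0.5, 0.5])
print(forward([0, 1, 0], A, B, pi))
```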