Category Archives: Algorithm
I am introducing some basic definitions from abstract algebra: structures such as monoids, groups, rings, fields, and vector spaces, as well as homomorphisms/isomorphisms. Clear definitions of these structures can be found in [1]. The tables below also show a clear comparison between several structures [2,3]. All these structures are defined by a set together with one or more operations. Based on [4], …
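As a quick, self-contained illustration (my own summary, not the tables from [2,3]): a monoid only requires associativity and an identity element, while a group additionally requires inverses:

$latex \text{Monoid } (M, \cdot): \quad (a \cdot b) \cdot c = a \cdot (b \cdot c), \qquad \exists\, e \in M : e \cdot a = a \cdot e = a$

$latex \text{Group } (G, \cdot): \quad \text{all monoid axioms, plus } \forall a \,\exists\, a^{-1} : a \cdot a^{-1} = a^{-1} \cdot a = e$

For example, $latex (\mathbb{N}, +)$ is a monoid (with identity $latex 0$) but not a group, while $latex (\mathbb{Z}, +)$ is a group.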
When the A* algorithm returns an optimal solution
Dijkstra's algorithm is a well-known algorithm for finding the exact shortest distance from a source to a destination. To improve path-finding speed, the A* algorithm combines heuristics with known distances to find the heuristically best path toward a goal. A common A* implementation maintains an open set for discovered but not yet evaluated nodes and a closed …
Continue reading “When the A* algorithm returns an optimal solution”
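For concreteness, here is a minimal A* sketch in Python (my own illustration; the toy graph and node names are hypothetical, not from the post). With a consistent heuristic — one that never overestimates and satisfies the triangle inequality — this closed-set version returns an optimal path; `h = 0` is trivially consistent and degrades A* to Dijkstra's algorithm:

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search. graph: dict node -> list of (neighbor, edge_cost).
    h: heuristic estimate of remaining cost from a node to the goal."""
    open_heap = [(h(start), 0, start, [start])]  # entries: (f, g, node, path)
    closed = set()                               # nodes already evaluated
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for nbr, cost in graph.get(node, []):
            if nbr not in closed:
                heapq.heappush(open_heap,
                               (g + cost + h(nbr), g + cost, nbr, path + [nbr]))
    return None  # goal unreachable

# Hypothetical toy graph.
graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 5)], 'C': [('D', 1)]}
print(a_star(graph, lambda n: 0, 'A', 'D'))  # (4, ['A', 'B', 'C', 'D'])
```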
Embedding and Heterogeneous Network Papers
Embedding methods have been widely used in graph and network analysis, NLP, and recommendation systems. In short, embedding methods vectorize the entities under study by mapping them into a shared latent space. Once vectorized representations of entities are learned (in either a supervised or an unsupervised fashion), a lot of knowledge-discovery work can be done: clustering based on entity …
Continue reading “Embedding and Heterogeneous Network Papers”
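As a toy illustration of the "embed, then discover" workflow described above (my own sketch, not taken from any of the listed papers), one can embed graph nodes with a truncated SVD of the adjacency matrix and then cluster the resulting vectors:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Hypothetical adjacency matrix of a small graph with two communities.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# Embed each node into a 2-dimensional latent space.
emb = TruncatedSVD(n_components=2, random_state=0).fit_transform(A)

# Downstream knowledge discovery: cluster nodes by their embeddings.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(labels)  # nodes 0-2 and 3-5 should land in different clusters
```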
The expected number of tosses until you first see HTH or HTT
The problem comes from a very famous TED Talk: you are flipping a fair coin. What is the expected number of tosses you need until the first “HTH” appears? What is it for the first “HTT”? Suppose $latex N_1$ is the random variable that counts the number of flips until we get the first …
Continue reading “The expected number of tosses until you first see HTH or HTT”
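A quick way to sanity-check the analytical answers ($latex E[N_{HTH}]=10$ and $latex E[N_{HTT}]=8$ for a fair coin) is a Monte Carlo simulation; here is a minimal sketch:

```python
import random

def expected_tosses(pattern, trials=200_000):
    """Monte Carlo estimate of the expected number of fair-coin
    tosses until `pattern` (e.g. 'HTH') first appears."""
    total = 0
    for _ in range(trials):
        window, n = '', 0
        while window != pattern:
            # Keep only the last len(pattern) outcomes.
            window = (window + random.choice('HT'))[-len(pattern):]
            n += 1
        total += n
    return total / trials

print(expected_tosses('HTH'))  # approximately 10
print(expected_tosses('HTT'))  # approximately 8
```

The asymmetry comes from self-overlap: after a failed HTH attempt ending in H, that H can restart the pattern, which delays the first full occurrence; HTT has no such overlap.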
Why do we use the Poisson distribution or the Negative Binomial distribution for regression?
Let’s say we have predictor variables (features) denoted as $latex X \in \mathbb{R}^n$ and a response variable (label) $latex y$ whose underlying random variable is $latex Y$. If we want to fit an Ordinary Least Squares (OLS) regression such that $latex y=WX+\epsilon$, where $latex \epsilon$ is an error term, then we have the following assumptions: strict exogeneity: $latex E(\epsilon|X)=0$ …
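As a minimal sketch of what such a count regression looks like in practice (my own example with simulated data, assuming statsmodels is installed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulate a count response whose conditional mean is exp(0.5 + 1.2 * x).
x = rng.normal(size=500)
y = rng.poisson(np.exp(0.5 + 1.2 * x))

# Poisson GLM with a log link (the default for the Poisson family).
X = sm.add_constant(x)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(poisson_fit.params)  # should recover roughly [0.5, 1.2]

# If the counts were overdispersed (Var(Y) > E(Y)), a Negative Binomial
# family would be the more appropriate choice:
nb_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()
```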
Restricted Boltzmann Machine
In this post, I am going to share with you my understanding of the Restricted Boltzmann Machine (RBM). A Restricted Boltzmann Machine is a stochastic artificial neural network that learns the probability distribution of its input. A stochastic artificial neural network means a structure containing a series of units, with values between 0 and 1, that depend on …
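For readers who prefer code to prose, here is a minimal numpy sketch of one contrastive-divergence (CD-1) update for a binary RBM (my own illustration; the layer sizes and learning rate are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b, c = np.zeros(n_visible), np.zeros(n_hidden)  # visible / hidden biases

def cd1_step(v0):
    """One CD-1 update on a single binary visible vector v0."""
    global W, b, c
    # Up-pass: sample hidden units given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(n_hidden) < ph0).astype(float)
    # Down-pass: reconstruct visible units, then re-infer hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(n_visible) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Gradient step on the gap between positive and negative statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)

v = np.array([1, 0, 1, 1, 0, 0], dtype=float)
for _ in range(100):
    cd1_step(v)
```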
Understand “Markov Chain Sampling Methods for Dirichlet Process Mixture Models”
In this post I am going to share my understanding of the paper “Markov Chain Sampling Methods for Dirichlet Process Mixture Models”. Chapter 2 introduces the basic concepts of Dirichlet Process Mixture Models. In (2.1), we have: $latex y_i | \theta_i \sim F(\theta_i) \newline \theta_i | G \sim G \newline G \sim DP(G_0, \alpha)$ …
Continue reading “Understand “Markov Chain Sampling Methods for Dirichlet Process Mixture Models””
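To build intuition for the $latex G \sim DP(G_0, \alpha)$ part of (2.1), here is a small sketch of the equivalent Chinese Restaurant Process view, which samples cluster assignments directly (my own illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_assignments(n, alpha):
    """Sample cluster assignments for n points from a CRP(alpha):
    point i joins existing cluster k with probability |k| / (i + alpha),
    or starts a new cluster with probability alpha / (i + alpha)."""
    assignments, counts = [], []
    for i in range(n):
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)   # open a new table/cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(crp_assignments(20, alpha=1.0))
```

Larger $latex \alpha$ yields more clusters on average, matching its role as the concentration parameter of the DP.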
Read SAS output tables
The following tables were generated right after a multiple linear regression with three independent variables was fitted in SAS. The fitted model is Gallons_sold ~ price + line_ad + display. I will mainly illustrate how to read the first table. For background, the number of samples is $latex n=406$ and the number …
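Although the post reads SAS output, the same quantities (ANOVA table, F statistic, $latex R^2$) can be reproduced elsewhere; here is a minimal Python sketch with hypothetical data of the same shape ($latex n=406$, three predictors — the values themselves are made up, assuming statsmodels is installed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 406  # same sample size as in the post; the data itself is simulated

# Hypothetical stand-ins for price, line_ad, and display.
X = rng.normal(size=(n, 3))
y = 100 + X @ np.array([-2.0, 1.5, 3.0]) + rng.normal(scale=5, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.df_model, model.df_resid)  # 3 and 402 = n - 3 - 1
print(model.fvalue, model.rsquared)    # the F statistic and R^2
print(model.summary())                 # full tables, analogous to SAS output
```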
Difference between SARSA and Q-learning
State-Action-Reward-State-Action (SARSA) and Q-learning are two forms of reinforcement learning. The differences between the two methods are discussed in: https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/ http://stackoverflow.com/questions/6848828/reinforcement-learning-differences-between-qlearning-and-sarsatd http://stats.stackexchange.com/questions/184657/difference-between-off-policy-and-on-policy-learning Let’s explain why Q-learning is called off-policy learning and SARSA is called on-policy learning. Suppose that at state $latex s_t$ a method takes action $latex a_t$, which results in landing in a new state …
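The contrast is easiest to see in the two update rules themselves; here is a minimal sketch (my own, with a hypothetical Q table indexed by (state, action) pairs and a hypothetical action set):

```python
from collections import defaultdict

Q = defaultdict(float)      # hypothetical Q table: Q[(state, action)]
alpha, gamma = 0.1, 0.99
ACTIONS = ['left', 'right']

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstrap from the *greedy* action in s_next,
    # regardless of what the behavior policy will actually do there.
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from a_next, the action the *current policy*
    # (e.g. epsilon-greedy) actually chose in s_next.
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
```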
Why is the greedy algorithm for maximum weighted matching a 2-approximation?
This post explains my understanding of a proposed greedy algorithm for the maximum weighted matching problem. The greedy algorithm goes as follows (listed in the Introduction section of this paper): It is claimed that the greedy algorithm is a 2-approximation, i.e., greedy result >= 1/2 * optimal result. The document in which the greedy algorithm is proposed is …
Continue reading “Why is the greedy algorithm for maximum weighted matching a 2-approximation?”
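For reference, the usual "heaviest edge first" greedy can be sketched in a few lines (my own rendering with a hypothetical edge list; I am assuming this matches the paper's algorithm, as described in the post):

```python
def greedy_matching(edges):
    """edges: list of (weight, u, v). Repeatedly take the heaviest
    remaining edge whose endpoints are both still unmatched."""
    matched, matching, weight = set(), [], 0.0
    for w, u, v in sorted(edges, reverse=True):
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
            weight += w
    return matching, weight

# Hypothetical toy instance: the optimal matching {(a,b), (c,d)} has weight 5;
# greedy takes (b,c) first and gets 4, which is within the factor-2 bound.
edges = [(4, 'b', 'c'), (3, 'a', 'b'), (2, 'c', 'd')]
print(greedy_matching(edges))  # ([('b', 'c')], 4.0)
```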