Difference between SARSA and Q-learning

State-Action-Reward-State-Action (SARSA) and Q-learning are two forms of reinforcement learning. The differences between the two methods are discussed in: https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/ http://stackoverflow.com/questions/6848828/reinforcement-learning-differences-between-qlearning-and-sarsatd http://stats.stackexchange.com/questions/184657/difference-between-off-policy-and-on-policy-learning Let’s explain why Q-learning is called off-policy learning and SARSA is called on-policy learning. Suppose that at state $latex s_t$, a method takes action $latex a_t$, which lands it in a new state …
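To make the on-policy/off-policy distinction concrete, here is a minimal sketch of the two update rules, assuming the standard textbook tabular forms. The function names, the learning rate `alpha`, the discount `gamma`, and the toy Q-table are illustrative assumptions, not taken from the linked posts.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from a_next, the action actually chosen in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action in s_next,
    regardless of which action is actually taken there."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# Hypothetical toy table: 2 states x 2 actions, Q[state][action].
Q_sarsa = [[0.0, 0.0], [1.0, 5.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 5.0]]

# Same transition (s=0, a=0, r=1, s_next=1) for both methods.
# SARSA is told that the (exploratory) action 0 was taken in s_next.
sarsa_update(Q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0)
q_learning_update(Q_qlearn, s=0, a=0, r=1.0, s_next=1)

print(Q_sarsa[0][0], Q_qlearn[0][0])
```

The only difference is the bootstrap term: SARSA uses the value of the action its own policy picked next, while Q-learning uses the maximum over actions, so the two estimates diverge whenever the next action is not the greedy one.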

Why is the greedy algorithm for maximum weighted matching a 2-approximation?

This post explains my understanding of a proposed greedy algorithm for the maximum weighted matching problem. The greedy algorithm goes as follows (listed by this paper in its Introduction section): It is claimed that the greedy algorithm is a 2-approximation, i.e., greedy result >= 1/2 optimal result. The document where the greedy algorithm is proposed is …
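As a rough illustration, here is a sketch of what is commonly meant by the greedy algorithm for this problem: scan edges from heaviest to lightest and keep an edge whenever both endpoints are still unmatched. This is an assumption about the algorithm the post refers to; the function name and the toy graph are hypothetical.

```python
def greedy_matching(edges):
    """Greedy weighted matching.

    edges: iterable of (u, v, weight) tuples.
    Returns (matching, total_weight), where matching is a list of kept edges.
    """
    matched = set()   # vertices already covered by a kept edge
    matching = []
    total = 0
    # Visit edges in order of decreasing weight.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
            total += w
    return matching, total


# Hypothetical path graph a-b-c-d with weights 2, 3, 2.
edges = [("a", "b", 2), ("b", "c", 3), ("c", "d", 2)]
matching, total = greedy_matching(edges)
print(matching, total)
```

On this path, greedy keeps only the middle edge, while the optimal matching takes the two outer edges for a total of 4. The greedy total of 3 is still at least half of the optimum, consistent with the claimed bound.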

Theano LSTM Code Walkthrough

In this post, I am going to explain the code (as much as I can) from the Theano LSTM tutorial: http://deeplearning.net/tutorial/lstm.html You first need to understand LSTM. A recommended online resource is http://colah.github.io/posts/2015-08-Understanding-LSTMs/, which provides many beautiful figures that illustrate LSTM step by step. The tutorial aims to predict positive/negative sentiment based on movie reviews …

NLP datasets

Twitter Sentiment Analysis: http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/
Topic classification for news (including Reuters, NewsGroup): http://disi.unitn.it/moschitti/corpora.htm
Movie reviews: http://www.cs.cornell.edu/People/pabo/movie-review-data/
Other reviews: http://www.text-analytics101.com/2011/07/user-review-datasets_20.html
Twitter Evaluation dataset: http://tweenator.com/index.php?page_id=13
Amazon review: https://snap.stanford.edu/data/web-Amazon.html
Amazon review (upon request): https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
opinmind: https://inclass.kaggle.com/c/si650winter11/data
Large movie reviews: http://ai.stanford.edu/~amaas/data/sentiment/

Overview for Sequential Data Learning

Hidden Markov Model. Keep in mind the three questions people usually ask about a Hidden Markov Model: 1. What is the probability of an observed sequence? 2. What is the most likely series of states given a specific observed sequence? 3. Given a set of observations, what are the values of the state …
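The first question is usually answered with the forward algorithm. Below is a minimal sketch, assuming a discrete HMM described by plain Python lists; the function name, parameter names, and the toy two-state model are hypothetical and chosen only for illustration.

```python
def forward_probability(obs, start_p, trans_p, emit_p):
    """Probability of a non-empty observation sequence under a discrete HMM.

    obs:     list of observation indices
    start_p: start_p[i]    = P(first state is i)
    trans_p: trans_p[i][j] = P(next state is j | current state is i)
    emit_p:  emit_p[i][o]  = P(observing o | state is i)
    """
    n_states = len(start_p)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans_p[i][j] for i in range(n_states)) * emit_p[j][o]
            for j in range(n_states)
        ]
    # Sum over the final hidden state.
    return sum(alpha)


# Hypothetical toy model: 2 hidden states, 2 observation symbols.
start_p = [0.6, 0.4]
trans_p = [[0.7, 0.3], [0.4, 0.6]]
emit_p = [[0.9, 0.1], [0.2, 0.8]]

p = forward_probability([0, 1, 0], start_p, trans_p, emit_p)
print(p)
```

The recursion sums over hidden states at each step instead of enumerating every state path, so the cost grows linearly with the sequence length rather than exponentially.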

The right way to place test code in a Python project

I have struggled for a long time with where to put test files in a Python project. Ideally, I think the cleanest layout is a folder called “test” with all test files in it. However, test files nested in the test folder need to import modules from the parent folder. It is troublesome to import Python module …

Jupyter Parallelism Tutorial

In this post, I am going to introduce my favorite way to run cells in a Jupyter notebook in parallel. 1. Initialize a cluster from the command line or with Python's `Popen` (in the example below, I create a cluster with 2 workers): `from subprocess import Popen` `p = Popen(['ipcluster', 'start', '-n', '2'])` 2. Then, programmatically set …