In my rough understanding, inverse reinforcement learning is a branch of RL research in which people try to perform state-action sequences resembling given tutor sequences. There are two famous works on inverse reinforcement learning. One is Apprenticeship Learning via Inverse Reinforcement Learning [1], and the other is Maximum Margin Planning [2]. Maximum Margin Planning In …
Author Archives: czxttkl
Reinforcement learning overview
Here are some materials I found useful for learning Reinforcement Learning (RL). Let's first look at the Markov Decision Process (MDP), in which you know a transition function $latex T(s,a,s')$ and a reward function $latex R(s,a,s')$. In the diagram below, the green state is called a “q state”. Some notations that need to be clarified: Dynamic programming …
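As a rough illustration of what knowing $latex T$ and $latex R$ buys you, here is a minimal value iteration sketch. It is not taken from the post: the function name, the tiny two-state MDP, and the convention that `T(s, a)` returns a dictionary of next-state probabilities are all assumptions made for this example.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-8):
    """Sweep the Bellman optimality update until the values stop changing.

    T(s, a) returns {next_state: probability}; R(s, a, s2) returns a reward.
    Both signatures are assumptions made for this sketch.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # One q-value per action: expected reward plus discounted next value.
            q_values = [
                sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a).items())
                for a in actions(s)
            ]
            new_v = max(q_values) if q_values else 0.0
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


# Hypothetical toy MDP: from state 0, "go" moves to state 1 with reward 1,
# "stay" loops with reward 0; state 1 is absorbing with reward 0.
def toy_actions(s):
    return ["go", "stay"] if s == 0 else ["stay"]

def toy_T(s, a):
    return {1: 1.0} if (s == 0 and a == "go") else {s: 1.0}

def toy_R(s, a, s2):
    return 1.0 if (s == 0 and a == "go") else 0.0

values = value_iteration([0, 1], toy_actions, toy_T, toy_R)
print(values)
```

The inner sum is the value of a q state (a state-action pair), and the outer maximum picks the best action in each state.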
Abstract Algebra
I am introducing some basic definitions of abstract algebra: structures like monoids, groups, rings, fields, and vector spaces, as well as homomorphism/isomorphism. I found clear definitions of these structures in [1]: Also, the tables below show a clear comparison between several structures [2,3]: All these structures are defined with both a set and operation(s). Based on [4], …
My Pycharm keymap setting backup
Just backing up my keymap settings, because I find them super convenient for my usage style. https://www.dropbox.com/s/m14ozngs59jcd18/pycharm_keymap_settings.jar?dl=0
When A* algorithm returns optimal solution
Dijkstra's algorithm is a well-known algorithm for finding the exact shortest distance from a source to a destination. To speed up pathfinding, the A* algorithm combines a heuristic with known distances to find the heuristically best path towards a goal. A common A* implementation maintains an open set for discovered yet not evaluated nodes and a closed …
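The open/closed bookkeeping can be sketched as follows. This is only a minimal sketch and not the post's own code: the adjacency-dictionary graph format, the function name `a_star`, and the example graph are assumptions. Because closed nodes are never reopened here, the result is only guaranteed optimal when the heuristic is consistent; with a zero heuristic the search reduces to Dijkstra's algorithm.

```python
import heapq


def a_star(graph, start, goal, heuristic):
    """Return (cost, path) from start to goal, or None if goal is unreachable.

    graph maps node -> list of (neighbor, edge_cost); heuristic(node) estimates
    the remaining cost to the goal. Both formats are assumptions of this sketch.
    """
    open_heap = [(heuristic(start), 0, start)]  # entries are (f = g + h, g, node)
    best_g = {start: 0}   # cheapest known cost from start to each discovered node
    came_from = {}        # back-pointers for path reconstruction
    closed = set()        # nodes that have already been expanded

    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry for an already expanded node
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return g, path[::-1]
        closed.add(node)
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if neighbor not in closed and new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                came_from[neighbor] = node
                heapq.heappush(open_heap, (new_g + heuristic(neighbor), new_g, neighbor))
    return None


# Hypothetical example graph; the zero heuristic is trivially consistent.
example_graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
}
print(a_star(example_graph, "A", "D", lambda node: 0))  # (4, ['A', 'B', 'C', 'D'])
```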
Install Tensorflow 0.12 with GPU support on AWS p2 instance
# for connection and file transfer
ssh -i ~/Dropbox/research/aws_noisemodel_keypair.pem ubuntu@ec2-54-164-130-227.compute-1.amazonaws.com
rsync --progress --delete -rave "ssh -i /home/czxttkl/Dropbox/research/aws_noisemodel_keypair.pem" /home/czxttkl/workspace/mymachinelearning/Python/LoLSynergyCounter ubuntu@ec2-54-164-130-227.compute-1.amazonaws.com:~/
sudo apt-get install python-pip python-dev
pip install tensorflow-gpu
download and transfer cuda toolkit, then install
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
download and transfer cudnn, then install:
tar xvzf cudnn-<your-version>.tgz
sudo …
gmail filter OR condition
To create a filter rule with an OR operation, connect the conditions with the keyword “OR” (it must be uppercase). For example:
Install Python Package for User
If you do not have root privileges and want to install a Python module, you can try the following approach:
python setup.py install --user
This installs packages into subdirectories of site.USER_BASE. To check the value of site.USER_BASE, use:
import site
print(site.USER_BASE)
reference: https://docs.python.org/2/install/
Update 2018/01/06: using pip to …
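To see where a user-level install would land on a given machine, a small sketch using the standard `site` module may help. The helper name `user_install_locations` is made up for this example; `site.getuserbase()` and `site.getusersitepackages()` are the standard-library accessors for the user base and the user site-packages directory.

```python
import site


def user_install_locations():
    """Return (user base directory, user site-packages directory)."""
    # getuserbase() computes the value if site.USER_BASE has not been set yet.
    user_base = site.getuserbase()
    user_site = site.getusersitepackages()
    return user_base, user_site


base_dir, site_dir = user_install_locations()
print("user base:", base_dir)
print("user site-packages:", site_dir)
```

The printed paths depend on the platform and on the PYTHONUSERBASE environment variable, so no particular output is shown here.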
Embedding and Heterogeneous Network Papers
Embedding methods have been widely used in graphs, networks, NLP, and recommendation systems. In short, embedding methods vectorize the entities under study by mapping them into a shared latent space. Once vectorized representations of entities are learned (in either a supervised or an unsupervised fashion), a lot of knowledge discovery work can be done: clustering based on entity …
The expected times of tosses until you see first HTH or HTT
The problem comes from a very famous TED Talk: You are flipping a fair coin. What is the expected number of tosses needed until “HTH” first appears? What is it for “HTT”? Suppose $latex N_1$ is the random variable which counts the number of flips until we get the first …
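The two answers can be checked numerically. The sketch below is not the derivation from the post: it uses the standard overlap identity for a fair coin (add $latex 2^k$ for every $latex k$ where the length-$latex k$ prefix of the pattern equals its length-$latex k$ suffix), plus a simple Monte Carlo cross-check. The function names, trial count, and seed are arbitrary choices for this example.

```python
import random


def expected_flips(pattern):
    """Exact expected number of fair-coin flips until `pattern` first appears.

    Uses the overlap identity: sum 2**k over every k for which the first k
    characters of the pattern equal its last k characters.
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[n - k:])


def simulate_flips(pattern, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        window = ""  # the most recent len(pattern) flips
        flips = 0
        while window != pattern:
            window = (window + rng.choice("HT"))[-len(pattern):]
            flips += 1
        total += flips
    return total / trials


print(expected_flips("HTH"))  # 10
print(expected_flips("HTT"))  # 8
print(simulate_flips("HTH"), simulate_flips("HTT"))
```

“HTH” takes longer on average because a failed attempt at the last step (“HTT”) throws away all progress, whereas a failed attempt at “HTT” (ending in “HTH”) still leaves a usable leading “H”.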