In this post, I am going to give a practical walkthrough of time series analysis. All related code is available in a Python notebook. The data we use is "International airline passengers: monthly totals in thousands," which can be downloaded here as a CSV file (on the left panel, select Export->csv(,)). It is a univariate time …
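A minimal sketch of the kind of univariate monthly series the post works with. The values and index here are a hypothetical stand-in for the airline passengers CSV, just to show the basic pandas setup:

```python
import pandas as pd

# Hypothetical stand-in for the airline passengers data: a small univariate
# monthly series (totals in thousands), indexed by the start of each month.
data = pd.Series(
    [112, 118, 132, 129, 121, 135],
    index=pd.date_range("1949-01", periods=6, freq="MS"),
    name="passengers",
)

# A 3-month rolling mean, a common first smoothing step in time series analysis.
smoothed = data.rolling(window=3).mean()
print(smoothed.iloc[2])  # mean of the first three months: (112+118+132)/3
```

Once the real CSV is loaded (e.g. via `pd.read_csv` with a parsed date column as the index), the same rolling/resampling operations apply.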
Author Archives: czxttkl
Experience Replay in Reinforcement Learning
https://www.google.com/search?q=episode+experience+replay&oq=episode+experience+replay&aqs=chrome..69i57.26352j0j7&sourceid=chrome&ie=UTF-8 https://www.google.com/search?q=experience+replay&oq=experience+rep&aqs=chrome.0.0j69i57j0l4.2129j0j7&sourceid=chrome&ie=UTF-8
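The links above cover the idea; as a quick illustration, a minimal sketch of a uniform replay buffer (names and capacity are my own choices, not from any particular paper):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer: stores transitions and samples
    uniformly at random to break temporal correlations during training."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.push(t, 0, 1.0, t + 1, False)
batch = buf.sample(4)  # a random minibatch of stored transitions
```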
tricks in deep learning neural network
In this post, I am going to talk about my understanding of tricks for training deep neural networks. ResNet [1] Why does a ResNet network work? https://www.quora.com/How-does-deep-residual-learning-work Here is my answer: it is hard to know the desired depth of a deep network. If layers are too deep, errors are hard to propagate back correctly. If layers are …
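The identity-shortcut idea can be sketched in a few lines of numpy. This is a simplified residual block of my own construction (plain matrix weights, no batch norm), not the exact block from [1]:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """A plain residual block: the output is the input plus a learned
    residual F(x), so gradients can flow through the identity shortcut."""
    f = relu(x @ w1) @ w2   # the residual branch F(x)
    return relu(x + f)      # identity shortcut + residual

d = 4
x = rng.standard_normal((2, d))
w1 = rng.standard_normal((d, d)) * 0.01  # near-zero init: F(x) starts close to 0
w2 = rng.standard_normal((d, d)) * 0.01
y = residual_block(x, w1, w2)
# With near-zero weights the block approximates relu(x), so stacking many
# such blocks does not make the network harder to train than a shallow one.
```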
A3C code walkthrough
In this post, I am doing a brief code walkthrough of the code written in https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2 The code implements the A3C algorithm (Asynchronous Methods for Deep Reinforcement Learning). It follows the pseudocode given in the supplementary material of the paper. The structure of this model is: For details of the LSTM structure, refer to http://colah.github.io/posts/2015-08-Understanding-LSTMs/. I am using the same …
Policy Gradient
Reinforcement learning algorithms can be divided into many families. In model-free temporal difference methods like Q-learning/SARSA, we try to learn the action value of any state-action pair, either by recording ("memorizing") exact values in a table or by learning a function to approximate them. Under $latex \epsilon$-greedy, the action to be selected at a state will therefore be but there …
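The $latex \epsilon$-greedy rule mentioned above can be sketched directly (function name and the toy Q-values are my own, for illustration):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore by picking a uniformly random action;
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Toy Q-values for three actions at some state.
q = [0.1, 0.5, 0.2]
greedy_action = epsilon_greedy(q, epsilon=0.0)  # epsilon=0 always exploits -> action 1
```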
Upgrade Cuda from 7.x to 8.0 on Ubuntu
1. Remove the cuda 7.x version (x depends on what you installed): rm /usr/local/cuda-7.x
2. Make sure PATH and LD_LIBRARY_PATH no longer contain "/usr/local/cuda-7.x". Possible places to look are /etc/environment, ~/.profile, /etc/bash.bashrc, /etc/profile, ~/.bashrc. If you really don't know where the cuda path was added to PATH or LD_LIBRARY_PATH, check here: https://unix.stackexchange.com/questions/813/how-to-determine-where-an-environment-variable-came-from
3. cuda 8.0 …
Advanced Reinforcement Learning
Why TD($latex \lambda$)? Why actor-critic? Why eligibility traces? Why counterfactual regret minimization?
English Grammars
"A" or "an" before an acronym or abbreviation? e.g., a FAQ or an FAQ? https://english.stackexchange.com/questions/1016/do-you-use-a-or-an-before-acronyms When should I add "the" before a noun, and before what kinds of nouns? http://www.englishteachermelanie.com/grammar-when-not-to-use-the-definite-article/ Whether to repeat "the" in "noun and noun" phrases? http://english.stackexchange.com/questions/9487/is-it-necessary-to-use-the-multiple-times "noun and noun" phrase: is the following verb plural or singular? http://www.mhhe.com/mayfieldpub/tsw/nounsagr.htm adj before "noun …
Importance sampling
Importance sampling is a way to reduce the variance of your Monte Carlo estimate of an integral over a region. Let's first see how the traditional Monte Carlo method is used to estimate an integral [2]. To estimate $latex \int_a^b f(x) dx$, one can think of reshaping the area to be integrated as a rectangle, whose width is …
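The rectangle view above translates to a few lines of Python: the integral is $latex (b-a)$ times the average height of $latex f$ at uniformly sampled points. A minimal sketch (function name and sample count are my own choices):

```python
import random

def mc_integrate(f, a, b, n=100_000, seed=0):
    """Plain Monte Carlo: the integral equals (b - a) times the average
    height of f at uniformly sampled points -- the 'rectangle' view."""
    rng = random.Random(seed)
    total = sum(f(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n

# Example: integrate x^2 on [0, 1]; the exact answer is 1/3.
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0)
```

Importance sampling replaces the uniform samples with draws from a proposal density concentrated where $latex f$ is large, which is how the variance reduction comes about.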
Inverse Reinforcement Learning
In my rough understanding, inverse reinforcement learning is a branch of RL research in which people try to learn behaviors whose state-action sequences resemble given tutor (expert) sequences. There are two famous works on inverse reinforcement learning: one is Apprenticeship Learning via Inverse Reinforcement Learning [1], and the other is Maximum Margin Planning [2]. Maximum Margin Planning In …