I’ve been reading Prof. Sergey Levine’s paper on Guided Policy Search (GPS) [2]. I don’t fully understand it yet, but I want to keep a record of my questions so that I can come back to them in the future. Based on my understanding, traditional policy search (e.g., REINFORCE) maximizes expected reward using the likelihood-ratio gradient estimator. This …
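The likelihood-ratio estimator the excerpt mentions can be sketched concretely. Below is a minimal, self-contained illustration of REINFORCE on a hypothetical two-armed bandit with a softmax policy; the bandit, rewards, and learning rate are all made-up illustration values, not from the post or the paper.

```python
import numpy as np

# Likelihood-ratio (score function) gradient used by REINFORCE:
#   grad J(theta) = E[ grad log pi(a; theta) * R(a) ]
# Illustrated on a hypothetical 2-armed bandit with a softmax policy.

rng = np.random.default_rng(0)
theta = np.zeros(2)             # one preference per arm
rewards = np.array([1.0, 0.0])  # arm 0 is the better arm

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = rewards[a]
    # grad log pi(a) for a softmax policy: one_hot(a) - pi
    grad_log = -pi
    grad_log[a] += 1.0
    theta += 0.1 * grad_log * r  # stochastic gradient ascent on reward

print(softmax(theta))  # probability mass concentrates on arm 0
```

Because the gradient is weighted by the sampled reward, updates only push probability toward actions that actually paid off, which is the core of the score-function view of policy search.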
Author Archives: czxttkl
Relationships between DP, RL, Prioritized Sweeping, Prioritized Experience Replay, etc
Over the last weekend, I struggled with many concepts in Reinforcement Learning (RL) and Dynamic Programming (DP). In this post, I am collecting some of my thoughts about DP, RL, Prioritized Sweeping, and Prioritized Experience Replay. Please also refer to an earlier post written when I first learned RL. Let’s first introduce a Markov Decision …
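The teaser introduces an MDP before relating DP and RL; the DP side can be sketched in a few lines with value iteration. The 2-state, 2-action MDP below (transitions, rewards, discount) is entirely hypothetical, chosen only to make the Bellman backup concrete.

```python
import numpy as np

# Value iteration on a tiny hypothetical MDP:
#   V(s) <- max_a sum_s' P(s'|s,a) * (R(s,a) + gamma * V(s'))

gamma = 0.9
# P[s, a, s']: transition probabilities
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
# R[s, a]: expected immediate reward
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(200):
    Q = R + gamma * (P @ V)      # Q[s, a], batched matrix-vector product
    V_new = Q.max(axis=1)        # greedy (Bellman optimality) backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(V)  # converged optimal state values
```

RL methods like Q-learning do the same backup from sampled transitions instead of the known model P and R, which is the bridge the post's title is pointing at.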
Time Series Walk Through
In this post, I am going to give a practical walkthrough of time series analysis. All related code is available in a Python notebook. The data we use is “International airline passengers: monthly totals in thousands,” which can be downloaded here as a CSV file (on the left panel, select Export->csv(,)). It is a univariate time …
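Two basics any such walkthrough rests on are a chronological train/test split (never shuffle a time series) and a naive last-value baseline to beat. A minimal sketch on a synthetic monthly series (not the actual airline data):

```python
import numpy as np

# Synthetic monthly series with trend + yearly seasonality + noise,
# standing in for a univariate series like the airline passenger data.
rng = np.random.default_rng(0)
t = np.arange(144)  # e.g. 12 years of monthly observations
series = 100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, 144)

# Chronological split: train on the past, test on the future.
split = 120
train, test = series[:split], series[split:]

# Naive baseline: forecast each point with the previous observation.
naive_pred = np.concatenate(([train[-1]], test[:-1]))
mae = np.mean(np.abs(test - naive_pred))
print(f"naive one-step MAE: {mae:.2f}")
```

Any model fitted later (ARIMA, exponential smoothing, etc.) should report error on the same held-out tail and be compared against this baseline.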
Experience Replay in Reinforcement Learning
https://www.google.com/search?q=episode+experience+replay&oq=episode+experience+replay&aqs=chrome..69i57.26352j0j7&sourceid=chrome&ie=UTF-8 https://www.google.com/search?q=experience+replay&oq=experience+rep&aqs=chrome.0.0j69i57j0l4.2129j0j7&sourceid=chrome&ie=UTF-8
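The searches above are about experience replay; the data structure behind it is small enough to sketch directly. Below is a minimal replay buffer (capacity and batch size are arbitrary illustration values): a fixed-size FIFO store of transitions, sampled uniformly at random to break the temporal correlation of consecutive experience.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates the minibatch from the trajectory.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for i in range(150):                 # overfill to exercise eviction
    buf.push(i, 0, 1.0, i + 1, False)
batch = buf.sample(8)
print(len(buf), len(batch))          # 100 8
```

Prioritized experience replay (as in the previous post's title) replaces the uniform `sample` with sampling proportional to TD error, but the storage structure is the same.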
tricks in deep learning neural network
In this post, I am going to share my understanding of tricks for training deep neural networks. ResNet [1] Why does the ResNet architecture work? https://www.quora.com/How-does-deep-residual-learning-work Here is my answer: it is hard to know the desired depth of a deep network. If layers are too deep, errors are hard to propagate back correctly. If layers are …
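The residual idea in that answer fits in a few lines: a block computes a residual F(x) and outputs x + F(x), so the identity mapping is trivially representable and gradients have a direct skip path back. A minimal NumPy forward pass (weights and sizes are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
# Two small hypothetical weight matrices for the residual branch F(x).
W1, W2 = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x):
    # Skip connection: output = x + F(x), where F is a 2-layer MLP.
    return x + W2 @ relu(W1 @ x)

x = rng.normal(size=d)
y = residual_block(x)

# With small weights, F(x) is small and the block stays near the
# identity, which is why very deep stacks remain trainable.
print(np.linalg.norm(y - x), np.linalg.norm(x))
```

Stacking many such blocks approximates "choosing" an effective depth: blocks whose residual branch learns F ≈ 0 act as identity layers, sidestepping the need to fix the depth in advance.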
A3C code walkthrough
In this post, I am doing a brief walkthrough of the code written in https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2 The code implements the A3C algorithm (Asynchronous Methods for Deep Reinforcement Learning). It follows the pseudocode given in the supplementary material of the paper: The structure of this model is: For LSTM structure details, refer to http://colah.github.io/posts/2015-08-Understanding-LSTMs/. I am using the same …
Policy Gradient
Reinforcement learning algorithms can be divided into many families. In model-free temporal-difference methods like Q-learning/SARSA, we try to learn the action value of every state-action pair, either by recording (“memorizing”) exact values in a table or by learning a function to approximate them. Under ε-greedy, the action to be selected at a state will therefore be but there …
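The ε-greedy rule the excerpt refers to can be sketched directly: with probability ε take a uniformly random action (explore), otherwise take argmax_a Q(s, a) (exploit). The Q-table values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.1
Q = np.array([[0.0, 1.0, 0.5],   # Q[state, action], hypothetical values
              [0.2, 0.1, 0.9]])

def epsilon_greedy(state):
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore: uniform random
    return int(np.argmax(Q[state]))           # exploit: greedy action

picks = [epsilon_greedy(0) for _ in range(1000)]
print(picks.count(1) / len(picks))  # near 1 - epsilon + epsilon/3
```

In state 0 the greedy action is 1, so it is chosen with probability 1 - ε plus an ε/3 chance of being picked during exploration; the other actions share the remaining exploration mass, which is what keeps value estimates from going stale.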
Upgrade Cuda from 7.x to 8.0 on Ubuntu
1. Remove the cuda 7.x version (x depends on what you installed): sudo rm -rf /usr/local/cuda-7.x 2. Make sure PATH and LD_LIBRARY_PATH no longer contain “/usr/local/cuda-7.x”. Possible places to look are /etc/environment, ~/.profile, /etc/bash.bashrc, /etc/profile, and ~/.bashrc. If you really don’t know where the cuda path was added to PATH or LD_LIBRARY_PATH, check here: https://unix.stackexchange.com/questions/813/how-to-determine-where-an-environment-variable-came-from 3. cuda 8.0 …
Advanced Reinforcement Learning
Why TD(λ)? Why actor-critic? Why eligibility traces? Why contextual regret minimization?
English Grammars
“A” or “an” before an acronym or abbreviation? E.g., a FAQ or an FAQ? https://english.stackexchange.com/questions/1016/do-you-use-a-or-an-before-acronyms When should I add “the” before what kind of noun? http://www.englishteachermelanie.com/grammar-when-not-to-use-the-definite-article/ Whether to repeat “the” in “noun and noun” phrases? http://english.stackexchange.com/questions/9487/is-it-necessary-to-use-the-multiple-times “noun and noun” phrase: is the following verb plural or singular? http://www.mhhe.com/mayfieldpub/tsw/nounsagr.htm adj before “noun …