I’ve been reading Prof. Sergey Levine’s paper on Guided Policy Search (GPS) [2]. I don’t fully understand it yet, but I want to keep a record of my questions so that I can come back to them in the future. Based on my understanding, traditional policy search (e.g., REINFORCE) maximizes expected reward using the likelihood-ratio gradient estimator. This …
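The likelihood-ratio estimator the excerpt mentions can be sketched concretely. Below is a minimal, self-contained illustration of REINFORCE on a hypothetical two-armed bandit with a softmax policy; the bandit, rewards, and learning rate are all made-up illustration values, not from the post or the paper.

```python
import numpy as np

# Likelihood-ratio (score function) gradient used by REINFORCE:
#   grad J(theta) = E[ grad log pi(a; theta) * R(a) ]
# Illustrated on a hypothetical 2-armed bandit with a softmax policy.

rng = np.random.default_rng(0)
theta = np.zeros(2)             # one preference per arm
rewards = np.array([1.0, 0.0])  # arm 0 is the better arm

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = rewards[a]
    # grad log pi(a) for a softmax policy: one_hot(a) - pi
    grad_log = -pi
    grad_log[a] += 1.0
    theta += 0.1 * grad_log * r  # stochastic gradient ascent on reward

print(softmax(theta))  # probability mass concentrates on arm 0
```

Because the gradient is weighted by the sampled reward, updates only push probability toward actions that actually paid off, which is the core of the score-function view of policy search.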
Author Archives: czxttkl
Relationships between DP, RL, Prioritized Sweeping, Prioritized Experience Replay, etc
Over the last weekend, I struggled with many concepts in Reinforcement Learning (RL) and Dynamic Programming (DP). In this post, I am collecting some of my thoughts about DP, RL, Prioritized Sweeping, and Prioritized Experience Replay. Please also refer to an earlier post written when I first learned RL. Let’s first introduce a Markov Decision …
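The teaser introduces an MDP before relating DP and RL; the DP side can be sketched in a few lines with value iteration. The 2-state, 2-action MDP below (transitions, rewards, discount) is entirely hypothetical, chosen only to make the Bellman backup concrete.

```python
import numpy as np

# Value iteration on a tiny hypothetical MDP:
#   V(s) <- max_a sum_s' P(s'|s,a) * (R(s,a) + gamma * V(s'))

gamma = 0.9
# P[s, a, s']: transition probabilities
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
# R[s, a]: expected immediate reward
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(200):
    Q = R + gamma * (P @ V)      # Q[s, a], batched matrix-vector product
    V_new = Q.max(axis=1)        # greedy (Bellman optimality) backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(V)  # converged optimal state values
```

RL methods like Q-learning do the same backup from sampled transitions instead of the known model P and R, which is the bridge the post's title is pointing at.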
Time Series Walk Through
In this post, I am going to give a practical walkthrough of time series analysis. All related code is available in a Python notebook. The data we use is “International airline passengers: monthly totals in thousands,” which can be downloaded here as a CSV file (on the left panel, select Export->csv(,)). It is a univariate time …
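Two basics any such walkthrough rests on are a chronological train/test split (never shuffle a time series) and a naive last-value baseline to beat. A minimal sketch on a synthetic monthly series (not the actual airline data):

```python
import numpy as np

# Synthetic monthly series with trend + yearly seasonality + noise,
# standing in for a univariate series like the airline passenger data.
rng = np.random.default_rng(0)
t = np.arange(144)  # e.g. 12 years of monthly observations
series = 100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, 144)

# Chronological split: train on the past, test on the future.
split = 120
train, test = series[:split], series[split:]

# Naive baseline: forecast each point with the previous observation.
naive_pred = np.concatenate(([train[-1]], test[:-1]))
mae = np.mean(np.abs(test - naive_pred))
print(f"naive one-step MAE: {mae:.2f}")
```

Any model fitted later (ARIMA, exponential smoothing, etc.) should report error on the same held-out tail and be compared against this baseline.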
Experience Replay in Reinforcement Learning
https://www.google.com/search?q=episode+experience+replay&oq=episode+experience+replay&aqs=chrome..69i57.26352j0j7&sourceid=chrome&ie=UTF-8 https://www.google.com/search?q=experience+replay&oq=experience+rep&aqs=chrome.0.0j69i57j0l4.2129j0j7&sourceid=chrome&ie=UTF-8
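The searches above are about experience replay; the data structure behind it is small enough to sketch directly. Below is a minimal replay buffer (capacity and batch size are arbitrary illustration values): a fixed-size FIFO store of transitions, sampled uniformly at random to break the temporal correlation of consecutive experience.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates the minibatch from the trajectory.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for i in range(150):                 # overfill to exercise eviction
    buf.push(i, 0, 1.0, i + 1, False)
batch = buf.sample(8)
print(len(buf), len(batch))          # 100 8
```

Prioritized experience replay (as in the previous post's title) replaces the uniform `sample` with sampling proportional to TD error, but the storage structure is the same.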
tricks in deep learning neural network
In this post, I am going to share my understanding of tricks for training deep neural networks. ResNet [1] Why does the ResNet architecture work? https://www.quora.com/How-does-deep-residual-learning-work Here is my answer: it is hard to know the desired depth of a deep network. If layers are too deep, errors are hard to propagate back correctly. If layers are …
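The residual idea in that answer fits in a few lines: a block computes a residual F(x) and outputs x + F(x), so the identity mapping is trivially representable and gradients have a direct skip path back. A minimal NumPy forward pass (weights and sizes are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
# Two small hypothetical weight matrices for the residual branch F(x).
W1, W2 = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x):
    # Skip connection: output = x + F(x), where F is a 2-layer MLP.
    return x + W2 @ relu(W1 @ x)

x = rng.normal(size=d)
y = residual_block(x)

# With small weights, F(x) is small and the block stays near the
# identity, which is why very deep stacks remain trainable.
print(np.linalg.norm(y - x), np.linalg.norm(x))
```

Stacking many such blocks approximates "choosing" an effective depth: blocks whose residual branch learns F ≈ 0 act as identity layers, sidestepping the need to fix the depth in advance.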
A3C code walkthrough
In this post, I am doing a brief walkthrough of the code written in https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2 The code implements the A3C algorithm (Asynchronous Methods for Deep Reinforcement Learning). It follows the pseudocode given in the supplementary material of the paper: The structure of this model is: For LSTM structure details, refer to http://colah.github.io/posts/2015-08-Understanding-LSTMs/. I am using the same …
Policy Gradient
Reinforcement learning algorithms can be divided into many families. In model-free temporal-difference methods like Q-learning/SARSA, we try to learn the action value of every state-action pair, either by recording (“memorizing”) exact values in a table or by learning a function to approximate them. Under ε-greedy, the action to be selected at a state will therefore be but there …
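The ε-greedy rule the excerpt refers to can be sketched directly: with probability ε take a uniformly random action (explore), otherwise take argmax_a Q(s, a) (exploit). The Q-table values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.1
Q = np.array([[0.0, 1.0, 0.5],   # Q[state, action], hypothetical values
              [0.2, 0.1, 0.9]])

def epsilon_greedy(state):
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore: uniform random
    return int(np.argmax(Q[state]))           # exploit: greedy action

picks = [epsilon_greedy(0) for _ in range(1000)]
print(picks.count(1) / len(picks))  # near 1 - epsilon + epsilon/3
```

In state 0 the greedy action is 1, so it is chosen with probability 1 - ε plus an ε/3 chance of being picked during exploration; the other actions share the remaining exploration mass, which is what keeps value estimates from going stale.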
Upgrade Cuda from 7.x to 8.0 on Ubuntu
1. Remove the cuda 7.x version (x depends on what you installed): sudo rm -rf /usr/local/cuda-7.x 2. Make sure PATH and LD_LIBRARY_PATH no longer contain “/usr/local/cuda-7.x”. Possible places to look are /etc/environment, ~/.profile, /etc/bash.bashrc, /etc/profile, and ~/.bashrc. If you really don’t know where the cuda path was added to PATH or LD_LIBRARY_PATH, check here: https://unix.stackexchange.com/questions/813/how-to-determine-where-an-environment-variable-came-from 3. cuda 8.0 …
Advanced Reinforcement Learning
Why TD(λ)? Why actor-critic? Why eligibility traces? Why contextual regret minimization?
English Grammars
“A” or “an” before an acronym or abbreviation? E.g., a FAQ or an FAQ? https://english.stackexchange.com/questions/1016/do-you-use-a-or-an-before-acronyms When should I add “the” before what kind of noun? http://www.englishteachermelanie.com/grammar-when-not-to-use-the-definite-article/ Whether to repeat “the” in “noun and noun” phrases? http://english.stackexchange.com/questions/9487/is-it-necessary-to-use-the-multiple-times “noun and noun” phrase: is the following verb plural or singular? http://www.mhhe.com/mayfieldpub/tsw/nounsagr.htm adj before “noun …