Algorithm

Relationships between DP, RL, Prioritized Sweeping, Prioritized Experience Replay, etc

Last weekend, I struggled with many concepts in Reinforcement Learning (RL) and Dynamic Programming (DP). In this post, I collect some of my thoughts about DP, RL, Prioritized Sweeping and Prioritized Experience Replay. Please also refer to a previous post I wrote when I first learned RL. Let’s first introduce a Markov Decision …
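
To make the DP/prioritized-sweeping link concrete, here is a minimal sketch, assuming a known, deterministic tabular model and doing planning updates only (no interaction with a real environment). It applies the same one-step backup as DP, but in order of the largest absolute TD error, and after each update it re-checks only the state-action pairs that lead into the updated state. The function `prioritized_sweeping`, its parameters and the 4-state chain are hypothetical illustrations, not the code from the post.

```python
import heapq
import itertools
from collections import defaultdict


def prioritized_sweeping(model, gamma=0.9, alpha=1.0, theta=1e-6, n_updates=1000):
    """Planning-only prioritized sweeping on a known, deterministic tabular model.

    model maps (state, action) -> (reward, next_state). A state with no
    outgoing actions is treated as terminal (value 0).
    """
    actions = defaultdict(list)       # state -> actions available in it
    predecessors = defaultdict(set)   # state -> {(s, a) pairs that lead into it}
    for (s, a), (_, s_next) in model.items():
        actions[s].append(a)
        predecessors[s_next].add((s, a))

    q = {sa: 0.0 for sa in model}

    def value(s):
        # Greedy state value max_a Q(s, a); 0 for terminal states.
        return max((q[(s, a)] for a in actions[s]), default=0.0)

    def td_error(s, a):
        r, s_next = model[(s, a)]
        return r + gamma * value(s_next) - q[(s, a)]

    heap, tiebreak = [], itertools.count()

    def push_if_large(s, a):
        p = abs(td_error(s, a))
        if p > theta:
            # heapq is a min-heap, so store -priority to pop the largest first.
            # Duplicate entries are allowed; a stale one just does a near no-op update.
            heapq.heappush(heap, (-p, next(tiebreak), s, a))

    for s, a in model:
        push_if_large(s, a)

    for _ in range(n_updates):
        if not heap:
            break
        _, _, s, a = heapq.heappop(heap)
        q[(s, a)] += alpha * td_error(s, a)
        # V(s) may have changed, so re-check every pair that leads into s.
        for s_prev, a_prev in predecessors[s]:
            push_if_large(s_prev, a_prev)
    return q


# Hypothetical 4-state chain: state 3 is terminal and entering it pays 1.
chain = {}
for s in range(3):
    chain[(s, "right")] = (1.0 if s + 1 == 3 else 0.0, s + 1)
    chain[(s, "left")] = (0.0, max(s - 1, 0))

q = prioritized_sweeping(chain)
print(round(q[(0, "right")], 3))  # 0.81
```

With `alpha=1.0` each popped update is exactly the DP backup, so the queue empties once every Bellman residual is below `theta`; the priority ordering only decides which backups happen first.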

Time Series Walk Through

In this post, I am going to give a practical walk-through of time series analysis. All related code is available in a Python notebook. The data we use is “International airline passengers: monthly totals in thousands”, which can be downloaded here as a csv file (on the left panel, select Export->csv(,)). It is a univariate time …

Experience Replay in Reinforcement Learning

https://www.google.com/search?q=episode+experience+replay&oq=episode+experience+replay&aqs=chrome..69i57.26352j0j7&sourceid=chrome&ie=UTF-8 https://www.google.com/search?q=experience+replay&oq=experience+rep&aqs=chrome.0.0j69i57j0l4.2129j0j7&sourceid=chrome&ie=UTF-8

tricks in deep learning neural network

In this post, I am going to talk about my understanding of tricks for training deep neural networks. ResNet [1]: why does ResNet work? (https://www.quora.com/How-does-deep-residual-learning-work) Here is my answer: it is hard to know the desired depth of a deep network. If layers are too deep, errors are hard to propagate back correctly. If layers are …
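
A minimal NumPy sketch of the residual idea, assuming a simplified fully connected block $latex y = x + F(x)$ (the block in the ResNet paper uses convolutions and applies a ReLU after the addition; both are omitted here). It shows two properties that relate to the depth argument above: with zero weights in the residual branch the block is exactly the identity, and since $latex \partial y / \partial x = I + \partial F / \partial x$, the shortcut gives the error signal a direct path back. The names `residual_block` and `relu` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, w1, b1, w2, b2):
    """Simplified residual block: y = x + F(x), with F a two-layer ReLU branch."""
    f = w2 @ relu(w1 @ x + b1) + b2  # residual branch F(x)
    return x + f                     # identity shortcut, so dy/dx = I + dF/dx


rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)

# With zero weights in the residual branch the block is exactly the identity,
# so an extra block can fall back to passing its input through unchanged.
w0, b0 = np.zeros((d, d)), np.zeros(d)
y = residual_block(x, w0, b0, w0, b0)
print(np.allclose(y, x))  # True
```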

A3C code walkthrough

In this post, I am doing a brief walkthrough of the code written in https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2. The code implements the A3C algorithm (Asynchronous Methods for Deep Reinforcement Learning) and follows the pseudocode given in the supplementary material of the paper. [Figure: structure of the model] For details of the LSTM structure, refer to http://colah.github.io/posts/2015-08-Understanding-LSTMs/. I am using the same …
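
One step of the paper's actor-critic pseudocode that is easy to get wrong is the return computation after each rollout: $latex R$ starts at $latex V(s_t)$ (or 0 if $latex s_t$ is terminal), then walks backward with $latex R \leftarrow r_i + \gamma R$, and $latex R - V(s_i)$ weights the policy-gradient term. Here is a minimal plain-Python sketch of that step; the function names are hypothetical and this is not the code from the Medium post.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Returns R_i for one rollout, computed backward as R <- r_i + gamma * R.

    bootstrap_value is V(s_t) for the state after the last reward,
    or 0.0 if that state is terminal.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()  # back to time order t_start, ..., t-1
    return returns


def advantages(returns, values):
    """Advantage estimates R_i - V(s_i) that weight the policy-gradient term."""
    return [ret - v for ret, v in zip(returns, values)]


print(n_step_returns([0.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5))  # [0.25, 0.5, 1.0]
```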

Policy Gradient

Reinforcement learning algorithms can be divided into many families. In model-free temporal difference methods like Q-learning/SARSA, we try to learn the action value for any state-action pair, either by recording (“memorizing”) exact values in a table or by learning a function to approximate it. Under $latex \epsilon$-greedy, the action to be selected at a state will therefore be $latex \arg\max_a Q(s,a)$ but there …
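
For reference, a minimal sketch of ε-greedy action selection from a row of learned action values: with probability ε an action is drawn uniformly (which may still be the greedy one), otherwise the argmax of $latex Q(s,a)$ is taken. The function name and the list-of-floats representation of one state's Q values are assumptions for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index for one state given its Q(s, a) values.

    With probability epsilon, choose uniformly among all actions (exploration);
    otherwise choose argmax_a Q(s, a) (exploitation, ties go to the lowest index).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


print(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0))  # 1
```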

Advanced Reinforcement Learning

Why TD($latex \lambda$)? Why actor-critic? Why eligibility trace? Why contextual regret minimization?

Importance sampling

Importance sampling is a way to reduce the variance of a Monte Carlo estimate of the integral of an integrand over a region. Let’s first see how the traditional Monte Carlo method is used to estimate an integral [2]. To estimate $latex \int_a^b f(x) dx$, one can think of reshaping the area to be integrated into a rectangle, whose width is …
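
Here is a minimal sketch comparing the two estimators on a hypothetical example, $latex \int_0^1 3x^2 dx = 1$. The plain estimator is the rectangle view: width $latex (b-a)$ times the average height of $latex f$ at uniform samples. The importance-sampling estimator averages $latex f(x)/p(x)$ with $latex x$ drawn from a proposal density $latex p$; here $latex p(x) = 2x$ is an assumed choice that roughly follows the shape of $latex f$. For this pair the per-sample variance works out to 0.8 for the uniform estimator and 0.125 for the importance-sampling one, so both printed values should land near 1, with the second typically closer.

```python
import math
import random


def mc_uniform(f, a, b, n, rng):
    """Plain Monte Carlo: rectangle of width (b - a) times the average height of f."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n


def mc_importance(f, sample_p, pdf_p, n, rng):
    """Importance sampling: average of f(x) / p(x) with x drawn from density p."""
    total = 0.0
    for _ in range(n):
        x = sample_p(rng)
        total += f(x) / pdf_p(x)
    return total / n


def integrand(x):
    return 3.0 * x * x  # its integral over [0, 1] is exactly 1


def pdf_p(x):
    return 2.0 * x  # hypothetical proposal density on [0, 1], shaped roughly like f


def sample_p(rng):
    # Inverse-CDF sampling for p(x) = 2x (CDF x^2); 1 - random() lies in (0, 1],
    # so x > 0 and the ratio f(x) / p(x) is always defined.
    return math.sqrt(1.0 - rng.random())


print(mc_uniform(integrand, 0.0, 1.0, 100_000, random.Random(0)))
print(mc_importance(integrand, sample_p, pdf_p, 100_000, random.Random(0)))
```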

Inverse Reinforcement Learning

In my rough understanding, inverse reinforcement learning is a branch of RL research in which people try to produce state-action sequences resembling given tutor (demonstration) sequences. There are two well-known works on inverse reinforcement learning: one is Apprenticeship Learning via Inverse Reinforcement Learning [1], and the other is Maximum Margin Planning [2]. Maximum Margin Planning In …
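
In Apprenticeship Learning via Inverse Reinforcement Learning [1], the reward is assumed to be linear in known state features $latex \phi(s)$, and the tutor's demonstrations are summarized by their discounted feature expectations $latex \mu = E[\sum_t \gamma^t \phi(s_t)]$; the method then searches for a policy whose feature expectations are close to the tutor's. A minimal sketch of the empirical estimate of $latex \mu$ from demonstrated state sequences follows; the function names and the one-hot features are hypothetical.

```python
def feature_expectations(trajectories, phi, gamma):
    """Empirical mu = (1/m) * sum_i sum_t gamma^t * phi(s_t^(i)).

    trajectories: list of m state sequences (at least one non-empty);
    phi: maps a state to a list of feature values.
    """
    total = None
    for states in trajectories:
        for t, s in enumerate(states):
            weighted = [gamma ** t * v for v in phi(s)]
            total = weighted if total is None else [u + w for u, w in zip(total, weighted)]
    return [v / len(trajectories) for v in total]


def one_hot(s, n_states=2):
    # Hypothetical features: an indicator vector over two states.
    return [1.0 if i == s else 0.0 for i in range(n_states)]


print(feature_expectations([[0, 1], [1, 1]], one_hot, gamma=0.5))  # [0.5, 1.0]
```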

Reinforcement learning overview

Here are some materials I found useful for learning Reinforcement Learning (RL). Let’s first look at the Markov Decision Process (MDP), in which you know a transition function $latex T(s,a,s’)$ and a reward function $latex R(s,a,s’)$. In the diagram below, the green state is called a “q state”. Some notations that need to be clarified: Dynamic programming …
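
When $latex T$ and $latex R$ are known, dynamic programming can compute state values directly. A minimal sketch of value iteration, assuming a small finite MDP, repeats the backup $latex V(s) \leftarrow \max_a \sum_{s'} T(s,a,s')[R(s,a,s') + \gamma V(s')]$; each inner sum for a fixed $latex a$ is the value of the q state $latex (s,a)$. The three-state MDP, the function names and the in-place (Gauss-Seidel style) update order are assumptions for illustration. In this example $latex V(B) = 10$, and $latex V(A)$ solves $latex V(A) = 0.9(0.8 \cdot 10 + 0.2 V(A))$, i.e. $latex 7.2/0.82 \approx 8.78$.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-10):
    """In-place value iteration for a known MDP.

    actions(s)      -> list of actions available in s (empty for terminal states)
    T(s, a)         -> list of (s_next, probability) pairs with T(s, a, s') > 0
    R(s, a, s_next) -> reward for that transition
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # One value per q state (s, a): expected reward plus discounted next value.
            q_values = [
                sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a))
                for a in actions(s)
            ]
            new_v = max(q_values, default=0.0)  # terminal states keep value 0
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


# Hypothetical 3-state example: "go" from A reaches B with probability 0.8,
# "exit" from B pays 10 and ends the episode.
TRANSITIONS = {
    ("A", "go"): [("B", 0.8), ("A", 0.2)],
    ("B", "exit"): [("end", 1.0)],
}


def actions(s):
    return [a for (s0, a) in TRANSITIONS if s0 == s]


def T(s, a):
    return TRANSITIONS[(s, a)]


def R(s, a, s_next):
    return 10.0 if a == "exit" else 0.0


V = value_iteration(["A", "B", "end"], actions, T, R, gamma=0.9)
print(V)
```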
