Category Archives: Algorithm
Tricks in training deep neural networks
In this post, I am going to talk about my understanding of tricks for training deep neural networks. ResNet [1] Why does the ResNet architecture work? https://www.quora.com/How-does-deep-residual-learning-work Here is my answer: it is hard to know the desired depth of a deep network. If layers are too deep, errors are hard to propagate back correctly. If layers are …
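To make the skip-connection idea concrete, here is a minimal NumPy sketch of a residual block; the weights, shapes, and function names are illustrative assumptions of mine, not taken from [1] or from any particular framework:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, b1, W2, b2):
    """A plain fully connected residual block: output = relu(F(x) + x).

    Because the identity path is added back, the block only has to learn the
    residual F(x); if extra depth is not needed, F(x) can shrink toward zero
    and the block degenerates to the identity. The same shortcut also gives
    gradients a direct path backwards through the network.
    """
    h = relu(x @ W1 + b1)      # first transformation
    f = h @ W2 + b2            # residual branch F(x)
    return relu(f + x)         # add the skip connection, then activate

# toy usage: feature dimension 4, random weights, batch of 2 examples
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
W1, b1 = rng.normal(size=(4, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
print(residual_block(x, W1, b1, W2, b2).shape)   # (2, 4)
```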
A3C code walkthrough
In this post, I am doing a brief walkthrough of the code written in https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2 The code implements the A3C algorithm (Asynchronous Methods for Deep Reinforcement Learning). It follows the pseudocode given in the supplemental part of the paper: The structure of this model is: For LSTM structure details, refer to http://colah.github.io/posts/2015-08-Understanding-LSTMs/. I am using the same …
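As a small illustration of the accumulation loop in that pseudocode, here is a framework-free sketch of how the n-step returns and advantages could be computed for one rollout; the function and variable names are my own and not from the Medium post's code:

```python
import numpy as np

def n_step_targets(rewards, values, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns and advantages for one rollout,
    mirroring the backward accumulation in the A3C pseudocode:
        R <- r_t + gamma * R   (iterating from the last step to the first).
    `values` are the critic's V(s_t) estimates for the visited states and
    `bootstrap_value` is V(s_T) for the state where the rollout was cut off.
    """
    R = bootstrap_value
    returns = np.zeros_like(rewards, dtype=np.float64)
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        returns[t] = R
    advantages = returns - values      # A(s_t, a_t) = R_t - V(s_t)
    return returns, advantages

# toy rollout of 4 steps
rewards = np.array([0.0, 0.0, 1.0, 0.0])
values  = np.array([0.2, 0.3, 0.5, 0.1])
returns, adv = n_step_targets(rewards, values, bootstrap_value=0.0)
print(returns, adv)
```

The returns drive the value loss and the advantages weight the policy-gradient term, which is how the actor and critic updates in the paper are formed.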
Policy Gradient
Reinforcement learning algorithms can be divided into many families. In model-free temporal difference methods like Q-learning/SARSA, we try to learn the action value of every state-action pair, either by recording (“memorizing”) exact values in a table or by learning a function to approximate them. Under $latex \epsilon$-greedy, the action to be selected at a state will therefore be but there …
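As a minimal sketch of the $latex \epsilon$-greedy rule mentioned above (the names and toy values are illustrative):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action from a row of Q(s, .) values: with probability epsilon
    explore uniformly at random, otherwise exploit the current argmax."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))      # explore
    return int(np.argmax(q_values))                  # exploit

rng = np.random.default_rng(0)
q_row = np.array([0.1, 0.5, -0.2])   # Q(s, a) for 3 actions in some state s
print(epsilon_greedy(q_row, epsilon=0.1, rng=rng))
```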
Advanced Reinforcement Learning
Why TD($latex \lambda$)? Why actor-critic? Why eligibility traces? Why contextual regret minimization?
Importance sampling
Importance sampling is a way to reduce the variance of a Monte Carlo estimate of an integral over a region. Let’s first see how the traditional Monte Carlo method is used to estimate an integral [2]. To estimate $latex \int_a^b f(x) dx$, one can think of reshaping the area to be integrated into a rectangle, whose width is …
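Here is a small self-contained sketch contrasting the two estimators on one example; the integrand and the Beta(3, 1) proposal are assumptions chosen for illustration, not taken from [2]:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 0.0, 1.0, 100_000
f = lambda x: np.exp(-x) * x**2          # example integrand on [a, b]

# plain Monte Carlo: sample uniformly on [a, b], estimate (b - a) * mean(f)
x_uniform = rng.uniform(a, b, n)
plain_mc = (b - a) * f(x_uniform).mean()

# importance sampling: draw from a proposal density q that roughly follows
# the shape of f. Here q is Beta(3, 1), i.e. q(x) = 3 x^2 on [0, 1], so the
# weights f(x)/q(x) = exp(-x)/3 vary little and the estimate has low variance.
x_is = rng.beta(3.0, 1.0, n)
q = 3.0 * x_is**2
is_estimate = (f(x_is) / q).mean()

print(plain_mc, is_estimate)             # both approximate the same integral
```

The closer the proposal q is to the (normalized) shape of f, the flatter the weights and the smaller the variance of the importance-sampling estimate.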
Inverse Reinforcement Learning
In my rough understanding, inverse reinforcement learning is a branch of RL research in which one tries to produce state-action sequences resembling given tutor sequences. There are two famous works on inverse reinforcement learning. One is Apprenticeship Learning via Inverse Reinforcement Learning [1], and the other is Maximum Margin Planning [2]. Maximum Margin Planning In …
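As a tiny illustration of a quantity both papers reason about, here is a sketch that computes empirical discounted feature expectations from tutor trajectories; the helper name, signature, and toy data are my own, not taken from [1] or [2]:

```python
import numpy as np

def feature_expectations(trajectories, feature_fn, gamma=0.99):
    """Empirical discounted feature expectations
        mu = E[ sum_t gamma^t * phi(s_t, a_t) ]
    averaged over the given (state, action) trajectories. Apprenticeship
    learning seeks a policy whose mu matches the tutor's; this helper only
    computes the quantity itself for a set of demonstrations.
    """
    total = None
    for traj in trajectories:
        acc = sum((gamma ** t) * feature_fn(s, a) for t, (s, a) in enumerate(traj))
        total = acc if total is None else total + acc
    return total / len(trajectories)

# toy example: states/actions are integers, features are a 2-d vector
phi = lambda s, a: np.array([float(s), float(a)])
demos = [[(0, 1), (1, 0), (2, 1)], [(0, 0), (2, 1)]]
print(feature_expectations(demos, phi))
```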
Reinforcement learning overview
Here are some materials I found useful for learning Reinforcement Learning (RL). Let’s first look at the Markov Decision Process (MDP), in which you know a transition function $latex T(s,a,s')$ and a reward function $latex R(s,a,s')$. In the diagram below, the green state is called “q state”. Some notations that need to be clarified: Dynamic programming …
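As a concrete example of the dynamic programming part, here is a minimal value iteration sketch over a made-up two-state MDP, using the $latex T(s,a,s')$ / $latex R(s,a,s')$ convention above:

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-6):
    """Value iteration for a finite MDP.
    T[s, a, s2] is the transition probability and R[s, a, s2] the reward, so
        V(s) <- max_a sum_{s'} T(s, a, s') * (R(s, a, s') + gamma * V(s')).
    Returns the converged value function and the greedy policy.
    """
    n_states, n_actions, _ = T.shape
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = sum_{s'} T(s, a, s') * (R(s, a, s') + gamma * V(s'))
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V[None, None, :])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# tiny 2-state, 2-action MDP (made up purely for illustration)
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0                     # reward for landing in state 1
V, policy = value_iteration(T, R)
print(V, policy)
```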
Abstract Algebra
I am introducing some basic definitions from abstract algebra: structures like monoids, groups, rings, fields and vector spaces, and homomorphisms/isomorphisms. I found clear definitions of these structures in [1]: Also, the tables below show a clear comparison between several structures [2,3]: All these structures are defined with both a set and operation(s). Based on [4], …
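As one concrete instance of “a set plus operation(s) plus axioms”, here are the standard group axioms written out (this is the usual textbook definition, not copied from [1]):

```latex
% A group is a set G together with one binary operation * satisfying:
\begin{align*}
&\text{Closure:}        && \forall a, b \in G,\; a * b \in G \\
&\text{Associativity:}  && \forall a, b, c \in G,\; (a * b) * c = a * (b * c) \\
&\text{Identity:}       && \exists e \in G \;\; \forall a \in G,\; e * a = a * e = a \\
&\text{Inverse:}        && \forall a \in G \;\; \exists a^{-1} \in G,\; a * a^{-1} = a^{-1} * a = e
\end{align*}
% A monoid drops the inverse axiom; rings and fields add a second operation.
```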
When the A* algorithm returns an optimal solution
Dijkstra’s algorithm is a well-known algorithm for finding the exact shortest distance from a source to a destination. To improve path-finding speed, the A* algorithm combines heuristics and known distances to find the heuristically best path towards a goal. A common A* implementation maintains an open set of discovered but not yet evaluated nodes and a closed …
Continue reading “When the A* algorithm returns an optimal solution”
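To make the open set / closed set bookkeeping concrete, here is a minimal A* sketch; the graph interface and toy example are assumptions for illustration. With an admissible, consistent heuristic like the one in the example, the cost it returns is optimal:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: `neighbors(n)` yields (neighbor, edge_cost) pairs and `h(n)`
    is the heuristic estimate of the remaining cost to `goal`. The open set is
    a priority queue ordered by f = g + h over discovered-but-not-evaluated
    nodes; the closed set holds nodes that have already been expanded.
    Returns the cost of the path found, or None if the goal is unreachable.
    """
    open_heap = [(h(start), 0.0, start)]     # entries are (f, g, node)
    closed = set()
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in neighbors(node):
            if nxt not in closed:
                heapq.heappush(open_heap, (g + cost + h(nxt), g + cost, nxt))
    return None

# toy example: nodes are integers on a line, each step costs 1, goal is 3
nbrs = lambda n: [(n - 1, 1.0), (n + 1, 1.0)]
print(a_star(0, 3, nbrs, h=lambda n: abs(3 - n)))   # -> 3.0
```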