shadowsocks + SwitchyOmega

We’ve introduced one way to proxy internet traffic before: https://czxttkl.com/?p=1265 Now we introduce another way to set up a proxy, using shadowsocks + SwitchyOmega (a Chrome extension). On Ubuntu, in a terminal:

sudo add-apt-repository ppa:hzwhuang/ss-qt5
sudo apt-get update
sudo apt-get install shadowsocks-qt5

Open the installed shadowsocks client and configure a new connection. Then install the Chrome extension SwitchyOmega: https://www.dropbox.com/s/i5xmrh4wv1fivg7/SwitchyOmega_Chromium.crx?dl=0 and config …

Reinforcement Learning in Web Products

Reinforcement learning (RL) is an area of machine learning concerned with maximizing a notion of cumulative reward. Although it has been applied in video game AI, robotics and control optimization for years, we have seen less of its presence in web products. In this post, I am going to introduce some work that applies RL in …

Questions on Guided Policy Search

I’ve been reading Prof. Sergey Levine‘s paper on Guided Policy Search (GPS) [2]. However, I do not fully understand it, so I want to keep a record of my questions; maybe in the future I can look back and resolve them. Based on my understanding, traditional policy search (e.g., REINFORCE) maximizes expected reward via the likelihood-ratio gradient. This …
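For reference, the likelihood-ratio gradient that REINFORCE-style policy search is built on can be written as (a standard formulation, not taken from the GPS paper itself):

```latex
\nabla_\theta J(\theta)
  = \nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}\big[ R(\tau) \big]
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ R(\tau) \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]
```

The trick is that the gradient of an expectation over trajectories becomes an expectation of the score function $\nabla_\theta \log \pi_\theta$ weighted by trajectory reward, so it can be estimated from sampled rollouts.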

Relationships between DP, RL, Prioritized Sweeping, Prioritized Experience Replay, etc

Last weekend, I struggled with many concepts in Reinforcement Learning (RL) and Dynamic Programming (DP). In this post, I am collecting some of my thoughts about DP, RL, Prioritized Sweeping and Prioritized Experience Replay. Please also refer to a previous post, written when I first learned RL. Let’s first introduce a Markov Decision …
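To make the DP side concrete, here is a minimal value-iteration sketch on a made-up two-state, two-action MDP (the transition probabilities and rewards below are hypothetical numbers, purely for illustration):

```python
import numpy as np

# P[s, a, s']: transition probabilities of a tiny hypothetical MDP
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])  # R[s, a]: expected immediate reward
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # DP backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    Q = R + gamma * P @ V
    V_new = Q.max(axis=1)          # V(s) = max_a Q(s, a)
    if np.abs(V_new - V).max() < 1e-8:
        break                      # converged to the fixed point of the Bellman operator
    V = V_new
print(V)
```

Prioritized Sweeping reorders exactly these backups, processing first the states whose values are expected to change the most.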

tricks in deep learning neural network

In this post, I am going to talk about my understanding of tricks for training deep neural networks. ResNet [1] Why does a ResNet work? https://www.quora.com/How-does-deep-residual-learning-work Here is my answer: it is hard to know the desired depth of a deep network. If layers are too deep, errors are hard to propagate back correctly. If layers are …
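The residual idea can be sketched in a few lines of NumPy (a toy illustration, not the full ResNet block with convolutions and batch norm):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    # y = relu(x + F(x)): the stacked layers only have to learn the
    # residual F(x) = W2 @ relu(W1 @ x). If the extra depth is not
    # needed, driving W1 and W2 toward zero turns the block into an
    # (almost) identity mapping, which a plain layer stack cannot
    # represent as easily.
    return relu(x + W2 @ relu(W1 @ x))

x = np.array([1.0, -2.0, 3.0])
W_zero = np.zeros((3, 3))
# With zero weights the block degenerates to relu(x), i.e. it simply
# passes the (rectified) input through:
print(residual_block(x, W_zero, W_zero))  # -> [1. 0. 3.]
```

The skip connection also gives gradients a direct path back to earlier layers, which is why errors propagate better in very deep networks.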

A3C code walkthrough

In this post, I am doing a brief walkthrough of the code from https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2 The code implements the A3C algorithm (Asynchronous Methods for Deep Reinforcement Learning) and follows the pseudocode given in the paper’s supplementary material. The structure of this model is: For LSTM structure details, refer to http://colah.github.io/posts/2015-08-Understanding-LSTMs/. I am using the same …
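One step of that pseudocode, computing discounted returns backwards over a rollout, can be sketched as follows (a simplified stand-alone version; the function name and signature are mine, not from the walked-through code):

```python
import numpy as np

def discounted_returns(rewards, gamma, bootstrap=0.0):
    """Compute R_t = r_t + gamma * R_{t+1} by scanning the rollout
    backwards; `bootstrap` is V(s_T), used when the rollout does not
    end in a terminal state (as in the A3C pseudocode)."""
    R = bootstrap
    out = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        out[t] = R
    return out

print(discounted_returns([1.0, 0.0, 1.0], gamma=0.5))  # -> [1.25 0.5 1.0]
```

Each worker thread computes these returns for its own n-step rollout and uses them as targets for the value head and as reward signals for the policy-gradient update.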

Policy Gradient

Reinforcement learning algorithms can be divided into many families. In model-free temporal-difference methods like Q-learning/SARSA, we try to learn the action value of any state-action pair, either by recording (“memorizing”) exact values in a table or by learning a function to approximate them. Under ε-greedy, the action selected at a state will therefore be argmax_a Q(s, a) (with probability 1 − ε), but there …
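The ε-greedy selection rule described above can be sketched as (a minimal version over one row of a Q-table, with made-up Q-values):

```python
import numpy as np

def epsilon_greedy(Q_row, epsilon, rng):
    """Pick argmax_a Q(s, a) with probability 1 - epsilon, otherwise a
    uniform random action; Q_row is the Q-table row for the current state."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q_row)))   # explore
    return int(np.argmax(Q_row))               # exploit

rng = np.random.default_rng(0)
Q_row = np.array([0.1, 0.9, 0.3])  # hypothetical action values for one state
actions = [epsilon_greedy(Q_row, 0.1, rng) for _ in range(1000)]
# The greedy action (index 1) dominates; random exploratory actions
# appear roughly 10% of the time.
```

The small exploration probability is what lets tabular Q-learning keep visiting (and correcting) action values that currently look suboptimal.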