DQN + Double Q-Learning + OpenAI Gym

Here is a script for quickly experimenting with OpenAI Gym environments: https://github.com/czxttkl/Tutorials/tree/master/experiments/lunarlander. The script supports both Deep Q-Learning and Double Q-Learning.
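As a rough sketch of what one learning step looks like (assuming a PyTorch implementation; `qnet`, `qnet_target`, and the tensor shapes below are my own placeholder names, not necessarily what the script uses), a plain DQN update on a replay-buffer batch is:

```python
import torch
import torch.nn.functional as F

def dqn_update(qnet, qnet_target, optimizer, batch, gamma=0.99):
    """One DQN learning step on a sampled replay-buffer batch."""
    # states: (B, 8), actions: (B, 1) long, rewards/dones: (B, 1) float, for LunarLander-v2
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions actually taken in the batch.
    q_sa = qnet(states).gather(1, actions)

    # One-step TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    with torch.no_grad():
        next_q = qnet_target(next_states).max(dim=1, keepdim=True)[0]
        target = rewards + gamma * next_q * (1 - dones)

    loss = F.mse_loss(q_sa, target)   # MSE TD error; Huber loss is discussed below
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```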

I ran the script to benchmark one OpenAI Gym environment, LunarLander-v2. The most stable configuration I found uses the following hyperparameters: no double Q-learning (just a single Q-network), gamma=0.99, batch size=64, and learning rate=0.001 (Adam optimizer). It usually solves the environment at around 400 episodes:

Environment solved in 406 episodes!	Average Score: 200.06
Environment solved in 406 episodes!	Average Score: 200.06
Environment solved in 545 episodes!	Average Score: 200.62
Environment solved in 413 episodes!	Average Score: 200.35
Environment solved in 406 episodes!	Average Score: 200.06
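For context, "solved" in these logs follows the usual Gym convention for LunarLander-v2: an average return of at least 200 over the last 100 episodes. A minimal sketch of the outer loop, assuming the classic gym step API and an `agent` with hypothetical `act`/`step` methods (the epsilon schedule numbers are illustrative):

```python
from collections import deque
import numpy as np
import gym

def train(agent, n_episodes=2000, max_t=1000, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    """Run episodes until the 100-episode average return reaches 200."""
    env = gym.make("LunarLander-v2")
    scores_window = deque(maxlen=100)              # rolling window for the solve check
    eps = eps_start                                # epsilon-greedy exploration rate
    for i_episode in range(1, n_episodes + 1):
        state, score = env.reset(), 0.0
        for _ in range(max_t):                     # LunarLander-v2 caps episodes at 1000 steps
            action = agent.act(state, eps)                       # epsilon-greedy action
            next_state, reward, done, _ = env.step(action)
            agent.step(state, action, reward, next_state, done)  # store transition, maybe learn
            state, score = next_state, score + reward
            if done:
                break
        scores_window.append(score)
        eps = max(eps_end, eps * eps_decay)        # decay exploration over episodes
        if len(scores_window) == 100 and np.mean(scores_window) >= 200.0:
            print(f"Environment solved in {i_episode} episodes!\tAverage Score: {np.mean(scores_window):.2f}")
            break
```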


Several insights:

1. gamma, the discount factor, can have a large influence on the results. For example, in LunarLander, if I set gamma to 1 instead of 0.9, the agent never learns successfully, even though LunarLander-v2 caps episodes at 1000 steps, so undiscounted returns are still bounded.
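To see why the discount matters over such long episodes, note that a reward k steps in the future is weighted by gamma^k in the return; a quick check:

```python
# Weight gamma^k that the discounted return puts on a reward k steps in the future.
for gamma in (1.0, 0.99):
    for k in (100, 1000):
        print(f"gamma={gamma}, k={k}: weight = {gamma ** k:.2e}")
# gamma=1.0  -> every step is weighted 1.00e+00
# gamma=0.99 -> about 3.66e-01 at k=100 and 4.32e-05 at k=1000
```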


2. Double Q-learning does not bring a clear benefit, at least in LunarLander-v2. The output with double Q-learning turned on is similar to that of the most stable configuration above, which does not use it (the one-line difference in the update is sketched after the output):

lunarlander + mse + double q + gamma 0.99
Environment solved in 435 episodes!	Average Score: 205.85
Environment solved in 435 episodes!	Average Score: 205.85
Environment solved in 491 episodes!	Average Score: 200.97
Environment solved in 406 episodes!	Average Score: 200.06
Environment solved in 413 episodes!	Average Score: 200.35
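For reference, the only change when double Q-learning is turned on is how the bootstrap action is chosen: the online network selects the argmax and the target network evaluates it, instead of the target network doing both. A sketch, using the same placeholder names as above:

```python
import torch

def td_target(qnet, qnet_target, rewards, next_states, dones, gamma=0.99, double_q=True):
    """One-step TD target with or without the double Q-learning decoupling."""
    with torch.no_grad():
        if double_q:
            # Double Q-learning: online net picks the greedy action, target net scores it.
            best_actions = qnet(next_states).argmax(dim=1, keepdim=True)
            next_q = qnet_target(next_states).gather(1, best_actions)
        else:
            # Vanilla DQN: target net both picks and scores the action.
            next_q = qnet_target(next_states).max(dim=1, keepdim=True)[0]
        return rewards + gamma * next_q * (1 - dones)
```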


3. Huber loss actually seems to hurt. I changed the loss function in the Q-learning update from MSE to Huber loss, but the results were much worse: the agent either never solves the environment or only does so after 1000+ episodes.
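For completeness, the change described here is just a swap of the criterion in the update step; assuming PyTorch, Huber loss corresponds to `smooth_l1_loss` (with its default threshold of 1):

```python
import torch.nn.functional as F

def td_loss(q_sa, target, use_huber=False):
    """TD loss: plain MSE, or Huber (smooth L1), which is quadratic for small
    errors and linear for large ones, limiting the gradient of outlier TD errors."""
    if use_huber:
        return F.smooth_l1_loss(q_sa, target)
    return F.mse_loss(q_sa, target)
```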


