- Implemented a Q-Learning algorithm on a Grid World environment in which an agent (mouse) navigates her grid environment collecting rewards (cheese) with the goal of escaping the environment. The agent has the actions `Up`, `Down`, `Left` and `Right`, which allow her to move between states (a minimal sketch of the tabular update appears after the results below).
- Implemented the Soft Actor-Critic model proposed by Haarnoja et al. (2018) and applied it to OpenAI's Lunar Lander problem.
- Implemented and carried out a comparative performance analysis of Double DQN, Dueling DQN and Prioritised DQN for solving the Lunar Lander problem.
- Python Version: 3.8.11
- Libraries and Packages: `numpy`, `pandas`, `seaborn`, `operator`, `torch`, `gym`, `plotly`, `random`, `collections`
- For our Q-Learning algorithm, the random policy performs poorly and in an unpredictable manner, as expected:
- For different values of `Alpha` (the learning rate), `Gamma` (the discount factor) and `Epsilon` (action selection):
- We found that the Soft Actor-Critic didn't perform very well in our implementation for the Lunar Lander problem. The agent behaved in a very stochastic manner and failed to adequately learn the rules of the game after many epochs:
- Lastly, out of the three DQN models, the Prioritised DQN model performed best when considering the loss function. It is interesting to note that all three models achieved negative rewards across all epochs:
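
As a rough illustration only (not the notebook's actual code), the tabular Q-Learning update driven by `Alpha`, `Gamma` and `Epsilon` looks like the sketch below; the environment interface (`reset()`/`step()` returning a state, reward and done flag) is an assumption for the example:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-Learning; `env` is assumed to expose reset() and step(action)."""
    actions = ["Up", "Down", "Left", "Right"]
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = epsilon_greedy(Q, state, actions, epsilon)
            next_state, reward, done = env.step(action)
            # Q-Learning update: move Q(s, a) towards the bootstrapped target
            best_next = max(Q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next * (not done)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```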
Code and outcomes for the above can be found in the following Jupyter notebooks:
- Q-Learning Grid World: `Q_Learning_Grid_World`
- Soft Actor-Critic: `Soft_Actor_Critic`
- Double DQN, Dueling DQN and Prioritised DQN: `DQN_comparison` (see the dueling-head sketch below)
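
For orientation, the dueling architecture compared in `DQN_comparison` separates a state-value stream from an advantage stream; the sketch below shows the general idea in `torch`, with the hidden size chosen arbitrarily (Lunar Lander has an 8-dimensional state and 4 discrete actions):

```python
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Dueling network: shared trunk, then separate value and advantage streams."""
    def __init__(self, state_dim=8, n_actions=4, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state):
        h = self.trunk(state)
        v = self.value(h)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps the decomposition identifiable
        return v + a - a.mean(dim=-1, keepdim=True)
```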
For more information regarding Lunar Lander and OpenAI, see https://gym.openai.com/envs/LunarLander-v2/
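
For reference, a minimal interaction loop with the Lunar Lander environment looks roughly like this; it is written against the classic `gym` API (newer `gymnasium` releases return extra values from `reset()` and `step()`), and the random policy stands in for a trained agent:

```python
import gym

# LunarLander-v2 requires the Box2D extra (pip install gym[box2d])
env = gym.make("LunarLander-v2")
state = env.reset()                      # 8-dimensional observation
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()   # random policy as a placeholder for the agent
    state, reward, done, info = env.step(action)
    total_reward += reward
print(f"Episode finished with return {total_reward:.1f}")
env.close()
```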