Reinforcement Learning Implementations

Implemented a Q-Learning algorithm in a Grid World environment in which an agent (a mouse) navigates the grid, collecting rewards (cheese) with the goal of escaping the environment. The agent can take the actions Up, Down, Left and Right, which allow her to move between states.
Implemented the Soft Actor-Critic model proposed by Haarnoja et al. (2018) and applied it to OpenAI's Lunar Lander problem.
Implemented Double DQN, Dueling DQN and Prioritised DQN and compared their performance on the Lunar Lander problem.

Packages

Python Version: 3.8.11
Libraries and Packages: numpy, pandas, seaborn, operator, torch, gym, plotly, random, collections

Outcomes

Q-Learning Grid World

For our Q-Learning algorithm, the random policy performs poorly and unpredictably, as expected.

Results were produced for different values of alpha (the learning rate), gamma (the discount factor) and epsilon (the exploration rate used in action selection).
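To make the role of the three hyperparameters concrete, here is a minimal sketch of a tabular Q-learning step with epsilon-greedy action selection. It is not taken from the notebook: the function names, the 4 x 4 table size and the sample transition are assumptions made for illustration only.

```python
import numpy as np

def epsilon_greedy(q_table, state, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))

def q_update(q_table, state, action, reward, next_state, done, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the table."""
    # Terminal transitions do not bootstrap from the next state.
    bootstrap = 0.0 if done else gamma * np.max(q_table[next_state])
    td_target = reward + bootstrap
    # alpha controls how far the estimate moves towards the target.
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table

rng = np.random.default_rng(0)
# Hypothetical table: 4 states x 4 actions (Up, Down, Left, Right).
q = np.zeros((4, 4))
q_update(q, state=0, action=3, reward=1.0, next_state=1,
         done=False, alpha=0.5, gamma=0.9)
greedy_action = epsilon_greedy(q, state=0, epsilon=0.0, rng=rng)
```

A larger alpha makes each update more aggressive, a gamma closer to 1 weights future cheese more heavily, and a larger epsilon makes the mouse explore more often instead of exploiting its current estimates.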

Soft Actor-Critic (Deep Reinforcement Learning)

We found that our Soft Actor-Critic implementation did not perform well on the Lunar Lander problem. The agent's behaviour remained highly stochastic, and it failed to adequately learn the rules of the game even after many epochs.
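For context, the entropy-regularised target that a Soft Actor-Critic critic is commonly trained towards can be sketched as follows. This assumes the widely used twin-critic formulation with a fixed temperature; the notebook's implementation may differ, and the function name, argument names and numbers below are hypothetical. NumPy is used here instead of torch only to keep the sketch dependency-light.

```python
import numpy as np

def soft_q_target(rewards, dones, q1_next, q2_next, next_log_probs,
                  gamma=0.99, temperature=0.2):
    """Soft Bellman target: r + gamma * (1 - d) * (min(Q1, Q2) - temperature * log pi)."""
    # Taking the minimum of two critics reduces overestimation.
    min_q_next = np.minimum(q1_next, q2_next)
    # Subtracting temperature * log-probability rewards high-entropy policies.
    soft_value = min_q_next - temperature * next_log_probs
    return rewards + gamma * (1.0 - dones) * soft_value

# Hypothetical single-transition batch.
target = soft_q_target(
    rewards=np.array([1.0]),
    dones=np.array([0.0]),
    q1_next=np.array([2.0]),
    q2_next=np.array([3.0]),
    next_log_probs=np.array([-1.0]),
)
```

The temperature term is what keeps the policy stochastic; if it is set too high relative to the reward scale, the agent can keep acting almost randomly, which is one possible (unverified) explanation for behaviour like that described above.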

Double DQN, Dueling DQN and Prioritised DQN (Deep Reinforcement Learning)

Lastly, of the three DQN models, the Prioritised DQN performed best in terms of the loss function. It is interesting to note that all three models achieved negative rewards across all epochs.
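As a rough reminder of how the three variants differ, the sketch below shows the core computation each one adds to a standard DQN. It is written with NumPy on hand-made arrays rather than torch networks, and every function name and number is an assumption for illustration, not the notebook's code.

```python
import numpy as np

def double_dqn_target(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online net selects the action, the target net evaluates it."""
    best_actions = np.argmax(q_online_next, axis=1)
    next_q = q_target_next[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_q

def dueling_q(value, advantage):
    """Dueling DQN: combine V(s) and A(s, a), centring advantages on their mean."""
    return value + advantage - advantage.mean(axis=1, keepdims=True)

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Prioritised replay: sample transitions in proportion to |TD error|^alpha."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

# Hypothetical batch of two transitions with two actions each.
targets = double_dqn_target(
    rewards=np.array([1.0, 0.0]),
    dones=np.array([0.0, 1.0]),
    q_online_next=np.array([[0.2, 0.8], [0.5, 0.1]]),
    q_target_next=np.array([[1.0, 2.0], [3.0, 4.0]]),
)
q_values = dueling_q(np.array([[1.0]]), np.array([[1.0, 3.0]]))
probs = per_probabilities(np.array([1.0, 1.0, 2.0]), alpha=1.0, eps=0.0)
```

Because prioritised replay changes which transitions are sampled, its training loss is not directly comparable in scale to that of uniformly sampled variants, which is worth keeping in mind when ranking the models by loss alone.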

Specifications

Code and outcomes for the above can be found in the following Jupyter notebooks:

Q-Learning Grid World: Q_Learning_Grid_World
Soft Actor-Critic: Soft_Actor_Critic
Double DQN, Dueling DQN and Prioritised DQN: DQN_comparison

For more information regarding Lunar Lander and OpenAI, see https://gym.openai.com/envs/LunarLander-v2/