
Reinforcement Learning Implementations

  • Implemented a Q-Learning algorithm in a Grid World environment in which an agent (a mouse) navigates her grid, collecting rewards (cheese) with the goal of escaping the environment. The agent can take the actions Up, Down, Left and Right, which allow her to move between states.

  • Implemented the Soft Actor-Critic model proposed by Haarnoja et al. (2018) and applied it to OpenAI's Lunar Lander problem.

  • Implemented Double DQN, Dueling DQN and Prioritised DQN, and compared their performance on the Lunar Lander problem.
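For reference, the grid-world agent above presumably learns with the standard one-step Q-learning update. The following assumes that textbook form, with learning rate α, discount factor γ, reward r, current state s, action a and next state s′:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \Big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Big]
```

A larger α makes each update move further toward the newest target. A larger γ gives more weight to distant rewards, such as eventually escaping the grid.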

Packages

  • Python Version: 3.8.11

  • Libraries and Packages: numpy, pandas, seaborn, operator, torch, gym, plotly, random, collections

Outcomes

Q-Learning Grid World

  • For our Q-Learning algorithm, the random policy performs poorly and unpredictably, as expected:

[Figure: performance of the random policy]

  • Results for different values of Alpha (the learning rate), Gamma (the discount factor) and Epsilon (the exploration rate used in action selection):

[Figures: results for different values of Alpha, Gamma and Epsilon]
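The roles of Alpha, Gamma and Epsilon can be seen in a minimal tabular Q-learning sketch with epsilon-greedy action selection. It also compares the learned greedy policy with a random policy. This is not the notebook's code. The 4x4 layout, the step and exit rewards, and the helper names (`step`, `greedy_action`, `q_learning`, `rollout`) are hypothetical, and cheese is omitted so that the state is just the agent's position.

```python
import random

# Hypothetical grid world: 4x4 cells, start top-left, exit bottom-right.
N = 4
START, EXIT = (0, 0), (N - 1, N - 1)
ACTIONS = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}
STEP_REWARD, EXIT_REWARD = -0.1, 10.0  # assumed values, not taken from the notebook


def step(state, action):
    """Move one cell; moving into a wall leaves the agent where it is."""
    dr, dc = ACTIONS[action]
    nxt = (min(max(state[0] + dr, 0), N - 1), min(max(state[1] + dc, 0), N - 1))
    if nxt == EXIT:
        return nxt, EXIT_REWARD, True
    return nxt, STEP_REWARD, False


def greedy_action(Q, state):
    """Action with the highest current Q-value in `state`."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = {((r, c), a): 0.0 for r in range(N) for c in range(N) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = greedy_action(Q, state)
            nxt, reward, done = step(state, action)
            # Alpha scales the correction; gamma discounts the bootstrapped next value.
            best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = nxt
            if done:
                break
    return Q


def rollout(policy, max_steps=100):
    """Total reward and number of steps for one episode under `policy`."""
    state, total = START, 0.0
    for t in range(1, max_steps + 1):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            return total, t
    return total, max_steps


if __name__ == "__main__":
    Q = q_learning()
    print("greedy policy:", rollout(lambda s: greedy_action(Q, s)))
    rng = random.Random(1)
    # A random policy gives widely varying and typically worse outcomes.
    print("random policy:", [rollout(lambda s: rng.choice(list(ACTIONS))) for _ in range(3)])
```

You can rerun `q_learning` with different `alpha`, `gamma` and `epsilon` arguments to reproduce the kind of sensitivity study shown in the figures, under these toy assumptions.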

Soft Actor-Critic (Deep Reinforcement Learning)

  • The Soft Actor-Critic did not perform well in our implementation for the Lunar Lander problem. The agent behaved very stochastically and failed to adequately learn the task, even after many epochs:

[Figure: Soft Actor-Critic performance on Lunar Lander]
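Soft Actor-Critic's stochastic behaviour comes from its entropy bonus, which appears in the critic's soft Bellman target and in the actor loss. The following is a per-sample sketch of a common formulation: two target Q-networks with a clipped minimum and a fixed temperature `alpha`. This is an assumption. The original 2018 paper also trained a separate state-value network, and the notebook may differ. Plain floats stand in for torch tensors, and `soft_q_target` and `policy_loss` are hypothetical names.

```python
def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2):
    """Soft Bellman target for one transition.

    next_q1, next_q2: target-network Q-values for a' sampled from pi(.|s').
    next_log_prob:    log pi(a'|s') for that sampled action.
    alpha:            entropy temperature; larger values favour more random behaviour.
    """
    # Clipped double-Q: use the smaller estimate to curb overestimation,
    # then add the entropy bonus (-alpha * log pi).
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    # No bootstrapping past a terminal state (the episode has ended).
    return reward + gamma * (1.0 - float(done)) * soft_value


def policy_loss(log_prob, q1, q2, alpha=0.2):
    """Per-sample actor loss: prefer high-Q actions while keeping entropy high."""
    return alpha * log_prob - min(q1, q2)


if __name__ == "__main__":
    # The same transition, with and without the entropy bonus.
    print(soft_q_target(1.0, False, 5.0, 4.0, -1.0, alpha=0.2))
    print(soft_q_target(1.0, False, 5.0, 4.0, -1.0, alpha=0.0))
```

Because `alpha` raises the value of low-probability (high-entropy) actions, the temperature directly controls how random the learned policy remains.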

Double DQN, Dueling DQN and Prioritised DQN (Deep Reinforcement Learning)

  • Lastly, of the three DQN models, Prioritised DQN performed best in terms of the loss function. Notably, however, all three models obtained negative rewards in every epoch:

[Figure: comparison of Double DQN, Dueling DQN and Prioritised DQN]
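The three DQN variants differ in small, self-contained pieces: how the bootstrap target is formed (Double DQN), how Q-values are assembled from value and advantage streams (Dueling DQN), and how replay samples are drawn and reweighted (Prioritised DQN). The following sketch uses plain Python lists in place of network outputs and follows the usual textbook formulations. The function names and default hyperparameters (`alpha=0.6`, `beta=0.4`) are illustrative assumptions, not the notebook's settings.

```python
import random


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online net selects a', the target net evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def dueling_q(value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Proportional prioritised replay: sampled indices, probabilities and IS weights."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    n = len(priorities)
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    # Importance-sampling weights correct the bias of non-uniform sampling;
    # dividing by the largest possible weight keeps them at most 1.
    max_w = max((n * p) ** (-beta) for p in probs)
    weights = [(n * probs[i]) ** (-beta) / max_w for i in idx]
    return idx, probs, weights


if __name__ == "__main__":
    q_online, q_target = [1.0, 3.0, 2.0], [0.5, 0.2, 4.0]
    print("Double DQN target:", double_dqn_target(1.0, False, q_online, q_target, gamma=0.5))
    print("Vanilla DQN target:", 1.0 + 0.5 * max(q_target))
    print("Dueling Q-values:", dueling_q(2.0, [1.0, -1.0, 0.0]))
    print("PER sample:", per_sample([1.0, 2.0, 1.0], 5, rng=random.Random(0)))
```

The loss these models minimise measures disagreement with such bootstrapped targets, not episode return. A lower loss therefore does not by itself imply higher rewards.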

Specifications

Code and outcomes for the above can be found in the following Jupyter notebooks:

  • Q-Learning Grid World: Q_Learning_Grid_World
  • Soft Actor-Critic: Soft_Actor_Critic
  • Double DQN, Dueling DQN and Prioritised DQN: DQN_comparison

For more information about the Lunar Lander environment in OpenAI Gym, see https://gym.openai.com/envs/LunarLander-v2/