HumanoidRobotWalk

Implementation of the Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) algorithms, applied to the task of teaching a humanoid robot to walk.

Programs and libraries required to run this project

  • OpenAI Gym : A toolkit for developing and comparing reinforcement learning algorithms
  • PyBullet Gym : PyBullet Robotics Environments fully compatible with Gym toolkit (uses the Bullet physics engine)
  • PyTorch : Open source machine learning library based on the Torch library
  • NumPy : Fundamental package for scientific computing with Python
  • matplotlib : Plotting library for the Python programming language and its numerical mathematics extension NumPy
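Before running the project, it can help to confirm that the libraries listed above are installed. The following is a minimal sketch using only the standard library. The import names (`gym`, `pybulletgym`, `torch`, `numpy`, `matplotlib`) are assumptions based on how these packages are commonly imported; the README itself does not state them, so adjust the mapping to match your installation.

```python
import importlib.util

# Assumed import names: the README lists the libraries, not their module names.
REQUIRED_MODULES = {
    "OpenAI Gym": "gym",
    "PyBullet Gym": "pybulletgym",
    "PyTorch": "torch",
    "NumPy": "numpy",
    "matplotlib": "matplotlib",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the display names of libraries whose module cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All listed libraries are importable.")
```

`find_spec` only locates each module without importing it, so the check stays fast even for heavy packages such as PyTorch.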

Algorithm pseudocode

Trust Region Policy Optimization (TRPO) - implemented by Vasilije Pantić

[Image: TRPO pseudocode]
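For orientation, the optimization problem that TRPO approximately solves at each policy update can be written as below. This is the standard textbook formulation, not a transcription of this repository's pseudocode; the symbols ($\pi_\theta$ for the policy, $\hat{A}_t$ for the advantage estimate, $\delta$ for the trust-region size) are notation chosen here.

```latex
\max_{\theta}\;
\hat{\mathbb{E}}_t\!\left[
  \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\,\hat{A}_t
\right]
\quad \text{subject to} \quad
\hat{\mathbb{E}}_t\!\left[
  D_{\mathrm{KL}}\!\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t)\,\big\|\,\pi_\theta(\cdot \mid s_t)\big)
\right] \le \delta
```

The KL constraint keeps each new policy close to the previous one, which is what the "trust region" in the name refers to.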

Proximal Policy Optimization (PPO) - implemented by Nikola Zubić

[Image: PPO pseudocode]
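As an illustration of the idea behind PPO, here is a minimal sketch of the clipped surrogate objective. It assumes the clipped variant of PPO, which the README does not confirm, and it uses NumPy for brevity rather than the project's PyTorch code. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices, not values taken from this repository.

```python
import numpy as np


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    new_log_probs / old_log_probs: log-probabilities of the taken actions
    under the current and the previous policy. clip_eps is an assumed default.
    """
    new_log_probs = np.asarray(new_log_probs, dtype=float)
    old_log_probs = np.asarray(old_log_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(new_log_probs - old_log_probs)

    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # The element-wise minimum removes the incentive to move the ratio
    # outside the interval [1 - clip_eps, 1 + clip_eps].
    return float(np.mean(np.minimum(unclipped, clipped)))
```

In a training loop, the negative of this value would typically serve as the policy loss, computed with a framework that supports automatic differentiation.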

How to run?

For TRPO: run trpo_main.py in root/code/trpo/.
For PPO: run ppo_main.py in root/code/ppo/.
In both cases, enter the absolute file path to the trained model.

Trained models are available at: root/code/trained_models/.
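Since the scripts expect an absolute path, a small helper can print the absolute paths of the available model files so that one can be copied and entered. This is a sketch under assumptions: `list_model_paths` is a hypothetical helper that is not part of the repository, and the relative path `code/trained_models` assumes the script is run from the repository root.

```python
from pathlib import Path


def list_model_paths(models_dir):
    """Return absolute paths of the regular files in models_dir, sorted.

    Returns an empty list if models_dir is not an existing directory.
    """
    directory = Path(models_dir).resolve()
    if not directory.is_dir():
        return []
    return sorted(str(p) for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumed location relative to the repository root; adjust as needed.
    for path in list_model_paths("code/trained_models"):
        print(path)
```

Any of the printed paths can then be entered when running trpo_main.py or ppo_main.py.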

In motion

TRPO

[Animation: TRPO_in_motion]

PPO

[Animation: PPO_in_motion]

Numerical results

| Algorithm | Training times [h] |
|-----------|--------------------|
| TRPO      | 24, 96             |
| PPO       | 6.5, 48            |
Click on image for full view.

Copyright (c) 2021 Nikola Zubić, Vasilije Pantić