Hovering a quadcopter at a predefined position using a gym-pybullet-drones environment with the PPO algorithm from PPO-PyTorch
- Refer to my recent project on drone racing in gym-pybullet-drones
- I tested the FlyThruGateAviary environment with PPO, with some modifications to the reward function. I created a gate model with Tinkercad and added it to PyBullet. A minimal training sketch follows the commands below.
- To train:
python train_thrugate.py
- To test:
python test_thrugate.py
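The training script boils down to wiring the environment to the PPO agent. Below is a minimal sketch, not the exact project script: the module paths follow gym-pybullet-drones v1.x and PPO.py from the PPO-PyTorch repository, while the hyperparameters, update interval, and checkpoint name are illustrative assumptions.

```python
from gym_pybullet_drones.envs.single_agent_rl.FlyThruGateAviary import FlyThruGateAviary
from PPO import PPO  # PPO.py from the PPO-PyTorch repository

env = FlyThruGateAviary()
agent = PPO(env.observation_space.shape[0], env.action_space.shape[0],
            lr_actor=3e-4, lr_critic=1e-3, gamma=0.99, K_epochs=80,
            eps_clip=0.2, has_continuous_action_space=True, action_std_init=0.6)

update_every = 4000  # environment steps collected between PPO updates (assumed)
step = 0
for episode in range(1000):
    state = env.reset()
    done = False
    while not done:
        action = agent.select_action(state)
        state, reward, done, info = env.step(action)
        agent.buffer.rewards.append(reward)     # PPO-PyTorch stores rewards
        agent.buffer.is_terminals.append(done)  # and terminals externally
        step += 1
        if step % update_every == 0:
            agent.update()                      # PPO update over the buffer
            agent.decay_action_std(0.05, 0.1)   # anneal exploration noise
env.close()
agent.save("ppo_thrugate.pth")  # hypothetical checkpoint path
```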
- Recently, I figured out that the drone's fluctuation around the hover position may come from the fixed action_std of this PPO implementation: it sets action_std_init = 0.6 and decays this value during training. In inference mode there is no mechanism to reduce or remove this variance, so the control output keeps varying. Looking at other implementations, such as Soft Actor-Critic, the policy network uses one extra layer to learn the action std alongside the action mean; a sketch of such a policy head follows.
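For comparison, here is a minimal sketch of such a SAC-style Gaussian policy head in PyTorch: the network learns a state-dependent log-std through an extra output layer instead of relying on a fixed, externally decayed action_std. The names here are hypothetical; this is not code from either repository.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Policy head that learns a state-dependent std (SAC-style),
    rather than using a fixed action_std decayed from outside."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, action_dim)
        self.log_std_head = nn.Linear(hidden, action_dim)  # the extra layer

    def forward(self, state):
        h = self.body(state)
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(-20, 2)  # keep std numerically sane
        return mean, log_std.exp()

# At inference, taking the mean instead of sampling removes the residual variance:
# mean, std = policy(state); action = mean
```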
- Changed the reward function and the termination computation (sketched below).
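In gym-pybullet-drones v1.x these changes go into the aviary's _computeReward() and _computeDone() overrides. A minimal sketch of the kind of shaping used for hovering, assuming a hypothetical set-point TARGET_POS (not the project's exact numbers):

```python
import numpy as np

TARGET_POS = np.array([0.0, 0.0, 1.0])  # hypothetical hover set-point

def compute_reward(drone_pos):
    # Dense shaping: penalize the squared distance to the target position.
    return -np.linalg.norm(TARGET_POS - drone_pos) ** 2

def compute_done(drone_pos, step, max_steps=1000, max_err=2.0):
    # Terminate on timeout, or early when the drone drifts too far away.
    return step >= max_steps or np.linalg.norm(TARGET_POS - drone_pos) > max_err
```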
- Follow the author's guide to install the gym-pybullet-drones environment; the usual commands are listed below.
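For reference, the usual editable install (assuming the upstream layout at the time of writing; defer to the repository's README if it has changed):
git clone https://github.com/utiasDSL/gym-pybullet-drones.git
cd gym-pybullet-drones
pip install -e .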
- Training:
python train_hover.py
- Test the pretrained model (evaluation sketch below):
python test_hover.py
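The evaluation side mirrors the training sketch above; the checkpoint name is an illustrative assumption, and set_action_std() is the PPO-PyTorch hook for shrinking the sampling noise at test time (it cannot remove it entirely, which is the fluctuation issue noted above):

```python
from gym_pybullet_drones.envs.single_agent_rl.HoverAviary import HoverAviary
from PPO import PPO

env = HoverAviary(gui=True)  # render the rollout in the PyBullet GUI
agent = PPO(env.observation_space.shape[0], env.action_space.shape[0],
            lr_actor=3e-4, lr_critic=1e-3, gamma=0.99, K_epochs=80,
            eps_clip=0.2, has_continuous_action_space=True, action_std_init=0.1)
agent.load("ppo_hover.pth")  # hypothetical checkpoint path
agent.set_action_std(0.1)    # shrink, but cannot zero out, the exploration noise

state = env.reset()
done = False
while not done:
    action = agent.select_action(state)
    state, reward, done, info = env.step(action)
agent.buffer.clear()         # select_action fills the buffer even at test time
env.close()
```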
- https://github.com/utiasDSL/gym-pybullet-drones/
- https://github.com/nikhilbarhate99/PPO-PyTorch
- Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
- https://web.stanford.edu/class/aa228/reports/2019/final62.pdf