
Add PPO + Transformer-XL #459

Merged
merged 37 commits into from
Sep 18, 2024

Conversation

@MarcoMeter (Collaborator) commented Apr 22, 2024

Description

Implementation of PPO with Transformer-XL as episodic memory.
Based on this repo and paper.
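
The core idea, caching each timestep's hidden activations during an episode so the policy can attend over a sliding window of them, can be sketched roughly as follows. Class and parameter names here (EpisodicMemory, memory_length) are illustrative, not the actual API of the PR's ppo_trxl.py.

```python
import numpy as np

class EpisodicMemory:
    """Caches per-timestep hidden states for each transformer layer; the
    policy attends over a sliding window of the most recent ones."""

    def __init__(self, num_layers: int, hidden_dim: int, memory_length: int):
        self.memory_length = memory_length
        # One growing cache per layer: a list of (hidden_dim,) vectors.
        self.cache = [[] for _ in range(num_layers)]

    def append(self, layer: int, hidden: np.ndarray) -> None:
        """Store the hidden state produced at the current timestep."""
        self.cache[layer].append(hidden)

    def window(self, layer: int) -> np.ndarray:
        """Return the most recent `memory_length` hidden states as keys/values
        for the attention at the next timestep."""
        mem = self.cache[layer][-self.memory_length:]
        return np.stack(mem) if mem else np.zeros((0,))

mem = EpisodicMemory(num_layers=2, hidden_dim=4, memory_length=3)
for t in range(5):
    mem.append(0, np.full(4, float(t)))
print(mem.window(0).shape)  # (3, 4): only the last 3 steps are attended over
```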

Types of changes

  • New algorithm

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the tests accordingly (if applicable).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers.

If you need to run benchmark experiments for a performance-impacting change:

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel bot commented Apr 22, 2024

The cleanrl preview deployment was ✅ Ready (updated Sep 18, 2024 4:49am UTC).

@MarcoMeter (Collaborator, Author):

pre-commit

pre-commit fails because of two "obsolete" imports: memory_gym and PoMEnv. Without these imports, the environments are not registered with gymnasium.

enjoy.py

I added a script to load a trained model and then watch an episode.
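
Such an enjoy-style script boils down to loading a trained policy and rolling out one episode. The sketch below shows the rollout loop only; the environment stub and the `watch_episode` name are placeholders, not the PR's actual enjoy.py API.

```python
# Hedged sketch of what an "enjoy" script does: run a single episode with a
# trained policy and report the episode return. The loop follows the
# gymnasium-style step contract (obs, reward, terminated, truncated, info).

def watch_episode(env, policy, max_steps: int = 1000) -> float:
    """Roll out one episode with the given policy; return the episode return."""
    obs, _ = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward

# Tiny stand-in environment so the sketch is self-contained.
class CountdownEnv:
    """Terminates after 3 steps, rewarding +1 per step."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return 0, {}
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, False, {}

print(watch_episode(CountdownEnv(), policy=lambda obs: 0))  # 3.0
```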

ProofofMemory-v0 and MiniGrid-MemoryS9-v0

These environments require memory and converge quickly, which is why I included them initially. MemoryGym environments take more time and resources (especially GPU memory, due to Transformer-XL's cached hidden states).
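
Back-of-envelope arithmetic shows why the cached hidden states dominate GPU memory: the cache grows with the number of parallel environments, the memory window, the layer count, and the hidden size. The numbers below are illustrative, not the PR's actual hyperparameters.

```python
def trxl_cache_bytes(num_envs: int, memory_length: int, num_layers: int,
                     hidden_dim: int, bytes_per_float: int = 4) -> int:
    """Size of the cached hidden states: one hidden vector per environment,
    per cached timestep, per transformer layer (fp32 by default)."""
    return num_envs * memory_length * num_layers * hidden_dim * bytes_per_float

# e.g. 128 envs, a 512-step memory window, 3 layers, 384-dim hidden states:
gb = trxl_cache_bytes(128, 512, 3, 384) / 1024**3
print(f"{gb:.3f} GiB")  # → 0.281 GiB, before activations, gradients, etc.
```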

TODO

I still have to run the benchmarks and write the documentation. Beyond that, the single-file implementation is basically done. I tried to stay close to ppo_atari_lstm.py.

…nused imports, however these imports are necessary for the used environments to be registered
@vwxyzjn (Owner) commented Sep 16, 2024

Keep or remove the Proof of Memory environment (cleanrl/ppo_trxl/pom_env.py)?

Feel free to keep it.

Do you know why the wandb chart looks like this?

[wandb chart screenshot]

@vwxyzjn (Owner) left a review comment:

This is a very well-made PR. Thanks @MarcoMeter! Could you fix the test by just removing macos-latest from os: [ubuntu-22.04, macos-latest, windows-latest] in .github/workflows/tests.yaml? We can merge after that :)

@MarcoMeter (Collaborator, Author) commented Sep 16, 2024

Do you know why the wandb chart looks like this?
[wandb chart screenshot]

What are you referring to? This is how I created the report:

@echo off
python -m openrlbenchmark.rlops ^
    --filters "?we=openrlbenchmark&wpn=cleanRL&ceik=env_id&cen=exp_name&metric=episode/r_mean" ^
    "ppo_trxl?cl=PPO-TrXL" ^
    --env-ids MortarMayhem-Grid-v0 MortarMayhem-v0 Endless-MortarMayhem-v0 MysteryPath-Grid-v0 MysteryPath-v0 Endless-MysteryPath-v0 SearingSpotlights-v0 Endless-SearingSpotlights-v0 ^
    --no-check-empty-runs ^
    --pc.ncols 3 ^
    --pc.ncols-legend 3 ^
    --rliable ^
    --rc.score_normalization_method maxmin ^
    --rc.normalized_score_threshold 1.0 ^
    --rc.sample_efficiency_plots ^
    --rc.sample_efficiency_and_walltime_efficiency_method Median ^
    --rc.performance_profile_plots ^
    --rc.aggregate_metrics_plots ^
    --rc.sample_efficiency_num_bootstrap_reps 10 ^
    --rc.performance_profile_num_bootstrap_reps 10 ^
    --rc.interval_estimates_num_bootstrap_reps 10 ^
    --output-filename memgym/compare ^
    --scan-history ^
    --report

Thanks for your feedback =)

@vwxyzjn (Owner) commented Sep 16, 2024

Oh, I meant that the error bar (shaded region) is very large for some reason, but it's fine. I have added you to the list of contributors. Feel free to merge after CI passes.

@MarcoMeter (Collaborator, Author):

It seems that other reports have this as well, like:
https://wandb.ai/openrlbenchmark/cleanrl/reports/CleanRL-PPG-vs-PPO-results--VmlldzoyMDY2NzQ5

…dded proper rendering to ProofofMemory-v0, updated docs for training and enjoying MiniGrid and ProofofMemory-v0
@MarcoMeter (Collaborator, Author) commented Sep 17, 2024

I did some refinements:

  • Added hyperparameters to the docs for training MiniGrid-MemoryS9-v0 and ProofofMemory-v0
  • Added pre-trained models for these envs to Hugging Face
  • ProofofMemory-v0 can be adequately rendered now
  • Added a link to ppo_trxl.py in README.md

My last step before merging is to make sure that poetry and the dependencies blend well.

@MarcoMeter (Collaborator, Author):

My last step before merging is to make sure that poetry and the dependencies blend well.

Done.

@MarcoMeter MarcoMeter merged commit 9752b32 into vwxyzjn:master Sep 18, 2024
38 checks passed