
Add PPO + Transformer-XL #459

Merged
merged 37 commits into from
Sep 18, 2024

Conversation

@MarcoMeter (Collaborator) commented Apr 22, 2024

Description

Implementation of PPO with Transformer-XL as episodic memory.
Based on this repo and paper.
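
The core idea, caching each timestep's hidden activations during an episode so the policy can attend over a sliding window of them, can be sketched roughly as follows. Class and parameter names here (EpisodicMemory, memory_length) are illustrative, not the actual API of the PR's ppo_trxl.py.

```python
import numpy as np

class EpisodicMemory:
    """Caches per-timestep hidden states for each transformer layer; the
    policy attends over a sliding window of the most recent ones."""

    def __init__(self, num_layers: int, hidden_dim: int, memory_length: int):
        self.memory_length = memory_length
        # One growing cache per layer: a list of (hidden_dim,) vectors.
        self.cache = [[] for _ in range(num_layers)]

    def append(self, layer: int, hidden: np.ndarray) -> None:
        """Store the hidden state produced at the current timestep."""
        self.cache[layer].append(hidden)

    def window(self, layer: int) -> np.ndarray:
        """Return the most recent `memory_length` hidden states as keys/values
        for the attention at the next timestep."""
        mem = self.cache[layer][-self.memory_length:]
        return np.stack(mem) if mem else np.zeros((0,))

mem = EpisodicMemory(num_layers=2, hidden_dim=4, memory_length=3)
for t in range(5):
    mem.append(0, np.full(4, float(t)))
print(mem.window(0).shape)  # (3, 4): only the last 3 steps are attended over
```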

Types of changes

  • New algorithm

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the tests accordingly (if applicable).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers.

If you need to run benchmark experiments for a performance-impacting change:

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel bot commented Apr 22, 2024

The cleanrl preview deployment was ✅ Ready (updated Sep 18, 2024 4:49am UTC).

@MarcoMeter (Collaborator, Author):

pre-commit

pre-commit fails because of two "obsolete" imports: memory_gym and PoMEnv. Without these imports, the environments are not registered with gymnasium.

enjoy.py

I added a script to load a trained model and then watch an episode.
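
Such an enjoy-style script boils down to loading a trained policy and rolling out one episode. The sketch below shows the rollout loop only; the environment stub and the `watch_episode` name are placeholders, not the PR's actual enjoy.py API.

```python
# Hedged sketch of what an "enjoy" script does: run a single episode with a
# trained policy and report the episode return. The loop follows the
# gymnasium-style step contract (obs, reward, terminated, truncated, info).

def watch_episode(env, policy, max_steps: int = 1000) -> float:
    """Roll out one episode with the given policy; return the episode return."""
    obs, _ = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward

# Tiny stand-in environment so the sketch is self-contained.
class CountdownEnv:
    """Terminates after 3 steps, rewarding +1 per step."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return 0, {}
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, False, {}

print(watch_episode(CountdownEnv(), policy=lambda obs: 0))  # 3.0
```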

ProofofMemory-v0 and MiniGrid-MemoryS9-v0

These environments require memory and converge quickly, which is why I included them initially. MemoryGym environments take more time and resources (especially GPU memory, due to Transformer-XL's cached hidden states).
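
Back-of-envelope arithmetic shows why the cached hidden states dominate GPU memory: the cache grows with the number of parallel environments, the memory window, the layer count, and the hidden size. The numbers below are illustrative, not the PR's actual hyperparameters.

```python
def trxl_cache_bytes(num_envs: int, memory_length: int, num_layers: int,
                     hidden_dim: int, bytes_per_float: int = 4) -> int:
    """Size of the cached hidden states: one hidden vector per environment,
    per cached timestep, per transformer layer (fp32 by default)."""
    return num_envs * memory_length * num_layers * hidden_dim * bytes_per_float

# e.g. 128 envs, a 512-step memory window, 3 layers, 384-dim hidden states:
gb = trxl_cache_bytes(128, 512, 3, 384) / 1024**3
print(f"{gb:.3f} GiB")  # → 0.281 GiB, before activations, gradients, etc.
```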

TODO

I still have to run the benchmarks and write the documentation. Beyond that, the single-file implementation is basically done. I tried to stay close to ppo_atari_lstm.py.

…nused imports, however these imports are necessary for the used environments to be registered
@vwxyzjn (Owner) commented Sep 16, 2024

Keep or remove the Proof of Memory environment (cleanrl/ppo_trxl/pom_env.py)?

Feel free to keep it.

Do you know why the wandb chart looks like this?

[wandb chart screenshot]

@vwxyzjn (Owner) left a review comment:

This is a very well-made PR. Thanks @MarcoMeter! Could you fix the test by just removing macos-latest from os: [ubuntu-22.04, macos-latest, windows-latest] in .github/workflows/tests.yaml? We can merge after that :)

@MarcoMeter (Collaborator, Author) commented Sep 16, 2024

Do you know why the wandb chart looks like this?
[wandb chart screenshot]

What are you referring to? This is how I created the report:

@echo off
python -m openrlbenchmark.rlops ^
    --filters "?we=openrlbenchmark&wpn=cleanRL&ceik=env_id&cen=exp_name&metric=episode/r_mean" ^
    "ppo_trxl?cl=PPO-TrXL" ^
    --env-ids MortarMayhem-Grid-v0 MortarMayhem-v0 Endless-MortarMayhem-v0 MysteryPath-Grid-v0 MysteryPath-v0 Endless-MysteryPath-v0 SearingSpotlights-v0 Endless-SearingSpotlights-v0 ^
    --no-check-empty-runs ^
    --pc.ncols 3 ^
    --pc.ncols-legend 3 ^
    --rliable ^
    --rc.score_normalization_method maxmin ^
    --rc.normalized_score_threshold 1.0 ^
    --rc.sample_efficiency_plots ^
    --rc.sample_efficiency_and_walltime_efficiency_method Median ^
    --rc.performance_profile_plots ^
    --rc.aggregate_metrics_plots ^
    --rc.sample_efficiency_num_bootstrap_reps 10 ^
    --rc.performance_profile_num_bootstrap_reps 10 ^
    --rc.interval_estimates_num_bootstrap_reps 10 ^
    --output-filename memgym/compare ^
    --scan-history ^
    --report

Thanks for your feedback =)

@vwxyzjn (Owner) commented Sep 16, 2024

Oh, I meant that the error bar (shaded region) is very large for some reason, but it's fine. I have added you to the list of contributors. Feel free to merge after CI passes.

@MarcoMeter (Collaborator, Author):

It seems that other reports have this as well, like:
https://wandb.ai/openrlbenchmark/cleanrl/reports/CleanRL-PPG-vs-PPO-results--VmlldzoyMDY2NzQ5

…dded proper rendering to ProofofMemory-v0, updated docs for training and enjoying MiniGrid and ProofofMemory-v0
@MarcoMeter (Collaborator, Author) commented Sep 17, 2024

I did some refinements:

  • Added hyperparameters to the docs for training MiniGrid-MemoryS9-v0 and ProofofMemory-v0
  • Added pre-trained models for these envs to Hugging Face
  • ProofofMemory-v0 can be adequately rendered now
  • Added a link to ppo_trxl.py in README.md

My last step before merging is to make sure that poetry and the dependencies blend well.

@MarcoMeter (Collaborator, Author):

My last step before merging is to make sure that poetry and the dependencies blend well.

Done.

@MarcoMeter MarcoMeter merged commit 9752b32 into vwxyzjn:master Sep 18, 2024
38 checks passed