📒 Description

This repo benchmarks neural networks against a range of classic control and Mujoco environments. A few default networks are included: Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and Neural Circuit Policy (NCP) networks.

🐳 Build Container

Definition File

This repo includes an Apptainer definition file located at containers/rl_benchmarks.def. Building the container image from it requires a machine where you have root access. Run the following:

sudo su -

cd containers
singularity build miniconda.sif miniconda.def
singularity build rl_benchmarks.sif rl_benchmarks.def

From there you can shell into the container to test the code:

singularity shell --tmp-sandbox --writable --nv rl_benchmarks.sif

Syndeo

The container is intended to be used with Syndeo to launch parallel jobs on SLURM. A template for launching a parallel job with Syndeo is provided at containers/syndeo.sh. Use it as a starting point for your own jobs.

🛠️ Installing (Bare Metal)

conda create -n rl_benchmarks python=3.10
conda activate rl_benchmarks
pip install poetry
poetry install --with=dev

You will need a custom installation of Mujoco 2.1.0 found here.

Mujoco requires some environment variables to be set. You can add them to your .bashrc:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/domi/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia

If you need a specific version of CUDA or cuDNN, follow the installation instructions in the PyTorch documentation.
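
A quick way to confirm that the variables above took effect is to check them from Python. The sketch below is only an illustration and is not part of this repo: the helper name `missing_library_dirs` is hypothetical, and the directories are the ones from the export lines above, so adjust them to your machine.

```python
import os


def missing_library_dirs(required, ld_library_path=None):
    """Return the entries of `required` that are absent from LD_LIBRARY_PATH."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    # Normalise trailing slashes so "/usr/lib/nvidia/" matches "/usr/lib/nvidia".
    present = {p.rstrip("/") for p in ld_library_path.split(":") if p}
    return [d for d in required if d.rstrip("/") not in present]


# Directories copied from the export lines above (assumed, machine specific).
REQUIRED_DIRS = ["/domi/.mujoco/mujoco210/bin", "/usr/lib/nvidia"]

missing = missing_library_dirs(REQUIRED_DIRS)
print("Missing from LD_LIBRARY_PATH:", missing if missing else "none")
```

For the CUDA side, `torch.cuda.is_available()` reports whether PyTorch can see a GPU in the current environment.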

🧪 Testing

To verify that the installation is set up correctly, run the full test suite:

pytest

This will tell you exactly where the failures are.

🏝️ Environments

[Preview images: CartPole, Pendulum, Acrobot, Ant, Half Cheetah, Humanoid Standup, Humanoid, Inverted Double Pendulum, Inverted Pendulum, Pusher, Reacher, Swimmer]

Classic Control Environments

CartPole-v1
Pendulum-v1
Acrobot-v1

Mujoco Environments

Ant-v4
HalfCheetah-v4
HumanoidStandup-v4
Humanoid-v4
InvertedDoublePendulum-v4
InvertedPendulum-v4
Reacher-v4
Swimmer-v4
Pusher-v4

Note

“All of these environments are stochastic in terms of their initial state, with a Gaussian noise added to a fixed initial state in order to add stochasticity.”
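
The stochastic initial state can be observed directly: resetting an environment with two different seeds is expected to give two different first observations, while reusing a seed reproduces the same one. The sketch below assumes gymnasium and NumPy are installed (plus a working Mujoco setup for the `-v4` environments). `first_observation` and `initial_state_varies` are hypothetical helpers, not part of this repo.

```python
CLASSIC_CONTROL_ENVS = ["CartPole-v1", "Pendulum-v1", "Acrobot-v1"]
MUJOCO_ENVS = [
    "Ant-v4", "HalfCheetah-v4", "HumanoidStandup-v4", "Humanoid-v4",
    "InvertedDoublePendulum-v4", "InvertedPendulum-v4",
    "Reacher-v4", "Swimmer-v4", "Pusher-v4",
]


def first_observation(env_id, seed):
    """Create `env_id`, reset it with `seed`, and return the first observation."""
    import gymnasium as gym  # imported lazily so the lists above load without it

    env = gym.make(env_id)
    try:
        obs, _info = env.reset(seed=seed)
        return obs
    finally:
        env.close()


def initial_state_varies(env_id, seed_a=0, seed_b=1):
    """Return True if two different seeds give different initial observations."""
    import numpy as np

    obs_a = first_observation(env_id, seed_a)
    obs_b = first_observation(env_id, seed_b)
    return not np.array_equal(obs_a, obs_b)
```

Looping `initial_state_varies` over both lists also works as a quick smoke test that every environment can be created on your install.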

🚊 Running Training

To run the training pipeline, first select the type of network you want to evaluate. Three network types are provided:

from src.models import CustomTorchModelCfc
from src.models import CustomTorchModelMlp
from src.models import CustomTorchModelLtc

To register the network you want to test, set the following in main.py:

ModelCatalog.register_custom_model("my_torch_model", CustomTorchModelCfc)
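
If you switch networks often, the choice can be driven by a short name instead of editing the registration line by hand. This is a sketch, not code from this repo: the `MODEL_CLASSES` mapping and the `register_model` helper are hypothetical, and it assumes that `ModelCatalog` is RLlib's `ray.rllib.models.ModelCatalog` and that the three classes import from `src.models` as shown above.

```python
import importlib

# Short name -> class name in src.models (class names taken from the imports above).
MODEL_CLASSES = {
    "cfc": "CustomTorchModelCfc",
    "mlp": "CustomTorchModelMlp",
    "ltc": "CustomTorchModelLtc",
}


def model_class_name(short_name):
    """Map a short name such as 'cfc' to its class name, or raise ValueError."""
    try:
        return MODEL_CLASSES[short_name.lower()]
    except KeyError:
        valid = ", ".join(sorted(MODEL_CLASSES))
        raise ValueError(
            f"unknown model {short_name!r}; choose one of: {valid}"
        ) from None


def register_model(short_name, registered_name="my_torch_model"):
    """Register the chosen network with RLlib under `registered_name`."""
    # Imported lazily: this needs Ray RLlib and this repo on the Python path.
    from ray.rllib.models import ModelCatalog

    models = importlib.import_module("src.models")
    model_cls = getattr(models, model_class_name(short_name))
    ModelCatalog.register_custom_model(registered_name, model_cls)
    return model_cls
```

With this in place, `register_model("ltc")` would stand in for the hand-edited line, leaving the registered name `my_torch_model` unchanged for the rest of the pipeline.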

To run the experiment execute:

python main.py

Outputs are saved to the save/ folder. You can view them in TensorBoard using:

tensorboard --logdir=./save

If you are running on SLURM, submit the provided grid_batch.sh script:

sbatch grid_batch.sh

⬛️ Tmux

If you use tmux for training, here are some useful commands:

tmux new -s <session> # create a new session
tmux ls # list all sessions
tmux attach -t <session> # attach to session

Inside a tmux session, these key bindings are useful:

control-b d # detach from session
control-b [ # enter scrolling mode (q to quit)

🔧 Troubleshooting

Caution

The gymnasium package is still in beta, and setting up the Mujoco environments has known bugs. The current version, gymnasium==0.29.1, does not support custom XML files even though the website documents this feature. For example, the Swimmer environment documentation states:

v3 and v4 take gymnasium.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale, etc.

The xml_file argument is not accepted and will throw an error. This means that in order to insert an XML file you need to install it into the site-packages of the virtual environment containing mujoco.

This repo includes code that replaces the existing Mujoco XML files, so be aware that it modifies the installed package. The gymnasium developers know about the issue but do not plan to fix it until the v1.0.0 release.
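
The workaround described above amounts to overwriting a bundled XML file inside the installed package. The sketch below shows one way to do that while keeping a backup; it is not the repo's own replacement code. `replace_asset` and `default_assets_dir` are hypothetical helpers, and the assets location (`gymnasium/envs/mujoco/assets`) is an assumption about where the installed gymnasium keeps its model files, so verify it in your environment first.

```python
import shutil
from pathlib import Path


def replace_asset(assets_dir, xml_name, custom_xml):
    """Overwrite assets_dir/xml_name with custom_xml, keeping a one-time backup."""
    target = Path(assets_dir) / xml_name
    if not target.is_file():
        raise FileNotFoundError(f"{target} does not exist; check the assets path")
    backup = target.with_name(target.name + ".orig")
    if not backup.exists():
        # Back up only once so repeated runs never overwrite the packaged file.
        shutil.copy2(target, backup)
    shutil.copy2(custom_xml, target)
    return target


def default_assets_dir():
    """Guess the Mujoco assets directory of the installed gymnasium (assumption)."""
    import gymnasium  # imported lazily; only needed when guessing the path

    return Path(gymnasium.__file__).parent / "envs" / "mujoco" / "assets"
```

A call such as `replace_asset(default_assets_dir(), "swimmer.xml", "my_swimmer.xml")` would swap in a custom Swimmer model, where both file names are placeholders. Copying the `.orig` file back restores the packaged version.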

📔 Citations

The author acknowledges the MIT Lincoln Laboratory Supercomputing Center for providing (HPC, database, consultation) resources that have contributed to the research results reported within this paper/report.

@misc{towers_gymnasium_2023,
  title = {Gymnasium},
  url = {https://zenodo.org/record/8127025},
  abstract = {An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)},
  urldate = {2023-07-08},
  publisher = {Zenodo},
  author = {Towers, Mark and Terry, Jordan K. and Kwiatkowski, Ariel and Balis, John U. and Cola, Gianluca de and Deleu, Tristan and Goulão, Manuel and Kallinteris, Andreas and KG, Arjun and Krimmel, Markus and Perez-Vicente, Rodrigo and Pierré, Andrea and Schulhoff, Sander and Tai, Jun Jet and Shen, Andrew Tan Jin and Younis, Omar G.},
  month = mar,
  year = {2023},
  doi = {10.5281/zenodo.8127026},
}

@inproceedings{reuther2018interactive,
  title={Interactive supercomputing on 40,000 cores for machine learning and data analysis},
  author={Reuther, Albert and Kepner, Jeremy and Byun, Chansup and Samsi, Siddharth and Arcand, William and Bestor, David and Bergeron, Bill and Gadepally, Vijay and Houle, Michael and Hubbell, Matthew and Jones, Michael and Klein, Anna and Milechin, Lauren and Mullen, Julia and Prout, Andrew and Rosa, Antonio and Yee, Charles and Michaleas, Peter},
  booktitle={2018 IEEE High Performance extreme Computing Conference (HPEC)},
  pages={1--6},
  year={2018},
  organization={IEEE}
}

♖ Distribution

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

This material is based upon work supported by the Department of the Air Force under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Department of the Air Force.

The software/firmware is provided to you on an As-Is basis.

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

