Multi-Agent Reinforcement Learning for DIAMBRA Arena environments using the TLeague framework
The steps in the next section install all required packages, in particular DIAMBRA Arena, a new Reinforcement Learning platform for research and experimentation. Before starting, make sure the following prerequisites are satisfied:
- Install Docker Desktop (Linux | Windows | MacOS) and make sure you have permissions to run it (see here). On Linux, it is usually enough to run

  ```sh
  sudo usermod -aG docker $USER
  ```

  then log out and log back in.

- Download ROMs, set environment variables, and test the installation as described in the installation section of the documentation here.
- Create a virtual environment (e.g. using Conda or VirtualEnv) and activate it:

  ```sh
  conda create -n diambra-tleague python=3.7
  conda activate diambra-tleague
  ```
- Clone the repository with its submodules:

  ```sh
  git clone --recurse-submodules [email protected]:alexpalms/DIAMBRA-Arena-MARL-TLeague.git
  ```
- From inside the repository, install all the submodules using pip:

  - TGym:

    ```sh
    cd TGym
    pip install -e .
    ```

  - TArena:

    ```sh
    cd TArena
    pip install -e .
    ```

  - TPolicies:

    ```sh
    cd TPolicies
    pip install -e .
    ```

  - TLeague:

    ```sh
    cd TLeague
    pip install -e .
    ```
- Go to the TLeague examples folder, from the repo root:

  ```sh
  cd TLeague/examples/
  ```
- Execute the four TLeague modules:

  - Model pool:

    ```sh
    bash example_diambra_arena_sp_ppo.sh model_pool
    ```

  - League manager:

    ```sh
    bash example_diambra_arena_sp_ppo.sh league_mgr
    ```

  - (A single) Learner:

    ```sh
    diambra run bash example_diambra_arena_sp_ppo.sh learner
    ```

  - (As many) Actors (as your system supports):

    ```sh
    diambra run bash example_diambra_arena_sp_ppo.sh actor
    ```

  Note that the Learner and the Actor(s) are launched using the DIAMBRA Command Line Interface, which takes care of launching the Docker container where the environment image is executed.
The following changes have been made to the TLeague repository:
- Created the `tleague/envs/diambra_arena/` folder with the DIAMBRA Arena interface, featuring:

  - `create_diambra_arena_envs.py`: implements the methods to instantiate a DIAMBRA Arena environment (with the proper TLeague interface) and to retrieve its Action and Observation Spaces (a minimal sketch follows this list)
  - `wrapper.py`: contains an environment wrapper (`TLeagueWrapper`) that builds the interface between DIAMBRA Arena and TLeague
  - `create_diambra_arena_envs_test.py`: tests the proper functioning of the DIAMBRA Arena TLeague interface
  - `__init__.py`: generic module imports
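  As a rough illustration only (not the repository's actual code), such a factory and wrapper could look like the sketch below; the per-agent list convention and the env-id parsing are assumptions, while `diambra.arena.make()` is the standard DIAMBRA Arena entry point:

  ```python
  # Hypothetical sketch of a TLeague-facing DIAMBRA Arena factory and wrapper.
  import gym
  import diambra.arena


  class TLeagueWrapper(gym.Wrapper):
      """Adapts a DIAMBRA Arena env to a per-agent list convention
      (assumed here for illustration; see wrapper.py for the real one)."""

      def reset(self, **kwargs):
          obs = self.env.reset(**kwargs)
          return [obs]  # one entry per agent (assumption)

      def step(self, actions):
          obs, reward, done, info = self.env.step(actions[0])
          return [obs], [reward], done, info


  def create_diambra_arena_env(env_id="diambra.arena_doapp"):
      # 'diambra.arena_doapp' -> game id 'doapp' (assumed id convention)
      game_id = env_id.split("_", 1)[1]
      env = diambra.arena.make(game_id)
      return TLeagueWrapper(env)
  ```

  Retrieving the Action and Observation Spaces then reduces to reading `env.action_space` and `env.observation_space` from the wrapped environment.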
- Adapted the `tleague/envs/create_envs.py` methods to accommodate the DIAMBRA Arena environments interface (see the sketch below)
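  The actual function names and env-id handling may differ, but the adaptation amounts to routing DIAMBRA Arena identifiers to the new module, roughly as follows:

  ```python
  # Hypothetical dispatch snippet; the 'diambra.arena' prefix check is assumed.
  def create_env(arena_id, env_config=None):
      if arena_id.startswith('diambra.arena'):
          # route ids such as 'diambra.arena_doapp' to the new interface module
          from tleague.envs.diambra_arena.create_diambra_arena_envs import (
              create_diambra_arena_env)
          return create_diambra_arena_env(arena_id)
      raise ValueError('Unknown arena_id: {}'.format(arena_id))
  ```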
- Added the required dependencies to `setup.py` to fix the latest protobuf breaking changes, and added the OpenCV and DIAMBRA Arena packages (an illustrative excerpt follows)
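  The excerpt below shows the kind of additions involved; the exact version pins are assumptions, not the verbatim diff:

  ```python
  # setup.py (illustrative fragment)
  from setuptools import setup, find_packages

  setup(
      name='TLeague',
      packages=find_packages(),
      install_requires=[
          'protobuf<=3.20.1',  # assumed pin: stay below the 4.x breaking changes
          'opencv-python',     # OpenCV
          'diambra-arena',     # DIAMBRA Arena
      ],
  )
  ```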
- Added a script to run all TLeague modules for DIAMBRA Arena at `examples/example_diambra_arena_sp_ppo.sh`, starting from the one used for Pong2D, with the following modifications:

  - Changed the max number of players to `'max_n_players': 16`
  - Changed the environment identifier to `env=diambra.arena_doapp`
- Added run instructions to the docs at `docs/EXAMPLE_DIAMBRA_ARENA_SP_PPO.md`
- The original TPolicies repo has gym pinned to an old version (`gym==0.12.1`) that doesn't feature `__getitem__(self, key)` for `Dict` observations. The method has been added in the custom TGym, which is derived from the 0.12.1 version, and the corresponding dependency in the TPolicies `setup.py` has been commented out. (A minimal sketch of the missing method follows.)
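  Later gym releases ship an equivalent method; the monkey patch below is for illustration only, while the custom TGym adds the method directly to the `Dict` class:

  ```python
  # gym==0.12.1's Dict space cannot be indexed by key; this is the one-liner
  # the custom TGym adds (shown here as a monkey patch for illustration).
  from gym import spaces

  def _dict_getitem(self, key):
      # Return the sub-space registered under `key`, e.g. obs_space["frame"]
      return self.spaces[key]

  spaces.Dict.__getitem__ = _dict_getitem

  obs_space = spaces.Dict({"frame": spaces.Discrete(256)})
  print(obs_space["frame"])  # Discrete(256)
  ```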