Implementation of a reinforcement learning model for the strategy game Blokus
This repository requires CUDA to be installed. You can find installation instructions in the NVIDIA CUDA documentation. To check whether CUDA is installed correctly, run the following command:
nvidia-smi
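If the project trains with PyTorch (an assumption here; check the repository's dependencies to confirm), you can also verify from Python that CUDA is visible:

# Hedged sanity check: assumes PyTorch is installed in the active environment.
import torch

print(torch.cuda.is_available())  # True means PyTorch can use the GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU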
This repository requires Conda to be installed. You can find installation instructions in the Conda documentation. To check whether Conda is installed correctly, run the following command:
conda --version
To create a Conda environment with Python 3.10, run the following command:
conda create -n blokus-rl python=3.10
To activate the environment, run the following command:
conda activate blokus-rl
To clone the repository and install the required packages, run the following commands:
git clone https://github.com/KubiakJakub01/Blokus-RL.git
cd Blokus-RL
pip install -e ".[dev]"
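As a quick post-install sanity check (a minimal sketch, assuming the install above completed cleanly), confirm the package is importable:

# If this import fails, check that the blokus-rl Conda environment is active
# and that the pip install above finished without errors.
import blokus_rl

print(blokus_rl.__file__)  # location of the installed package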
Alternatively, you can use Docker to run this repository. To build a Docker image, run the following command from the root directory of this repository:
docker build -t blokus-rl .
When running the Docker image, remember to mount the directories containing your config files, logs, and checkpoints. For example, to run a sample PPO training in the 7x7 Blokus environment, you can use the following command:
docker run -it --rm \
-v $(pwd)/config:/app/config \
blokus-rl \
python -m blokus_rl.train \
--hparams_fp config/ppo_blokus_7x7.yml \
--algorithm ppo
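By default, training outputs go under models/ (see below), so if you want checkpoints and logs to persist outside the container, mount that directory as well, for example with an extra -v $(pwd)/models:/app/models flag (the /app/models path is an assumption, mirroring the config mount above).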
This repository allows you to train a reinforcement learning model to play Blokus and other games using the OpenAI Gym interface. Currently, two algorithms are implemented: PPO and AlphaZero. Before training, you need to create an hparams file; you can find examples in the config directory. For more information about hyperparameters, see blokus_rl/hparams.py, where all hyperparameters for both algorithms are described (a hedged config-writing sketch follows the command below). To train a model, run the following command:
python -m blokus_rl.train \
--hparams_fp <path_to_config_file> \
--algorithm <ppo|alphazero>
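If you prefer to generate a config programmatically, the sketch below writes a minimal YAML file. The key names are illustrative only, not the repository's actual schema; the authoritative list of hyperparameters is in blokus_rl/hparams.py. It assumes PyYAML is available (likely, since the configs are YAML files).

# Hedged sketch: writes a YAML config with HYPOTHETICAL key names.
# Replace the keys with the real fields defined in blokus_rl/hparams.py.
import yaml

hparams = {
    "learning_rate": 3e-4,         # hypothetical key
    "gamma": 0.99,                 # hypothetical key
    "total_timesteps": 1_000_000,  # hypothetical key
}

# Writes into the repository's existing config directory.
with open("config/my_ppo.yml", "w") as f:
    yaml.safe_dump(hparams, f)

The resulting file can then be passed to training via --hparams_fp.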
To run a sample PPO training in the 7x7 Blokus environment, use the following command:
python -m blokus_rl.train \
--hparams_fp config/ppo_blokus_7x7.yml \
--algorithm ppo
To run a sample AlphaZero training in the 20x20 Blokus environment, use the following command:
python -m blokus_rl.train \
--hparams_fp config/alphazero_blokus_20x20.yml \
--algorithm alphazero
Training checkpoints and logs are saved to the directories specified in the config file; the defaults are models/checkpoints and models/logs, respectively. To monitor training progress, run the following command:
tensorboard --logdir <path_to_logs_directory>
To play games between different players, use the compare arena. You can pass a path to an hparams file for an alphazero player, or choose among the built-in mcts, random, and human players. For more information, run the following command:
python -m blokus_rl.compare_arena -h
For example:
python -m blokus_rl.compare_arena --players config/mcts_blokus.yml random mcts human