- If you do not yet have a supercloud account, you will need to request one.
- After receiving an account, follow the instructions to ssh into supercloud.
- Once logged in to supercloud, create an ssh key and add it to your github.com account.
- The most important rule about using supercloud is that you should not run any intensive jobs on a login node. Instead, launch batch jobs, or use an interactive session. Both options are illustrated below.
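The ssh key step above can be sketched as follows. This is a hedged example: the key type, file name, and empty passphrase are our own choices, not requirements (github.com accepts ed25519 keys, and you may prefer a passphrase).

```shell
# Create the .ssh directory if needed, then generate an ed25519 key pair
# non-interactively (no passphrase; the file name here is an arbitrary choice).
mkdir -p "$HOME/.ssh"
ssh-keygen -t ed25519 -N "" -f "$HOME/.ssh/id_ed25519_supercloud"
# Print the public key, then paste it into github.com -> Settings -> SSH and GPG keys.
cat "$HOME/.ssh/id_ed25519_supercloud.pub"
```

If you use a non-default file name like this one, point ssh at it with a `Host github.com` entry and an `IdentityFile` line in `~/.ssh/config`.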
Run the following commands to install this repository.

```bash
# Set `PYTHONHASHSEED` to 0 by default (see our README.md for explanation).
echo "export PYTHONHASHSEED=0" >> ~/.bashrc
# Clone the repository.
git clone [email protected]:Learning-and-Intelligent-Systems/predicators.git
# Set up conda with Python 3.9.
module unload anaconda
module load anaconda/2021b
conda create --name predicators python=3.9
conda init bash  # you will need to restart your shell after this
conda activate predicators
# Install the predicators dependencies.
cd predicators
mkdir /state/partition1/user/$USER
export TMPDIR=/state/partition1/user/$USER
pip install -e .
# Add a shortcut for activating the conda env and switching to this repository.
echo -e "predicate() {\n cd ~/predicators\n conda activate predicators\n}" >> ~/.bashrc
# Add a shortcut for displaying running jobs.
echo "alias sl='squeue --format=\"%.18i %.9P %.42j %.8u %.8T %.10M %.6D %R\"'" >> ~/.bashrc
source ~/.bashrc
```
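As a quick illustration of why we pin `PYTHONHASHSEED` (this snippet is a side demonstration, not part of the installation): with the seed pinned, Python's string hashing is reproducible across interpreter runs, whereas by default it is randomized per run.

```shell
# With a pinned hash seed, hash('...') yields the same value in every run.
a=$(PYTHONHASHSEED=0 python3 -c "print(hash('predicators'))")
b=$(PYTHONHASHSEED=0 python3 -c "print(hash('predicators'))")
[ "$a" = "$b" ] && echo "hashing is deterministic"
```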
To test if the installation worked:

```bash
# Start an interactive session.
LLsub -i
# Activate conda and switch to the repository.
predicate
# Run a short experiment.
python predicators/main.py --env cover --approach oracle --seed 0
# Exit the interactive session.
exit
```
Note that supercloud sometimes hangs, so the experiment may take a few minutes to get started. Once it does, you should see 50/50 tasks solved, and the script should report a total time of roughly 2 seconds at the bottom of its output.
This is the preferred way to run experiments:

- Create a YAML config file in `scripts/configs/`. For example, see `scripts/configs/example_basic.yaml`.
- Create a pull request to add your new config file. Or, if you prefer to run an experiment before merging, make sure that your config file is pushed to a branch.
- To launch the experiment on supercloud, use `scripts/supercloud/launch.py`:
  - If you are running a config that is already on master: `python scripts/supercloud/launch.py --user <your supercloud username> --config <name of your config file>`. For example: `python scripts/supercloud/launch.py --user tslvr --config example_basic.yaml`.
  - If you are running a config that is on a different branch, add `--branch <branch name>` to the previous command.
- To download the results from supercloud, run `python scripts/supercloud/download.py --user <your supercloud username> --dir <local path>`. The results will be saved to the directory given by `--dir`.
We recommend running and monitoring experiments with the scripts described above. However, if you want more control, you can launch and monitor experiments manually on supercloud itself.
To get started, ssh into supercloud. Then, activate the conda environment and switch to the repository. If you followed the instructions above, you can do both with `predicate`.
Before running any experiments, it is good practice to make sure that you have a clean workspace:
- Make sure that you have already backed up any old results that you want to keep.
- Remove all previous results: `rm -f results/* logs/* saved_approaches/* saved_datasets/*`.
- Make sure you are on the right branch (`git branch`) with a clean diff (`git diff`).
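The checks above can be bundled into a small guard function, sketched here under the assumption that you run it from the repository root. The function name and the dirty-tree check are our own additions, not part of the repository; note that the check only protects uncommitted code changes, since results directories are typically not tracked by git.

```shell
# Refuse to delete old results while the working tree has uncommitted changes.
clean_workspace() {
    if [ -n "$(git status --porcelain)" ]; then
        echo "Working tree is not clean; commit, stash, or back up first." >&2
        return 1
    fi
    rm -f results/* logs/* saved_approaches/* saved_datasets/*
}
```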
To run experiments defined in a YAML config, you can use the same `launch.py` script described in the previous section, but with the added `--on_supercloud` flag. For example: `python scripts/supercloud/launch.py --user tslvr --config example_basic.yaml --on_supercloud`.
After experiments are running:
- To monitor experiments that are running, use `sl`.
- To see individual logs, look in the `logs/` directory.
- To cancel all jobs, use `scancel -u $USER`.
- To see a summary of results so far, run `python scripts/analyze_results_directory.py`.
- To download results onto your local machine, use `scp -r`. The most important directory to back up is `results/`, but we also recommend backing up `logs/`, `saved_datasets/`, and `saved_approaches/`.
If any of the above steps do not work perfectly or lack clarity, please update this document with a pull request!