Before starting, please note that TabRepo is, to our knowledge, the largest collection of tabular benchmarking results in existence. Regenerating the TabRepo artifacts requires a minimum of 500,000 CPU hours of compute spread across 92,000 EC2 instances and takes a minimum of 10 days.
To reproduce TabRepo, we must:
- Install AutoMLBenchmark
- Set up AWS credentials
- Edit the `custom_configs/config.yaml` file
- Execute the `run_zeroshot.sh` script
- Aggregate the results
- Re-run failed tasks
- Generate the TabRepo artifacts from the results
- Add the artifacts as a TabRepo Context
- Run TabRepo with the new Context
In a fresh Python virtual environment using Python 3.9 to 3.11:
# Create a fresh venv
python -m venv venv
source venv/bin/activate
# Clone AutoMLBenchmark with TabRepo configs specified
# Make sure to do this from the directory above the `tabrepo` project; `tabrepo` and `automlbenchmark` should exist side-by-side.
git clone https://github.com/Innixma/automlbenchmark.git --branch 2023_12_07
# Install AutoMLBenchmark (https://openml.github.io/automlbenchmark/docs/getting_started/)
cd automlbenchmark
pip install --upgrade pip
pip install -r requirements.txt
You are all set!
Due to the large amount of compute required to reproduce TabRepo, we will be using AWS.
We need to ensure that `boto3` recognizes your AWS credentials, since AutoMLBenchmark's "aws" mode uses them to spin up the EC2 instances. Make sure `boto3` is working before moving to the next step.
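As a quick sanity check (a minimal sketch, assuming your credentials are available via `~/.aws/credentials`, environment variables, or an instance role), the following snippet confirms that `boto3` can authenticate:

import boto3

# Ask AWS STS who we are; this raises an error if boto3 cannot
# find or use valid credentials.
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(f"Authenticated as {identity['Arn']} (account {identity['Account']})")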
The file can be found here: `automlbenchmark/custom_configs/config.yaml`.
We need to edit this file so that it points to the correct bucket.
s3: # sub-namespace for AWS S3 service.
bucket: automl-benchmark-ag # must be unique in whole Amazon s3, max 40 chars, and include only numbers, lowercase characters and hyphens.
root_key: ec2/2023_12_07/ #
Replace the `bucket` argument above in the `config.yaml` file with a bucket you have created in your AWS account.
Note that the bucket name must start with `automl-benchmark`, otherwise it won't work.
This is the location all output artifacts will be saved to. You can also optionally change `root_key`, which is the directory (prefix) within the bucket.
Note that bucket names are globally unique, so you will need to create a new one.
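For example (a minimal sketch; the bucket name and region below are placeholders, replace them with your own), you can create such a bucket with `boto3`:

import boto3

# Placeholders: pick your own globally unique name (it must start with
# "automl-benchmark") and your own AWS region.
bucket_name = "automl-benchmark-yourname"
region = "us-east-1"

s3 = boto3.client("s3", region_name=region)
if region == "us-east-1":
    # us-east-1 does not accept a LocationConstraint
    s3.create_bucket(Bucket=bucket_name)
else:
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )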
Note: Currently, the below script will run for upwards of 10 days before all results are complete, using a significant amount of compute.
Theoretically, it could use up to 1,370,304 hours of on-demand m6i.2xlarge compute (~$430,000, not including storage costs) if every machine ran for the full time limit; in practice, actual usage is over an order of magnitude lower.
# In the root directory, where `tabrepo` and `automlbenchmark` exist
mkdir execute_tabrepo
cd execute_tabrepo
../tabrepo/scripts/execute_benchmarks/run_zeroshot.sh
Once the benchmark has fully finished, you can aggregate the results by running this script: https://github.com/Innixma/autogluon-benchmark/blob/master/scripts/aggregate_all.py.
You will need to specify a version name matching the folder your results are saved to in S3, and specify `aggregate_zeroshot=True`.
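As a rough sketch only (the actual entry point and argument names are defined in `aggregate_all.py`, so treat `--version_name` and `--aggregate_zeroshot` below as hypothetical placeholders), the invocation looks roughly like this:

import subprocess

# Hypothetical invocation: check aggregate_all.py for the real argument
# names before running. The version name must match the S3 folder your
# results were written to (the root_key set in config.yaml).
version_name = "2023_12_07"
subprocess.run(
    ["python", "scripts/aggregate_all.py",
     "--version_name", version_name,      # assumed argument name
     "--aggregate_zeroshot", "True"],     # assumed argument name
    check=True,
)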
If you had transient failures and need to rerun failed tasks, we recommend contacting us for guidance, as this becomes non-trivial.
The TabRepo artifacts will automatically be generated when aggregating the results.
Please contact us to assist with adding your artifacts as a new TabRepo Context.
Now that your context has been added to TabRepo, simply specify the context name to load it as an EvaluationRepository:
from tabrepo import load_repository, EvaluationRepository

# The name of the Context added in the previous step
context_name = "YOUR_CONTEXT_NAME"
repo: EvaluationRepository = load_repository(context_name, cache=True)
repo.print_info()
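From here you can explore the repository, for example (a small sketch; `datasets()` and `configs()` are the standard `EvaluationRepository` accessors, adjust if your TabRepo version differs):

# Inspect what the repository contains
print(repo.datasets()[:5])   # first few dataset names
print(repo.configs()[:5])    # first few model config names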
You can also run the full suite of experiments with the following command:
python tabrepo/scripts/baseline_comparison/evaluate_baselines.py --repo "{YOUR_CONTEXT_NAME}"