A key concern with existing ensemble selection algorithms is ensemble size, which can slow down inference of the trained model. To address this, we introduce a novel approach, hardware-aware ensemble selection (HA-ES), which balances the trade-off between predictive performance and complexity that is inherent in ensembling.
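One common way to express such a performance/complexity trade-off is a Pareto front over predictive error and inference time. The sketch below is purely illustrative and is not the HA-ES algorithm. It assumes each candidate ensemble has already been scored with a validation error and a measured inference time. The function name `pareto_front`, the candidate names and all numbers are hypothetical.

```python
from typing import List, Tuple

# Each candidate is (name, validation_error, inference_time_seconds).
# Names and numbers passed to this function are hypothetical examples.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that are not dominated in (error, inference time).

    A candidate is dominated if another candidate is no worse in both
    objectives and strictly better in at least one of them.
    """
    front = []
    for name, err, cost in candidates:
        dominated = any(
            (e2 <= err and c2 <= cost) and (e2 < err or c2 < cost)
            for _, e2, c2 in candidates
        )
        if not dominated:
            front.append((name, err, cost))
    # Sort by inference time: cheapest ensembles first, most accurate last.
    return sorted(front, key=lambda c: c[2])


if __name__ == "__main__":
    ensembles = [
        ("single_best", 0.150, 0.01),
        ("small_ensemble", 0.130, 0.03),
        ("large_ensemble", 0.125, 0.20),
        ("wasteful_ensemble", 0.140, 0.25),
    ]
    for name, err, cost in pareto_front(ensembles):
        print(f"{name}: error={err:.3f}, time={cost:.2f}s")
```

Here `wasteful_ensemble` is dropped because `small_ensemble` is both more accurate and faster. The remaining candidates represent different points on the accuracy/cost trade-off, and the choice among them depends on the available hardware budget.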
To evaluate our approach and compare it with existing algorithms, we use TabRepo, which provides precomputed prediction probabilities for over 100 ML problems. Because these predictions are already available, we can evaluate and compare ensemble selection techniques efficiently without retraining the base models.
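To illustrate why precomputed prediction probabilities make evaluation cheap, here is a minimal sketch of the classic greedy ensemble selection of Caruana et al. (2004), a standard baseline for this family of methods. It is not the HA-ES algorithm and does not use TabRepo's API. It assumes the validation probabilities of all base models are already loaded into one NumPy array and uses classification error as the metric. The function name and the random data in the usage example are hypothetical.

```python
import numpy as np


def greedy_ensemble_selection(val_probs, y_val, n_iterations=10):
    """Greedy ensemble selection with replacement (after Caruana et al., 2004).

    val_probs: array of shape (n_models, n_samples, n_classes) holding
        precomputed validation prediction probabilities for each base model.
    y_val: integer class labels of shape (n_samples,).
    Returns one weight per model; the weights sum to 1.
    """
    n_models = val_probs.shape[0]
    counts = np.zeros(n_models, dtype=int)
    ensemble_sum = np.zeros(val_probs.shape[1:], dtype=float)

    for step in range(1, n_iterations + 1):
        best_model, best_error = None, np.inf
        for m in range(n_models):
            # Average the probabilities of the current ensemble plus model m.
            candidate = (ensemble_sum + val_probs[m]) / step
            error = np.mean(candidate.argmax(axis=1) != y_val)
            if error < best_error:
                best_model, best_error = m, error
        # Adding a model again (with replacement) increases its weight.
        counts[best_model] += 1
        ensemble_sum += val_probs[best_model]

    return counts / counts.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 5 models, 100 samples, 3 classes.
    probs = rng.dirichlet(np.ones(3), size=(5, 100))
    labels = rng.integers(0, 3, size=100)
    print(greedy_ensemble_selection(probs, labels, n_iterations=10))
```

Each candidate ensemble is scored by averaging stored probabilities, so no base model is ever retrained or re-run. Note that plain greedy selection optimizes only predictive error and ignores how many distinct models end up in the ensemble, which is the concern that motivates a hardware-aware variant.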
This setup guide assumes a Linux system. We ran the experiments with Python 3.10.14.
We recommend using a virtual environment, which isolates the project's dependencies from the global Python installation. To create one in the project directory, run:
python3 -m venv venv
To activate it, run:
source venv/bin/activate
From the project root, run the following commands to install the dependencies. The -e flag installs the local packages in editable mode, so changes to their source code take effect without reinstalling them.
pip install -r requirements.txt
python3 -m pip install -e extern/tabrepo
python3 -m pip install -e extern/phem
To run the experiments, use:
python3 haes/generate_data.py
If you use HA-ES in a scientific publication, we would appreciate a citation:
Maier, J., Möller, F., & Purucker, L. (2024). Hardware Aware Ensemble Selection for Balancing Predictive Accuracy and Cost. Paper presented at the Third International Conference on Automated Machine Learning (AutoML 2024) Workshop. arXiv. https://arxiv.org/abs/2408.02280