Add comparison to other packages
shuds13 committed Oct 19, 2023
1 parent 0157655 commit ff46495
Showing 2 changed files with 43 additions and 4 deletions.
37 changes: 37 additions & 0 deletions docs/papers/joss/paper.bib
@@ -110,3 +110,40 @@ @article{Pousa22
url = {https://accelconf.web.cern.ch/ipac2022/papers/wepost030.pdf},
language = {english}
}

@INPROCEEDINGS{colmena21,
author={Ward, Logan and Sivaraman, Ganesh and Pauloski, J. Gregory and Babuji, Yadu and Chard, Ryan and Dandu, Naveen and Redfern, Paul C. and Assary, Rajeev S. and Chard, Kyle and Curtiss, Larry A. and Thakur, Rajeev and Foster, Ian},
booktitle={2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)},
title={Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing},
year={2021},
pages={9--20},
doi={10.1109/MLHPC54614.2021.00007}}

@INPROCEEDINGS{ensembletoolkit16,
author={Balasubramanian, Vivekanandan and Treikalis, Antons and Weidner, Ole and Jha, Shantenu},
booktitle={2016 45th International Conference on Parallel Processing (ICPP)},
title={Ensemble Toolkit: Scalable and Flexible Execution of Ensembles of Tasks},
year={2016},
volume={},
number={},
pages={458--463},
doi={10.1109/ICPP.2016.59}}

@inproceedings{parsl,
author={Babuji, Yadu and Woodard, Anna and Li, Zhuozhao and Katz, Daniel S. and Clifford, Ben and
Kumar, Rohan and Lacinski, Lukasz and Chard, Ryan and Wozniak, Justin and Foster, Ian and
Wilde, Mike and Chard, Kyle},
title = {Parsl: Pervasive Parallel Programming in Python},
booktitle = {Proc.\ HPDC19},
doi = {10.1145/3307681.3325400},
year = {2019},
}

@inproceedings{Salim2019,
doi = {10.1109/xloop49562.2019.00010},
year = {2019},
publisher = {{IEEE}},
author = {Michael Salim and Thomas Uram and J. Taylor Childers and Venkatram Vishwanath and Michael Papka},
title = {Balsam: Near Real-Time Experimental Data Analysis on Supercomputers},
booktitle = {{IEEE}/{ACM} 1st Annual Workshop on Large-scale Experiment-in-the-Loop Computing ({XLOOP})}
}
10 changes: 6 additions & 4 deletions docs/papers/joss/paper.md
@@ -50,7 +50,7 @@ design, decision, and inference studies on or across laptops and heterogeneous h

# Statement of need

There are a growing number of packages aimed at workflows, and a subset of these focus on running ensembles of calculations on clusters and supercomputers. Dynamic ensemble packages are those that automatically steer the ensemble based on intermediate results. This may involve choosing simulation parameters using numerical optimization or machine learning techniques, among other possibilities. Other packages in this space include Colmena and the RADICAL-Ensemble Toolkit.
There are a growing number of packages aimed at workflows, and a subset of these focus on running ensembles of calculations on clusters and supercomputers. Dynamic ensemble packages are those that automatically steer the ensemble based on intermediate results. This may involve choosing simulation parameters using numerical optimization or machine learning techniques, among other possibilities. Other packages in this space include Colmena [@colmena21] and the RADICAL-Ensemble Toolkit [@ensembletoolkit16], as well as packages that provide back-end dispatch and execution, such as Parsl [@parsl] and Balsam [@Salim2019].

Some crucial considerations relevant to these packages include:

@@ -71,13 +71,15 @@ Some crucial considerations relevant to these packages include:

libEnsemble stands out primarily through its generator-simulator paradigm, which eliminates the need for users to explicitly define task dependencies. Instead, it emphasizes data dependencies between customizable Python user functions. This modular design also lends itself to exploiting the large library of example user functions that are provided with libEnsemble, maximizing code re-use. For instance, users can readily choose an existing generator function and tailor a simulator function to their particular needs.

libEnsemble is a complete toolkit that includes generator in-the-loop and backend mechanisms. Some other packages cover parts of these. For example, Colmena has a front-end that uses 'thinker' and 'doer' [check] functions while using Parsl to dispatch simulations.

libEnsemble takes the philosophy of minimising required dependencies while supporting various back-end mechanisms when needed. For example, the vast majority of users do not need to run a database application or special run-time to use libEnsemble, but for those that do, Balsam can be used on the back-end by replacing the regular MPI executor with the Balsam executor. This approach simplifies the user experience and reduces the initial setup and adoption costs when using libEnsemble.

To achieve portability, libEnsemble employs system detection that goes beyond other packages. It detects crucial system information such as scheduler details, MPI runners, core counts, and GPU counts (for different types of GPU), and uses these to produce run-lines and GPU settings for these systems, without the user having to alter scripts. For example, on a system using "srun", libEnsemble will use srun options to assign GPUs, whereas on other systems it may assign GPUs via environment variables such as ROCR_VISIBLE_DEVICES or CUDA_VISIBLE_DEVICES; in either case, the user only states the number of GPUs needed for each simulation. For cases where autodetection is insufficient, the user can supply platform information or the name of a known system via scripts or an environment variable.

By default, libEnsemble divides available compute resources amongst workers. However, when simulation parameters are created, the number of processes and GPUs can also be specified for each simulation. Combined with the portability features, this makes it very simple to transfer user scripts between platforms.
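
As an illustration of the portability idea in the paper text (the user states only a GPU count, and the launcher settings are derived per platform), here is a minimal standalone sketch. It is not libEnsemble's API: the helper `gpu_launch_settings`, its parameters, and the choice of the Slurm `--gpus-per-node` option are assumptions made for this example only.

```python
def gpu_launch_settings(runner, num_gpus, first_gpu=0, vendor="nvidia"):
    """Return (extra_runner_args, extra_env) for one simulation.

    Hypothetical helper for illustration; not part of libEnsemble.
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")

    if runner == "srun":
        # Assumption: on Slurm systems, GPUs are requested via an srun option.
        return [f"--gpus-per-node={num_gpus}"], {}

    # Otherwise, restrict visible devices through an environment variable.
    device_ids = ",".join(str(first_gpu + i) for i in range(num_gpus))
    env_var = "ROCR_VISIBLE_DEVICES" if vendor == "amd" else "CUDA_VISIBLE_DEVICES"
    return [], {env_var: device_ids}


if __name__ == "__main__":
    # The same request (two GPUs) yields different settings per platform.
    print(gpu_launch_settings("srun", 2))
    print(gpu_launch_settings("mpirun", 2, first_gpu=2, vendor="amd"))
```

The calling script is identical on both platforms; only the detected runner and GPU vendor change the result.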

The close coupling between the libEnsemble generator and simulators enables the generator both to asynchronously take in results and update models, and to cancel previously issued simulations. Running simulations can be terminated and resources recovered. This is more flexible compared to other packages, where the generation of simulations is external to the dispatch of a batch of simulations.
The close coupling between the libEnsemble generator and simulators enables the generator to perform tasks such as asynchronously receiving results, updating models, and cancelling previously initiated simulations. Simulations that are already running can be terminated and resources recovered. This is more flexible compared to other packages, where the generation of simulations is external to the dispatch of a batch of simulations.

libEnsemble supports persistent user functions that run on workers, maintaining their memory, which avoids the storing and reloading of data required by packages that only support a fire-and-forget approach to ensemble components.
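
The generator-simulator paradigm with a persistent generator can be sketched in plain Python. This toy example does not use libEnsemble: the names `persistent_generator`, `simulator`, and `run_ensemble` are hypothetical, and a Python generator object stands in for a persistent user function that keeps its state in memory between batches. It shows only the data dependency (results flow back to the generator, new points flow out); cancellation and parallel workers are not modelled.

```python
import random


def persistent_generator(batch_size, seed=0):
    """Toy persistent generator: keeps its best point in memory between batches."""
    rng = random.Random(seed)
    best = None  # state that survives across batches (no store/reload needed)
    # First batch: random points; results arrive via send().
    results = yield [rng.uniform(-2.0, 2.0) for _ in range(batch_size)]
    while True:
        for x, f in results:
            if best is None or f < best[1]:
                best = (x, f)
        # Next batch: perturb the best point seen so far.
        results = yield [best[0] + rng.gauss(0.0, 0.5) for _ in range(batch_size)]


def simulator(x):
    """Toy simulation: a quadratic with its minimum at x = 1."""
    return (x - 1.0) ** 2


def run_ensemble(num_batches=5, batch_size=4):
    """Alternate between generator and simulator; return the best (x, f) pair."""
    gen = persistent_generator(batch_size)
    points = next(gen)
    evaluated = []
    for _ in range(num_batches):
        results = [(x, simulator(x)) for x in points]
        evaluated.extend(results)
        points = gen.send(results)  # steer the next batch with the new results
    gen.close()
    return min(evaluated, key=lambda r: r[1])


if __name__ == "__main__":
    print(run_ensemble())
```

No task graph is declared anywhere: the loop is driven entirely by what the generator needs (results) and what it produces (points).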
