Process level parallelization of esm_master #46

JanStreffing · 2021-04-15T13:53:37Z

I was talking with Jan Hegewald, Miguel, and Thomas Jung the other day. One of the points that came up during the discussion is that we want the esm_tools to be attractive not just for scientists and maintainers, but also for model developers. While waiting for awi-cm3 to compile on aleph, I thought of one feature that might have such an effect.

At the moment we already have 5 components in awi-cm3:

oasis
xios
openifs
fesom
rnfmap

Since we are planning to automate more steps of what we would generally call the awi-cm3 workflow through esm_tools, I can see at least 2 more that will be added sooner or later:

perl
eccodes

That's seven components, all of which we currently compile one after the other. On a machine with fast login nodes like ollie this takes ~15-20 minutes. On a machine with slow login nodes like aleph it can be more like 45-60 minutes when done from scratch.

Based on the dependencies, we could compile quite a number of these components in parallel. This is aided by the fact that a parallel compile usually only scales to a handful of processes, leaving enough cores on a login node to run several parallel compiles at a time. The idea would be to define (e.g. in the couplings yaml file) which dependencies need to be fulfilled before the compilation of a component can be kicked off.

Example (no attempt at being grammatically correct)

components:
- eccodes-2.21.0
- perl-5.32.1
- xios-2.5
- rnfmap-awicm3
- oifs-43r3-awi-frontiers
- fesom-2.0-frontiers
- oasis3mct-4.0-awicm3-frontiers
dependencies:
  xios-2.5: oasis3mct-4.0-awicm3-frontiers perl-5.32.1
  oifs-43r3-awi-frontiers: oasis3mct-4.0-awicm3-frontiers perl-5.32.1 xios-2.5
  rnfmap-awicm3: oasis3mct-4.0-awicm3-frontiers
  fesom-2.0-frontiers: oasis3mct-4.0-awicm3-frontiers
coupling_changes:
- sed -i '/FESOM_COUPLED/s/OFF/ON/g' fesom-2.0/CMakeLists.txt
- sed -i '/OIFS_COUPLED/s/OFF/ON/g' fesom-2.0/CMakeLists.txt
- sed -i '/COUPLENEMOECE = /s/.TRUE./.FALSE./g' oifs-43r3/src/ifs/module/yommcc.F90
- sed -i '/COUPLEFESOM2 = /s/.FALSE./.TRUE./g' oifs-43r3/src/ifs/module/yommcc.F90
- sed -i '/COUPLENEMOFOCI = /s/.TRUE./.FALSE./g' oifs-43r3/src/ifs/module/yommcc.F90

In this case perl, eccodes and oasis can all start right away. As soon as perl and oasis are both done, xios can start as well. As soon as oasis finishes, fesom and rnfmap can kick off. When xios is done, openifs can start. XIOS and OpenIFS will still take a while, but we might be able to cut the whole compile time in half.
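To make the ordering concrete, here is a rough sketch that groups the components into "waves" that could compile side by side. It assumes Python 3.9+ for the standard-library `graphlib` module. The dict only restates the dependencies from the example above, and `compile_waves` is a hypothetical helper, not existing esm_master code.

```python
from graphlib import TopologicalSorter

# Dependencies restated from the example: component -> prerequisites.
DEPENDENCIES = {
    "eccodes-2.21.0": [],
    "perl-5.32.1": [],
    "oasis3mct-4.0-awicm3-frontiers": [],
    "xios-2.5": ["oasis3mct-4.0-awicm3-frontiers", "perl-5.32.1"],
    "oifs-43r3-awi-frontiers": [
        "oasis3mct-4.0-awicm3-frontiers",
        "perl-5.32.1",
        "xios-2.5",
    ],
    "rnfmap-awicm3": ["oasis3mct-4.0-awicm3-frontiers"],
    "fesom-2.0-frontiers": ["oasis3mct-4.0-awicm3-frontiers"],
}


def compile_waves(dependencies):
    """Group components into waves; all members of a wave can build together."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave has finished compiling
    return waves


WAVES = compile_waves(DEPENDENCIES)
for number, wave in enumerate(WAVES, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Waves are a simplification: a real scheduler would start each component as soon as its own prerequisites finish rather than waiting for the whole wave, so this is a conservative picture of the available parallelism.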

How much effort would it be to implement something like this on the backend?
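As a starting point for that discussion, the core scheduling loop might look roughly like the sketch below. It assumes Python 3.9+ and uses only the standard library. `build_in_parallel` and `demo_build` are hypothetical names, and `demo_build` just prints instead of calling any real compile step, so none of this reflects how esm_master is actually structured.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Dependencies restated from the example: component -> prerequisites.
DEPENDENCIES = {
    "eccodes-2.21.0": [],
    "perl-5.32.1": [],
    "oasis3mct-4.0-awicm3-frontiers": [],
    "xios-2.5": ["oasis3mct-4.0-awicm3-frontiers", "perl-5.32.1"],
    "oifs-43r3-awi-frontiers": [
        "oasis3mct-4.0-awicm3-frontiers",
        "perl-5.32.1",
        "xios-2.5",
    ],
    "rnfmap-awicm3": ["oasis3mct-4.0-awicm3-frontiers"],
    "fesom-2.0-frontiers": ["oasis3mct-4.0-awicm3-frontiers"],
}


def build_in_parallel(dependencies, build, max_workers=4):
    """Call build(name) for every component, starting each one as soon as
    all of its prerequisites have finished. Returns the completion order."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    running = {}   # future -> component name
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Launch everything that has just become ready.
            for name in sorter.get_ready():
                running[pool.submit(build, name)] = name
            # Block until at least one running build completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()    # re-raise the error if the build failed
                sorter.done(name)  # unblock components that depend on it
                finished.append(name)
    return finished


def demo_build(name):
    # Placeholder: a real backend would run the component's compile step here.
    print(f"compiling {name}")


ORDER = build_in_parallel(DEPENDENCIES, demo_build)
```

Threads are enough here because each worker would only wait on an external compile process. `max_workers` caps how many components compile at once, which matters on a shared login node.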

Inviting feedback @mandresm @hegish

JanStreffing added the enhancement label (New feature or request) on Apr 15, 2021
