Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition to spack workflow #275

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

SeanBryan51
Copy link
Collaborator

@SeanBryan51 SeanBryan51 commented Mar 28, 2024

Now that CABLE is installable with spack (see ACCESS-NRI/spack-packages#60), we can leverage features of spack in benchcab's workflow to build models in a way that:

  1. Is model agnostic
  2. Improves portability of benchcab to other machines
  3. Makes the source code easier to maintain - using a package manager like spack to install different versions of CABLE is more modular than doing this manually. For example, moving to spack will simplify code and remove complex abstractions - resolving issues like:
  4. Allows for testing multiple model build configurations (e.g. different compilers, different compiler flags, different versions of dependencies, etc).

This pull request contains all required changes for transitioning to a spack based workflow.

To be updated as the PR progresses.

Requirements

  1. As a user (of benchcab), I want to leverage functionality provided by spack so that I can test different build configurations for my model.
    1. As a user, I want to specify the spack spec being used when compiling the model so that I can see the versions of dependencies, variants (compile options), etc.
    2. As a user, I want to compile the model with multiple spack specs of my choosing so that different build configurations can be tested. E.g. different compilers, different variants (in particular release and debug build types), different MPI implementations, different versions of dependencies, etc.
    3. As a user, I want a test environment so that my builds are reproducible.
  2. As a developer, I want the minimal changes to the current core functionality (aside from deprecated features) so that previously satsfied requirements are not undone.

Edge cases

  • The user invoking benchcab may not have spack installed.
  • The user invoking benchcab may have an existing spack environment activated.
  • The user invoking benchcab has not installed any dependencies of CABLE and will need install the entire dependency tree which may take a while.

Checklist

@SeanBryan51 SeanBryan51 linked an issue Mar 28, 2024 that may be closed by this pull request
@SeanBryan51 SeanBryan51 added the priority:high High priority issues that should be included in the next release. label Mar 28, 2024
Copy link

codecov bot commented Mar 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.07%. Comparing base (59910ed) to head (45e4243).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #275      +/-   ##
==========================================
- Coverage   72.81%   71.07%   -1.74%     
==========================================
  Files          18       18              
  Lines         982      968      -14     
==========================================
- Hits          715      688      -27     
- Misses        267      280      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SeanBryan51
Copy link
Collaborator Author

SeanBryan51 commented Mar 28, 2024

Design (v0)

Use spack environments (spack.yaml files) to install models.

  • The spack.yaml configuration file contains the definition of spack specs used to build models. The file can be commited to git and allows for reproducible builds inside a test environment.
  • Multiple spack specs can be installed efficiently with spec matrices.

Commit a spack.yaml file to the benchcab run directory repository (bench_example). This effectively makes the directory an anonymous environment.

The new workflow consists of:

  1. [User] Specify spack packages to be installed inside spack.yaml.

  2. [Benchcab] Invoke relevant spack commands to install packages and infer installed packages in the current environment. For example:

# Note: these commands are invoked by benchcab - not the user
# 1. Activate the spack environment:
spack env activate .

# 2. Install packages defined in spack.yaml:
spack concretize --force
spack install

# 3. Read machine readable output from spack to infer installed packages of CABLE (for example)
# in the current environment:
spack find --json cable

# 4. Deactivate the spack environment:
spack env deactivate
  1. [Benchcab] Load each installed model separately and setup experiments.

  2. [Benchcab] Run experiments.

Deprecated functionality:

  1. realisations key:

    Benchcab now infers models installed in the current spack environment via spack find. This is done so that we do not have to specify models in both spack.yaml and the benchcab config file. As a result of this, we will need to:

    1. Make explicit to the user which package corresponds to R0, R1, etc as this was previously defined in the realisations key. We could output the package hash used for each realisation.
    2. Provide an alternative way of specifying namelist patches via the patch and patch_remove keys as these keys were specified under the realisations key.
    3. Provide an alternative way of testing a local checkout of a model as the local key will be removed.
  2. benchcab checkout and benchcab build functionality:

    This will be replaced with the functionality provided by spack. Activation of the spack environment and installing packages can be part of the benchcab *setup* commands.

  3. modules key and environment modules functionality:

    Spack provides all the dependencies required for compilation, thus loading modules for the build or for running the model will not be necessary.

Changes to existing core functionality:

  1. Namelist file patching (see patch and patch_remove):

    Deprecating the realisations key means that we can no longer specify a namelist patch for each model version. An approach could be to introduce an optional key model_specs which specifies one or more spack specs used to infer models to test in benchcab. Each entry in model_specs can also contain a namelist patch which gets applied to all models which match on the given spec. By default, model_specs contains an element with spec set to cable which will match on all installed cable versions.

    The schema would roughly look like the following:

    model_specs:
      type: "list"
      required: false
      schema:
        type: "dict"
        schema:
          spec:
            type: "string"
            required: true
          patch:
            ...
          patch_remove:
            ...

    Note:

    • a single spack spec can match on one or more model realisations.
    • applying patches to models should be made explicit to the user.
  2. Testing a local checkout of a model (see Use cable local checkout #255, and local key):

    Deprecating the realisations key also means the local key cannot be used to test an existing checkout of a model repository on the file system.

    It looks like there is existing functionality for doing this using spack with spack develop (see: https://spack-tutorial.readthedocs.io/en/latest/tutorial_developer_workflows.html#development-iteration-cycles). I will look more into this.

    Edit: I have looked more closely into spack develop, a limitation I ran into was that I was unable to install the environment when cable in 'develop' mode coexisted with non-develop versions of cable in the environment (see this related discussion and this pull request). This would mean benchcab could only test a local checkout if this was the only version of CABLE within the environment which would be a downgrade in functionality. A possible work-around could be to invoke spack dev-build in the directory of the local checkout (this will not work as the dev build will not visible in the current spack environment) or specifying a path to a pre-built executable but this definitely breaks the elegance of using spack environments.

  3. benchcab clean functionality

    We may want to uninstall installed packages, however this may not be necessary given spack environments should be fully reproducible. Requires further investigating.

Strange spack errors:

  1. Spack cannot find recent git references due to cache

    To reproduce:

    # 1. Starting from a fresh spack install
    spack spec cable ...
    # 2. Push a new branch to CABLE
    git push -u origin my-branch
    # 3. Run spack spec on the new branch
    spack spec [email protected]
    

    Step 3 results in the following error:

    ==> Error: my-branch is not a valid git ref for cable.

    Workaround:

    Spack caches cloned git repositories in the ~/.spack/git_repos directory, removing the directory will force spack to reclone the git repository and will make new git references from upstream discoverable:

    # remove cached git respositories
    rm -rf ~/.spack/git_repos
    spack spec cable@git.<branch_name> ...

    Is there a less hacky way of doing this?

@ccarouge
Copy link
Collaborator

ccarouge commented Apr 2, 2024

Could you give a dummy example of what a spack.yaml would look like for benchcab? So something that looks realistic for CABLE even if it doesn't actually work. I'm trying to see how easy it is for people to understand. With this setup, would there be an easy way to say "use same specs as for the last release"? It could be the default in bench_example obviously.

Make explicit to the user which package corresponds to R0, R1, etc as this was previously defined in the realisations key. We could output the package hash used for each realisation.

Hashs never seem to be very human-friendly to me. I wonder if there would be a different solution here. I can't think of what it would be, maybe the spack.yaml example would help there. Because I don't see how it replaces the realisations key.

I think I'll comment further after I get the example as things will be clearer then.

@SeanBryan51
Copy link
Collaborator Author

SeanBryan51 commented Apr 2, 2024

@ccarouge here is a spack.yaml that could be used with benchcab:

# This is a Spack Environment file.
#
# It describes a set of packages to be installed, along with
# configuration settings.
spack:
  # add package specs to the `specs` list
  specs:
  - matrix:
    - [[email protected], [email protected]]
    - ['%[email protected] ^[email protected] ^[email protected]']
  view: false
  concretizer:
    unify: false

or:

# This is a Spack Environment file.
#
# It describes a set of packages to be installed, along with
# configuration settings.
spack:
  # add package specs to the `specs` list
  specs:
  - [email protected]
  packages:
    cable:
      require: '%[email protected] ^[email protected] ^[email protected]'
  view: false
  concretizer:
    unify: false
  develop: {}

Installing the packages in this environment will compile the main and some-dev-branch branches with the same dependencies.

With this setup, would there be an easy way to say "use same specs as for the last release"?

By this do you mean building CABLE with spack using the same dependencies as how CABLE was built without spack? This should be possible provided all the module dependencies available on NCI are installable via spack. The release team has also setup spack "wrappers" around existing installs on NCI which can be used (for example the nci-openmpi package above is the existing NCI installation of openmpi and is not built from source).

Hashs never seem to be very human-friendly to me. I wonder if there would be a different solution here.

We could use other metadata which is more human readable in addition to a hash, for example name and version, but I would prefer to keep the hash so we can disambiguate packages that have been installed multiple times with slightly different build options. I've attached example output from running spack find --json which includes metadata we could use:

[
  {
    "name": "cable",
    "version": "git.tmp-branch-for-testing-5=0-git.493",
    "arch": {
      "platform": "linux",
      "platform_os": "rocky8",
      "target": "x86_64"
    },
    "compiler": {
      "name": "intel",
      "version": "19.0.5.281"
    },
    "namespace": "access.nri",
    "parameters": {
      "build_system": "cmake",
      "build_type": "Release",
      ... removed for demo purposes ...
    },
    "package_hash": "6xwaypybkzbwnwce6waylzppbzb2zxo3kmr5uqc5f23nvyaslkna====",
    "dependencies": [
      ... removed for demo purposes ...
    ],
    "hash": "z52w6i2uaoocckhpqraffduaf67cgswf"
  }
]

@ccarouge
Copy link
Collaborator

ccarouge commented Apr 2, 2024

Thanks @SeanBryan51 that's useful. I'm guessing in the matrix we can refer to specific commits or tags, right?

For the release that's not what I meant. When we will do a release of CABLE, we will have a deployed version of CABLE built with spack and centrally installed at NCI. Is there a way to design benchcab so that users can say they want that release and their branch built with the same dependency versions (same netcdf, mpi and compiler) as for that release? Is the simplest to do that to ensure these versions are what is in spack.yaml by default in bench_example?

By default, model_specs contains an element with spec set to cable which will match on all installed cable versions.

Why cable as default instead of None? Is that to ensure it is set and not forgotten?

applying patches to models should be made explicit to the user.

What do you mean? More exactly is that a change to the current behaviour of benchcab or is it keeping the behaviour?

Lastly, I would expect we need some changes to the cleanup process. We need to remove all the spack installs. Easy?

I'm worried about losing the local functionality. We'll need a solution. I'm happy if the solution is for the user to pre-build and benchcab to somehow know how to find these executables.

@SeanBryan51
Copy link
Collaborator Author

I'm guessing in the matrix we can refer to specific commits or tags, right?

Spack supports specifying git references for git projects.

For the release that's not what I meant. When we will do a release of CABLE, we will have a deployed version of CABLE built with spack and centrally installed at NCI. Is there a way to design benchcab so that users can say they want that release and their branch built with the same dependency versions (same netcdf, mpi and compiler) as for that release? Is the simplest to do that to ensure these versions are what is in spack.yaml by default in bench_example?

I don't have the best answer to this as I'm not sure how the CABLE release would look like. One way could be to define the preferred dependencies for the CABLE release inside a package.yaml file, and then include these configurations in the spack.yaml file (see included configurations).

By default, model_specs contains an element with spec set to cable which will match on all installed cable versions.

Why cable as default instead of None? Is that to ensure it is set and not forgotten?

The default for model_specs could be left as None but it seemed natural for the default to contain cable as benchcab will invoke spack find --json cable to infer all CABLE models installed in the spack environment.

applying patches to models should be made explicit to the user.

What do you mean? More exactly is that a change to the current behaviour of benchcab or is it keeping the behaviour?

This is just so its clear to the end user which model versions are getting patched, as multiple model versions could match for a given model spec. I realised we already do this when setting up experiments so it isn't really a change to the current behaviour.

Lastly, I would expect we need some changes to the cleanup process. We need to remove all the spack installs. Easy?

I haven't thought about how clean up will look but there will likely be some changes. I will add this to "Changes to existing core functionality".

I'm worried about losing the local functionality. We'll need a solution. I'm happy if the solution is for the user to pre-build and benchcab to somehow know how to find these executables.

We don't want to lose any of the existing functionality thats for sure. I plan on looking into spack develop as this lets you build your local repository as if it were a spack package. Will report more as I look into it.

@SeanBryan51
Copy link
Collaborator Author

SeanBryan51 commented Apr 8, 2024

I have looked more closely into spack develop, a limitation I ran into was that I was unable to install the environment when cable in 'develop' mode coexisted with non-develop versions of cable in the environment (see this related discussion and this pull request). This would mean benchcab could only test a local checkout if this was the only version of CABLE within the environment which would be a downgrade in functionality. A possible work-around could be to invoke spack dev-build in the directory of the local checkout (this will not work as the dev build will not visible in the current spack environment) or specifying a path to a pre-built executable but this definitely breaks the elegance of using spack environments.

@SeanBryan51 SeanBryan51 added priority:medium Medium priority issues to become high priority issues after a release. and removed priority:high High priority issues that should be included in the next release. labels Apr 11, 2024
@SeanBryan51 SeanBryan51 linked an issue Apr 24, 2024 that may be closed by this pull request
@SeanBryan51 SeanBryan51 deleted the 258-spack-transition branch April 24, 2024 01:53
@SeanBryan51 SeanBryan51 restored the 258-spack-transition branch April 24, 2024 01:56
@SeanBryan51 SeanBryan51 reopened this Apr 24, 2024
@SeanBryan51
Copy link
Collaborator Author

SeanBryan51 commented May 6, 2024

Design (v0.1)

Same workflow as initial v0 design except no existing features are deprecated (i.e. fully backwards compatible). Support with spack can be switched on/off at runtime.

Additional functionality:

  1. Namelist file patching (see patch and patch_remove) for models installed with spack:

    Introduce an optional key model_specs (as described in initial v0 design)

Changes to existing core functionality:

  1. Make the realisations key optional in config file.

    • This is so we can omit the realisations key when the spack workflow is enabled.
    • Need to error in benchcab if no models are found.

Tasks

  1. Add metadata attribute to Model class: Add metadata attribute to Model class #288

  2. Refactor Model and Benchcab classes:

    • The addition of spack means that the Model class no longer requires a Repo instance as well as other attributes such as name, install_dir, and build_script. We should simplify the Model class so that only key attributes are included - these should include: the model ID, model metadata, namelist file patches, and the absolute path to the installed binaries. This way we can reuse the Model data structure to run models installed with spack.

      An approach we could take could be to initialise Model objects from Repo objects and decouple the two classes.

      • Remove the name attribute from Model as this is already provided by the get_branch_name() method of the Repo interface.
      • Move Model.build* methods to build* functions which take in a Repo object as an argument.
      • Redefine the Model.build_script attribute as an argument to build* functions as it is only used in the Model.custom_build() method.
      • Remove install_dir attribute from Model as this will be replaced with an absolute path to the installed binaries.

@SeanBryan51
Copy link
Collaborator Author

Currently, every benchcab invocation requires executing spack concretize && spack install when initialising Model objects. It would be great if benchcab can cache the packages used for each Model instance and re-initialise only when needed (say when spack.yaml is updated). This is related to #20 as we currently do not have a way for state to persist across several benchcab calls.

@abhaasgoyal
Copy link

abhaasgoyal commented Jul 4, 2024

@SeanBryan51 I was going through the PR, I read a potential issue of losing the local functionality since develop keyword doesn't work with both local and remote builds. To keep all previous features, one solution that you mentioned was making the realisations key optional in config file. Was wondering regarding alternative solutions to this (I haven't gone into the weeds of how spack works, so I may be wrong) but could be a starting point. This seemed the least clunky to me - what about having a separate environment file (spack_local.yaml) which installs in the same src dir but has the develop keyword. We use this file if it's provided and/or config.yaml provides the appropriate keyword. This would ensure that we remove the realisations keyword which would leave most of the complexity to spack itself. (I know we are being a bit dependent around spack internals and there may be repeats on the spack*.yaml files but they seem minimal).

Another aspect which I thought to this was (if the above was viable) - realisations acting as the frontend to create the necessary environments, and using the spack Python API to call both develop and install versions. We wouldn't need to define spack files anymore, but the transformation requires knowing the spack internals and more dev effort.
Link: https://spack.readthedocs.io/en/latest/spack.html

An interesting link (not directly our use case but could be useful for dev workflows) I came about: https://key4hep.github.io/key4hep-doc/developing-key4hep-software/SpackForDevelopers.html
(I don't know the internals enough to understand every command here tho)

@SeanBryan51
Copy link
Collaborator Author

@abhaasgoyal I'll give a crack at adding support for more than one spack environment file (e.g. having a spack-local.yaml). From the CABLE release meeting yesterday, it looks like releases of CABLE will also be defined with spack environments so this functionality will also be useful if we want to test against released versions of CABLE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:medium Medium priority issues to become high priority issues after a release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spack based workflow Update build in benchcab once the updated CABLE build is released
3 participants