Enable modules for packages not compiled with preferred compiler (update spack-stack setup-meta-modules); remove need for external ecflow #1257

Merged
climbfuji merged 31 commits into JCSDA:develop from feature/boost_gcc4intel on Aug 28, 2024

@climbfuji climbfuji commented Aug 20, 2024

Summary

Note: I pinky swear that I am going to rewrite the old setup-meta-modules extension, which was cobbled together in an afternoon three years ago with the thinking "let's try something and then do it properly next month".

This PR makes the necessary changes to the site configs and the setup-meta-modules extension to build certain packages with compilers other than the preferred compiler while still being able to load the modules for those packages. This is only possible now that we have the concept of a preferred compiler in our environments. One assumption made here is that the MPI provider, if any, is compiled with the preferred compiler (I think this is a reasonable assumption to make).

Our use cases are:

  1. bison needs to be compiled with gcc when the preferred compiler is oneapi. Strictly speaking, we don't need the bison module in this case, but I tested it on my laptop and it works.
  2. ecflow and boost must be compiled with gcc when the preferred compiler is intel. This allows us to move away from external ecflow packages that don't work with the proposed update of Python to 3.11.7 (because the external ecflow was compiled with an old Python 3.9). In this case, we need the ecflow modulefile. I tested this on the Ubuntu CI runner and on Narwhal.
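
The two use cases above amount to a small lookup from (preferred compiler, package) to the compiler actually used. Here is a minimal sketch, assuming a hypothetical `GCC_OVERRIDES` table and `build_compiler` helper. Neither is part of spack-stack, where this constraint lives in the Spack site and environment configs; `netcdf-c` is just an arbitrary example of a package without an override.

```python
# Hypothetical illustration only: spack-stack expresses these constraints
# in its Spack configs, not in a Python table.
GCC_OVERRIDES = {
    "oneapi": {"bison"},
    "intel": {"ecflow", "boost"},
}


def build_compiler(package: str, preferred: str) -> str:
    """Return the compiler family a package is built with."""
    if package in GCC_OVERRIDES.get(preferred, set()):
        return "gcc"
    return preferred


print(build_compiler("ecflow", "intel"))    # gcc
print(build_compiler("netcdf-c", "intel"))  # intel
```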

Caveat: I have not tested whether this new capability works for packages that depend on MPI (which is compiled with the preferred compiler) but are themselves compiled with a different compiler (e.g. something like intel-oneapi-mpi/2021.12.0/gcc/11.2.0, where packages built with the preferred compiler would have intel-oneapi-mpi/2021.12.0/intel/2021.12.0).
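
To make the untested case concrete, the sketch below builds the MPI-level path fragment for a given compiler, using the two example paths from the caveat. The `mpi_hierarchy_dir` helper and its layout are assumptions made for illustration, not the setup-meta-modules implementation.

```python
def mpi_hierarchy_dir(mpi: str, mpi_version: str,
                      compiler: str, compiler_version: str) -> str:
    """Build an '<mpi>/<version>/<compiler>/<version>' path fragment."""
    return "/".join([mpi, mpi_version, compiler, compiler_version])


preferred = mpi_hierarchy_dir("intel-oneapi-mpi", "2021.12.0", "intel", "2021.12.0")
other = mpi_hierarchy_dir("intel-oneapi-mpi", "2021.12.0", "gcc", "11.2.0")

# An MPI-dependent package built with gcc would land in a directory that
# differs from the one used by packages built with the preferred compiler.
print(preferred)  # intel-oneapi-mpi/2021.12.0/intel/2021.12.0
print(other)      # intel-oneapi-mpi/2021.12.0/gcc/11.2.0
```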

Still todo:

  • Configure gcc compiler used as backend for Intel for Atlantis, Gaea C5, Gaea C6
    • Atlantis deferred to a follow-up PR that also configures the oneAPI compiler
    • Gaea C5 and Gaea C6 - @AlexanderRichert-NOAA @RatkoVasic-NOAA any last-minute changes for the gcc backend for Gaea C5 and C6 for this PR, or do you want to do this as a follow-up PR/when we roll out the release?
  • Remove external ecflow from packages.yaml and make sure every site has an external qt@5 in its packages.yaml
  • Fix the unit test, or disable it, because we are going to rewrite the setup-meta-modules extension after the 1.8.0 release. Yes, this time for sure! (done)

Testing

  • Tested for oneapi / bison on @climbfuji's laptop (unified environment)
  • Tested for intel / ecflow on Ubuntu CI runner and on Narwhal
  • More testing?

Applications affected

None (no changes to how applications are run)

Systems affected

All using Intel or oneAPI compilers

Dependencies

none

Issue(s) addressed

Link the issues addressed or resolved by this PR (use Fixes #??? for fully resolved issues)

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on some of the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

@ashley314 ashley314 mentioned this pull request Aug 22, 2024
3 tasks
…n env using one principal (preferred) compiler
logging.info(" ... ... appending {} to MODULEPATHS_SAVE".format(modulepath_save))
MODULEPATHS_SAVE.append(modulepath_save)

# For tcl modules remove the compiler prefices from the module contents
Collaborator Author

This block (removing the compiler prefixes for tcl modules) was moved up because it is needed for all compilers, not just the preferred compiler. This allows us to skip the remainder of the loop for compilers that are not the preferred compiler.
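
The reordering described in this comment can be sketched roughly as follows. This is a simplified illustration, not the actual setup-meta-modules code: `process_compilers`, `strip_compiler_prefixes`, and `build_meta_modules` are hypothetical names, and the callbacks stand in for the real work.

```python
from typing import Callable, Iterable, List


def process_compilers(
    compilers: Iterable[str],
    preferred: str,
    module_type: str,
    strip_compiler_prefixes: Callable[[str], None],
    build_meta_modules: Callable[[str], None],
) -> List[str]:
    """Run per-compiler steps; return the compilers that got meta modules."""
    handled = []
    for compiler in compilers:
        # Needed for every compiler (tcl modules only), so this step
        # comes before the early exit below.
        if module_type == "tcl":
            strip_compiler_prefixes(compiler)

        # Everything below applies to the preferred compiler only.
        if compiler != preferred:
            continue

        build_meta_modules(compiler)
        handled.append(compiler)
    return handled


stripped: List[str] = []
built: List[str] = []
process_compilers(["gcc", "intel"], "intel", "tcl", stripped.append, built.append)
print(stripped)  # ['gcc', 'intel']
print(built)     # ['intel']
```

Placing the prefix removal ahead of the `continue` is the whole point of the move: the non-preferred compiler still gets its module contents cleaned up, while the rest of the loop body is skipped for it.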

@climbfuji climbfuji marked this pull request as ready for review August 23, 2024 18:09
@RatkoVasic-NOAA RatkoVasic-NOAA left a comment

Successfully installed intel on Hercules.
Approved.

@climbfuji climbfuji self-assigned this Aug 27, 2024
@ashley314

I was able to install on S4, although ecflow might not have compiled successfully with gcc. I also cannot figure out how to load the spack-built ecflow instead of the version at /data/prod/jedi/spack-stack/.

image

@climbfuji

@ashley314 Your issue may be that you still have an external ecflow in the S4 site config. You will want to remove that and make sure you have an external qt@5 instead. And you will want to remove the ecflow module from the list of excluded modules in the site config. I think @srherbener can help you (I'll be talking about the PR in the spack-stack meeting today).
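
The three checks suggested here could be automated with something like the sketch below. It works on already-parsed config dictionaries. The key layout (`packages` → package → `externals`, and an `exclude` list under `modules` → `default` → `lmod` or `tcl`) is an assumption about how the site config is organized, and the versions and prefixes in the sample data are made up, so adjust both to the actual files.

```python
from typing import Any, Dict, List


def check_site_config(packages_cfg: Dict[str, Any],
                      modules_cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems that could hide a spack-built ecflow."""
    problems = []
    packages = packages_cfg.get("packages", {})

    # 1. ecflow must not be declared as an external package.
    if packages.get("ecflow", {}).get("externals"):
        problems.append("external ecflow still configured")

    # 2. qt@5 must be available as an external package.
    qt_externals = packages.get("qt", {}).get("externals", [])
    qt_specs = [entry.get("spec", "") for entry in qt_externals]
    if not any(spec.startswith("qt@5") for spec in qt_specs):
        problems.append("no external qt@5 configured")

    # 3. ecflow must not be in the list of excluded modules.
    default = modules_cfg.get("modules", {}).get("default", {})
    for module_type in ("lmod", "tcl"):
        if "ecflow" in default.get(module_type, {}).get("exclude", []):
            problems.append(f"ecflow excluded from {module_type} modules")

    return problems


# Made-up sample data for illustration.
packages_cfg = {"packages": {
    "ecflow": {"externals": [{"spec": "ecflow@5.8.4", "prefix": "/opt/ecflow"}]},
    "qt": {"externals": [{"spec": "qt@5.15.2", "prefix": "/usr"}]},
}}
modules_cfg = {"modules": {"default": {"lmod": {"exclude": ["ecflow"]}}}}
print(check_site_config(packages_cfg, modules_cfg))
```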

@AlexanderRichert-NOAA AlexanderRichert-NOAA left a comment

This seems to work fine based on testing on my local machine (namely, loading stack-intel reveals gcc-built modules, e.g., boost).

@climbfuji

Thanks for testing, @AlexanderRichert-NOAA! I'll wait for approval from JCSDA before I merge this.

@ashley314

@climbfuji thanks for the suggestions. I worked with @srherbener: we removed ecflow as an external package, and qt was already declared. The exclude for ecflow also needed to be removed from the modules file. We were then able to install ecflow and made it through the lmod refresh with everything looking good. We ran the meta-module script, but when we try to load the jedi environment, ecflow still does not show up. Do you have any ideas about what else is missing?

image

Although still seeing only boost:
image

@climbfuji

I'll ping you in Slack.

@@ -1,6 +1,6 @@
 packages:
   all:
-    compiler:: [[email protected]]
+    compiler:: [[email protected]] # todo: add gcc here
Collaborator

Just want to double check. Is the intention to wait until after the 1.8.0 release to make these changes for Gaea?

Collaborator Author

We'll do this as part of the site config updates on the release branch and then bring it back to develop if that makes sense?

Collaborator

Sure, that makes sense. Happy to approve!

@@ -1,6 +1,6 @@
 packages:
   all:
-    compiler:: [[email protected]]
+    compiler:: [[email protected]] # todo: add gcc here
Collaborator

Sure, that makes sense. Happy to approve!

@climbfuji climbfuji merged commit bcd873d into JCSDA:develop Aug 28, 2024
8 checks passed
@climbfuji climbfuji deleted the feature/boost_gcc4intel branch August 28, 2024 15:25
@climbfuji climbfuji mentioned this pull request Aug 30, 2024
72 tasks
5 participants