
Add Derecho to supported platforms #1836

Merged: 57 commits merged into ufs-community:develop on Dec 14, 2023

@mark-a-potts (Contributor) commented Jul 13, 2023

Description

Adds module file(s) to support building and running regression tests on the new Derecho system at UCAR.

Commit Message

  • UFS:
    • Remove support for Cheyenne (UCAR's retiring HPC system).
    • Add UCAR's new Derecho HPC system to the supported Tier 1 platforms.

Input data additions/changes

  • No changes are expected to input data.
  • Changes are expected to input data:
    • New input data.
    • Updated input data.

Anticipated changes to regression tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Library Updates/Changes

Combined with PRs (If Applicable):

Commit Queue Checklist:

  • Link PRs from all sub-components involved in section below
  • Confirm reviews completed in ALL sub-component PRs
  • Add all appropriate labels to this PR.
  • Run the full RT suite on either Hera or Cheyenne AND attach the log to a PR comment.
  • Add list of any failed regression tests to "Anticipated changes to regression tests" section.

Linked PRs and Issues:

Testing Day Checklist:

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.

Testing Log (for CMs):

  • RDHPCS
    • Hera
    • Orion
    • Jet
    • Gaea
    • Cheyenne
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@kbooker79 added and then removed the noaacloud-aws-BL label (Create new baselines and run regression tests in NOAAcloud (AWS)) on Jul 21, 2023
@zach1221 (Collaborator)

@jkbk2004 @BrianCurtis-NOAA I was able to create baselines on Derecho for the Intel portion of rt.conf when testing against this PR. Just FYI.

@zach1221 (Collaborator)

@mark-a-potts, one thing to note: I get an error pointing to line 449 in rt.sh. I have to comment out the Rocoto variables there just to get it to run, even when using ecFlow.

@DeniseWorthen (Collaborator)

I am able to compile on this branch using the same ACCNR=NRAL0032 I used previously. I had to make the same change to the Rocoto variables that @zach1221 did. I think the baseline directory will need to be created separately from the one currently pointed to (/glade/cheyenne/scratch/epicufsrt/).

Since none of us can run on Cheyenne anymore and it is being decommissioned, I recommend removing the Cheyenne-specific TPN (tasks per node) settings in the tests, and perhaps all Cheyenne-related items in the scripts, such as:

```shell
if [[ $MACHINE_ID = cheyenne ]]; then
  TPN=18
fi
```
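A first step toward removing the Cheyenne-only branches is finding them. The sketch below is a hypothetical helper (`list_cheyenne_refs` is not part of the repository) that uses plain `grep` to list every line mentioning Cheyenne under a given directory; which directory to scan, for example the RT tests directory, is an assumption left to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line under a directory that mentions
# Cheyenne (case-insensitive), so the Cheyenne-only branches such as the
# TPN override can be reviewed and removed by hand.
list_cheyenne_refs() {
  local dir=${1:-.}
  # -r: recurse, -n: line numbers, -i: ignore case, -I: skip binary files.
  # "|| true" keeps the function's exit status 0 when nothing matches
  # (note that it also masks grep errors such as an unreadable directory).
  grep -rniI 'cheyenne' "$dir" || true
}
```

Each output line has the form `path:line:text`, so the results can be reviewed file by file before deleting anything; removal is intentionally left manual because `if` blocks span several lines.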

@zach1221 (Collaborator)

@mark-a-potts, are there any plans for a Derecho GNU modulefile as well, similar to how Cheyenne supported both Intel and GNU? Also, I can start a PR that removes the Cheyenne machine name from the RT-related scripts; perhaps we can merge it with this PR once it is finished.

@zach1221 (Collaborator)

Hi @mark-a-potts, @ulmononian, I wanted to follow up on this PR again and let you know that Intel builds seem to run fine on Derecho. I was able to create new Intel baselines for most of the Intel-compiled regression tests in rt.conf. I think it's a good idea to get the ufs-wm RTs running with GNU soon as well. Is there already a GNU spack-stack installation ready on Derecho? @jkbk2004

@mark-a-potts (Contributor, Author)

@zach1221, I do not believe there is a GNU stack built for Derecho, and I am not sure if or when that will be done. The only gcc module I see there right now is 12.2.0, and I don't think spack-stack has been updated to support that gcc version yet.

@zach1221 (Collaborator)

Understood. I don't think that's an issue, since Hercules and Hera will still have GNU support. Given this, is the PR ready to move forward, or is there additional work to be done?

@zach1221 (Collaborator)

@DomHeinzeller @mark-a-potts, are we in the process of building spack-stack v1.5.0 on Derecho, or is there an existing v1.5.0 installation that needs to be updated following the changes made during last week's maintenance? @jkbk2004

@climbfuji (Collaborator)

1.5.0 should still work (and we won't make any changes to it). FWIW, I am going to finish the 1.5.1 build on Derecho today or tomorrow at the latest.

@zach1221 (Collaborator)

@DomHeinzeller OK, I ask because I'm running some RTs against the spack-stack installation currently listed in the ufs_derecho.intel.lua modulefile, and the tests fail to compile with a missing ncarcompilers error (screenshot attached). Something may have changed since last week's maintenance.

@climbfuji (Collaborator) commented Oct 31, 2023 via email

See https://spack-stack.readthedocs.io/en/latest/PreConfiguredSites.html#ncar-wyoming-derecho, please.

@zach1221 (Collaborator) commented Nov 1, 2023

Thanks, @DomHeinzeller. I may be doing something wrong, but I am still receiving the same error with the NCAR-Wyoming Derecho setup in ufs_derecho.intel.lua. Perhaps it's best to wait for the 1.5.1 installation in this case.

@zach1221 (Collaborator) commented Nov 6, 2023

@DomHeinzeller @ulmononian I think I found the v1.5.1 spack-stack installation on Derecho at /glade/work/epicufsrt/contrib/spack-stack/derecho/spack-stack-1.5.1/envs/unified-env/install/modulefiles/Core. I assume it's ready for use. Do you know what other changes should be made to ufs_derecho.intel.lua besides adding this module path?
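For trying the 1.5.1 installation interactively before editing the modulefile, the path above has to be on `MODULEPATH`. The sketch below prepends it once, which is roughly what Lmod's `module use <dir>` does (inside a Lua modulefile the usual equivalent is `prepend_path("MODULEPATH", ...)`). The helper name `prepend_modulepath` is hypothetical, and this does not answer which other changes ufs_derecho.intel.lua needs.

```shell
#!/usr/bin/env bash
# Sketch: put the spack-stack 1.5.1 Core modulefiles at the front of
# MODULEPATH, roughly what "module use <dir>" does interactively.
SPACK_STACK_CORE=/glade/work/epicufsrt/contrib/spack-stack/derecho/spack-stack-1.5.1/envs/unified-env/install/modulefiles/Core

# Hypothetical helper: prepend a directory only if it is not already listed.
prepend_modulepath() {
  local dir=$1
  case ":${MODULEPATH:-}:" in
    *":$dir:"*) ;;  # already present: leave MODULEPATH unchanged
    *) MODULEPATH="$dir${MODULEPATH:+:$MODULEPATH}" ;;
  esac
  export MODULEPATH
}
```

Calling `prepend_modulepath "$SPACK_STACK_CORE"` twice leaves a single entry, so it is safe to run from a login script.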

@jkbk2004 previously approved these changes Dec 14, 2023
@DeniseWorthen (Collaborator) commented Dec 14, 2023

I'm testing something on Derecho now by pulling this branch into my feature branch. When running rt.sh, the run directory is created in /glade/derecho/scratch/$USER/$USER/FV3_RT/rt_XXXXX (note the repeated $USER), e.g.:

/glade/derecho/scratch/worthen/worthen/FV3_RT/rt_42882
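The cause of the doubled user name is not identified in this thread. One plausible cause, offered purely as an assumption, is a scratch prefix that already ends in $USER with $USER appended again when the run directory is built. The hypothetical guard below (`user_scratch_dir` is not an rt.sh function) appends the user name only when the prefix does not already end with it.

```shell
#!/usr/bin/env bash
# Hypothetical guard: append "/$user" to a scratch prefix only if the
# prefix does not already end with it, avoiding paths like .../worthen/worthen.
user_scratch_dir() {
  local base=${1%/}  # drop one trailing slash, if any
  local user=$2
  if [[ $base == */"$user" ]]; then
    printf '%s\n' "$base"
  else
    printf '%s\n' "$base/$user"
  fi
}
```

With this guard, both `/glade/derecho/scratch` and `/glade/derecho/scratch/worthen` as the prefix resolve to `/glade/derecho/scratch/worthen`, after which `FV3_RT/rt_XXXXX` would be appended once.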

@jkbk2004 previously approved these changes Dec 14, 2023
@zach1221 merged commit 10635ef into ufs-community:develop Dec 14, 2023
@DusanJovic-NOAA (Collaborator)

I do not see log files from the regression and Jenkins CI tests on all supported platforms for this PR.

@jkbk2004 (Collaborator)

We confirmed no impact of this pr on other machines.

@DusanJovic-NOAA (Collaborator)

How?

@jkbk2004 (Collaborator)

The code changes relate only to Derecho, and we confirmed this with sanity checks on a few machines.

9 participants