ERS Test with CISM No-Evolve runs for 3 years at restart instead of 1 #83

Closed
Katetc opened this issue Dec 11, 2023 · 3 comments

@Katetc
Contributor

Katetc commented Dec 11, 2023

In the testing for the cismwrap_2_1_97 tag, the test:
ERS_D_Ly3.f09_g17_gris4.T1850Gg.derecho_intel.cism-noevolve

It fails on the base-restart comparison. Instead of restarting at year 3 and running for 1 year, the test restarts at year 3 and runs for 3 more years. The test then tries to compare a year 0004 history file, but the final output is a year 0006 history file, so the comparison fails.
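The mismatch in history-file years comes down to simple arithmetic. The sketch below assumes, as described above, that the restart run begins at the start of year 3. It also assumes the last history file is stamped with the year in which the run stops. The helper `final_history_year` is hypothetical and is not part of CESM or CIME.

```python
def final_history_year(restart_year: int, years_run: int) -> int:
    """Year stamp of the last history file, assuming (hypothetically) that it
    is labeled with the year at which the run stops."""
    return restart_year + years_run


# Intended restart run: resume at the start of year 3, run STOP_N = 1 year.
expected = final_history_year(restart_year=3, years_run=1)
# What the test actually did: resume at year 3, then run 3 more years.
observed = final_history_year(restart_year=3, years_run=3)
print(f"expected {expected:04d}, observed {observed:04d}")
```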

I looked into this for a while, and I have no idea why the test runs for 3 years at the restart. STOP_N is set to 1 year, and I've never seen CESM ignore it before. All other ERS tests pass, though notably all of them have an active CISM. Is there something about a TG compset that, when running with NoEvolve, ignores changes to STOP_N? So strange. I've gone ahead and made the 2_1_97 tag, since this seems to be a test issue and not a CISM issue, but I'm opening this issue to document it.

@Katetc Katetc self-assigned this Dec 11, 2023
@billsacks
Member

I'm looking into this... I have some ideas. I want to do a little more testing, then will share my thoughts / findings.

@billsacks
Member

@Katetc and I discussed this a couple of weeks ago and I was supposed to post our findings and thoughts here... but then got pulled away to other things and am just getting back to this today. So, Kate, here's my best recollection of what we discussed... nothing new, but just putting our discussion into writing:

On the surface, the reason why this test is newly failing is that it had been disabled until the most recent CISM-wrapper tag. (A few years ago, I think there were issues with creating the nuopc configuration files for this test - see #60 (comment). These issues have been fixed, so Kate tried enabling this test, but then ran into the issue documented here.)

Going one level deeper: the problem here is that, with NUOPC/CMEPS, the expectation has been that CISM won't ever be called in the run phase when it is in noevolve mode. This had been implemented for most compsets, but not for T compsets. I just opened a CMEPS PR that fixes this for T compsets: ESCOMP/CMEPS#425. Kate, as noted in that PR, it would be great if you could confirm two things. First, that this fixes things for you (I have already tested this, but a second test wouldn't hurt). Second, that other T compset tests still run and remain bit-for-bit (I have not tested this; based on a read of the code I'd be surprised if anything broke, but it would be good to confirm).
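The gap that the CMEPS PR closes can be summarized as a single decision: does the driver call CISM's run phase? The sketch below is a hypothetical model of the behavior described above. It is not CMEPS code. The `Config` type, its fields, and `glc_run_phase_called` are illustrative names only, and "I" stands in for the "most compsets" that already skipped the run phase.

```python
from dataclasses import dataclass


@dataclass
class Config:
    compset_family: str  # hypothetical field, e.g. "I" or "T"
    glc_noevolve: bool   # hypothetical flag: is CISM in noevolve mode?


def glc_run_phase_called(cfg: Config, with_fix: bool) -> bool:
    """Hypothetical model: would the driver invoke CISM's run phase?

    Before the fix, noevolve skipped the run phase for most compsets but not
    for T compsets. After the fix, noevolve skips it for T compsets too.
    """
    if not cfg.glc_noevolve:
        return True  # an evolving ice sheet always runs
    if cfg.compset_family == "T" and not with_fix:
        return True  # the T-compset gap described in this issue
    return False


print(glc_run_phase_called(Config("T", True), with_fix=False))  # before the fix
print(glc_run_phase_called(Config("T", True), with_fix=True))   # after the fix
```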

I'm not positive, but I think the reason this caused problems is: In noevolve mode, CISM is set up to not read a restart file by being told that this is an initial run rather than a restart run. (This is done because it doesn't expect to have a restart file to read, since it never executes the run phase.) That's fine if CISM's run phase is never called, but before the above CMEPS fix, CISM's run phase was being called for T compsets in noevolve mode.

I think what happened in this case was: The system started up at the beginning of year 3, but CISM started back at the beginning of year 1 (because its time comes from its namelist file, which is pinned to the start of the initial run, not the restart run; it expects to get the restart time from its restart file, but in this case it wasn't reading a restart file). So when CISM first executed the run phase, it hit the loop saying "run until your time matches the run-to time from the driver". CISM saw its time as the start of year 1 and the run-to time as the start of year 4, so it ran 3 additional years after restart instead of the 1 that it should have run.
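The catch-up behavior described above can be illustrated with a toy version of the "run until your time matches the driver's run-to time" loop. This is a simplified sketch, not CISM's actual time-stepping code. It assumes one-year steps for readability, and the function `years_run_by_cism` is hypothetical.

```python
def years_run_by_cism(cism_start_year: int, driver_run_to_year: int) -> int:
    """Count how many years a toy CISM clock advances in a loop that runs
    until its time matches the driver's run-to time (one-year steps are an
    illustrative simplification)."""
    year = cism_start_year
    steps = 0
    while year < driver_run_to_year:
        year += 1
        steps += 1
    return steps


# Restart run: the driver resumes at the start of year 3 and runs to the start of year 4.
run_to = 4
# Normal restart: CISM's clock comes from its restart file (start of year 3).
normal = years_run_by_cism(3, run_to)
# Noevolve before the CMEPS fix: no restart file is read, so CISM's clock is
# pinned to the initial-run start from its namelist (start of year 1).
noevolve_before_fix = years_run_by_cism(1, run_to)
print(normal, noevolve_before_fix)
```

With the clock pinned to year 1, the same loop advances three years instead of one. That matches the extra years seen in the test.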

The path forward for testing that we discussed is:

  • Change the current I compset SMS_Ly13 test to instead be an ERS_Ly3 test. On derecho this turns around in 17 minutes, so it isn't too long – and an I compset test is a more realistic way to test noevolve than a T compset.
  • But Kate would also like to keep this T compset noevolve test; this will be possible once the above CMEPS change is brought in.
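An ERS test splits the simulation into an uninterrupted base run and a run that resumes from a mid-run restart, then compares the output of the two. The sketch below shows that split for an ERS_Ly3 test. It assumes the base run covers all 3 years and writes a restart after 2 years, so the restart run resumes at the start of year 3 and runs the remaining 1 year, consistent with the failure described in this issue. The helper `ers_schedule` is hypothetical and is not CIME's implementation.

```python
from typing import NamedTuple


class ErsSchedule(NamedTuple):
    base_years: int     # length of the uninterrupted base run
    restart_after: int  # years completed when the restart file is written
    restart_years: int  # length of the run that resumes from that restart


def ers_schedule(stop_n: int, restart_after: int) -> ErsSchedule:
    """Hypothetical split of an ERS test into base and restart runs."""
    if not 0 < restart_after < stop_n:
        raise ValueError("restart must fall strictly inside the run")
    return ErsSchedule(stop_n, restart_after, stop_n - restart_after)


sched = ers_schedule(stop_n=3, restart_after=2)
resume_year = 1 + sched.restart_after  # runs start at the beginning of year 1
# Both runs should stop at the same time (start of year 4) so their output can be compared.
print(sched, resume_year, resume_year + sched.restart_years)
```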

Whew! That was a long explanation for a one-line fix!

@Katetc
Contributor Author

Katetc commented May 8, 2024

The issue was fixed in CMEPS PR#425 and brought into cism-wrapper with the cism_wrap_2_1_100 tag.

@Katetc Katetc closed this as completed May 8, 2024