Year 400 crashes in ESM1.5 simulations #466
Comments
An option that @aidanheerdegen brought up is to try to cut the current complicated timing calculations and replace them with a completely new setup that uses an "artificial" calendar file that goes in the restart directory, say The overall steps for setting cice's start date and run length could be:
This means that we would no longer need copies of the

Does this sound like a reasonable overall approach?

Implementation details/concerns:

1. Can we assert in Based on what I understand of the following: Lines 360 to 376 in 89d70cf payu will let ESM try to run without a Trying to run in these situations, the model crashes because of the missing restarts. Are there any niche ESM situations where either of the above two situations need to be supported/allowed? Or is it safe to assert in the

2. Currently two separate timing calculations occur in different parts of payu: one in the Would we prefer this calculation to occur in the I suspect we need to keep the second calculation in the

3. In the standalone cice case, where timing calculations are completely controlled by the I think this would be nice to do and would help make everything simpler, however may involve some work in understanding the different situations that the current calculation and logic paths are set up for, as it would be bad to accidentally break any of them...

4. Consistency checks. I think the easiest way to check consistency between the different model start dates would be to compare each of their restart date text/yaml files ( Similar internal consistency checks might also be useful for the UM, as the actual restart dump

5. Caltype. If we swap to an artificial restart date text or yaml file, the ice model could still get the timing wrong if the

6. If we set ice model start date using a text file in the restart directory, it looks like there are some specific situations where payu will overwrite the In the Lines 132 to 146 in 89d70cf If this happened, the timing calculation might be impacted. Since the current calculations also use data in the restart directory, I don't think this would be a new issue though.

7. This change will be incompatible with the
A much simpler suggestion from @anton-seaice: Rather than reading the I'll have a go at implementing this and seeing how it goes.
I wondered if it will work ok with the existing warm-start scripts too? I believe
I've started putting this together in this feature branch (https://github.com/ACCESS-NRI/payu/tree/466-esm1p5-cice-startdate-fix) and will do pull requests to that in smaller steps. The first step has been to rename the
I've then modified payu to read This method seems to work and has a couple of benefits:
A rough version of this option is available here: An alternative is to modify the setup so that
And payu can just read the new start date from I think a couple of downsides to this approach are:
A rough version of this option is available here: https://github.com/ACCESS-NRI/payu/tree/466a-cice-start-noruntime0

It would be great to hear what everyone thinks of these approaches, and whether you have a preference between the two. In either case, I think the next steps would be to pull the code which reads/calculates the start date into a separate method which could then be used to check whether the start dates are consistent between the submodels.
This relates to discussion in #457.
Background:
Researchers have had problems with long ESM1.5 paleo simulations crashing in calendar year 400. See here and here. In the examples, the coupler and sea ice model appear to think there are only 365 days in the year while the ocean and atmosphere use the correct 366 days, leading to the crash, and in the first example, the ice model thinks it's at year 300.
The disagreement between the ice/namcouple and the other components doesn't occur in every simulation that reaches year 400 though. E.g. attempting to reproduce the error by branching from the CSIRO pre-industrial run at year 400, and setting it to run for 3 months, payu gives the `namcouple` file the correct leap year run length of 91 days.

After working through some of Himadri's simulations, it looks like the calendar mismatch comes from payu's start date calculation for the cice submodel. It pulls in information from both the control directory and the restart directory, meaning that if you copy a restart directory across different experiments, you could end up with inconsistent start dates. I still find it a bit confusing, so I hope the following explanation makes some sense.
How payu sets the cice and namcouple run lengths/start dates:
The cice start date and run length are (mostly) calculated in the `access.py` driver. It uses the `<CONTROL-DIRECTORY>/ice/input_ice.nml` namelist, which for example in our pre-industrial configuration looks like

in addition to `<RESTART-DIRECTORY>/ice/input_ice.nml`, which for the pre-industrial configuration looks like

I've highlighted the variables that `access.py` uses for the calculation with the `*` symbols. The other variables are ignored.

To set the start date, it adds a total simulation length of `runtime0+runtime` seconds from `<RESTART-DIRECTORY>/ice/input_ice.nml` to the `init_date` from `<CONTROL-DIRECTORY>/ice/input_ice.nml`:

payu/payu/models/access.py, lines 138 to 142 in e9bd1f4
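The addition described above can be sketched roughly as follows. This is a simplified illustration and not the code from `access.py`: the function name `cice_start_date` is hypothetical, and the sketch assumes a proleptic Gregorian calendar via Python's `datetime`, whereas payu's own calculation may handle calendars differently.

```python
from datetime import date, timedelta


def cice_start_date(init_date: int, runtime0: int, runtime: int) -> date:
    """Return init_date advanced by (runtime0 + runtime) seconds.

    init_date is an integer in YYYYMMDD form (00010101 is read as 10101).
    Hypothetical helper: assumes a proleptic Gregorian calendar.
    """
    # Split the YYYYMMDD integer into year, month and day.
    year, month_day = divmod(init_date, 10_000)
    month, day = divmod(month_day, 100)

    # init_date comes from the control directory namelist,
    # runtime0 and runtime come from the restart directory namelist.
    elapsed = timedelta(seconds=runtime0 + runtime)
    return date(year, month, day) + elapsed
```

For example, `cice_start_date(10101, 0, 0)` gives `date(1, 1, 1)`, and adding 365 days' worth of seconds to the same `init_date` gives `date(2, 1, 1)`. The point of the sketch is that the result depends on inputs from two different directories.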
To calculate the run duration for the next experiment, it then uses this start date, the `caltype` value from `<CONTROL-DIRECTORY>/ice/input_ice.nml`, and the runtime settings in the `config.yaml` file:

payu/payu/models/access.py, lines 151 to 157 in e9bd1f4

The resulting runtime then gets used by cice and the coupler.
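To see why the start date matters for the run length, here is a small sketch that counts the days in a run of whole months. It is an illustration only: `run_length_days` is a hypothetical helper, and it assumes a Gregorian calendar and a start day that also exists in the end month (e.g. the 1st).

```python
from datetime import date


def run_length_days(start: date, months: int) -> int:
    """Number of days from start to the same day-of-month `months` later.

    Hypothetical helper: assumes a Gregorian calendar and that
    start.day is a valid day in the end month.
    """
    # Count months from January of the start year, then convert back.
    total = start.month - 1 + months
    end = date(start.year + total // 12, total % 12 + 1, start.day)
    return (end - start).days


print(run_length_days(date(400, 1, 1), 3))  # 91: year 400 is a leap year
print(run_length_days(date(300, 1, 1), 3))  # 90: year 300 is not
```

A three-month run starting in January therefore lasts 91 days if the component believes it is in year 400, but only 90 days if it believes it is in year 300, which matches the 365 versus 366 day disagreement described in the background.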
How problems can arise
Because the calculation uses the `init_date` from the control directory, copying a restart directory between different experiments can lead to different cice start dates (and hence run times) if the `<CONTROL-DIRECTORY>/ice/input_ice.nml` files don't match. This can come up when using the `warm-start.sh` scripts to create a new restart directory based on a CSIRO simulation, which was done for some of the linked examples.

The `warm-start.sh` scripts modify the cice start date by setting `init_date` in `<CONTROL-DIRECTORY>/ice/input_ice.nml` to the desired start date (for example `01010101`), and `runtime0=0, runtime=0` in `<RESTART-DIRECTORY>/ice/input_ice.nml`. Running the resulting configuration will then start the ice calendar at `0101-01-01`.

If you then start a new experiment by cloning e.g. the pre-industrial simulation, and copying over the restart directory already created by the `warm-start.sh` scripts, the new control directory will still have the unmodified `init_date=00010101`, while the restart folder will have the modified `runtime0=0, runtime=0`, and cice will use a start date of `0001-01-01`, 100 years off where it's meant to be.

Meanwhile, the UM and MOM have their start dates given completely in the restart directory via the `um.res.yaml` and `ocean_solo.res` files, and so they'll use the correct date of `0101-01-01` (there are some caveats for the UM in other situations). Once the UM and MOM reach year 400, which is a leap year, cice is still 100 years behind in a non-leap year, and the mismatch and crash can then occur.

Possible changes
A couple of ideas listed below:
`init_date` settings into the namelists in the restart directory and make corresponding changes to payu.

It would be great to get any other ideas/opinions on possible changes!
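One further possibility, raised in the discussion of this issue, is a consistency check between the submodel start dates before the run begins. A minimal sketch of what such a check could look like is below. The function name `check_start_dates` and the submodel labels are hypothetical, and the sketch assumes each submodel's start date has already been read into a `datetime.date`; it does not show how payu would obtain them.

```python
from datetime import date
from typing import Mapping


def check_start_dates(start_dates: Mapping[str, date]) -> None:
    """Raise ValueError if the submodels disagree on the start date.

    Hypothetical helper: start_dates maps a submodel label to the
    start date that submodel would use.
    """
    if len(set(start_dates.values())) > 1:
        details = ", ".join(
            f"{name}: {start.isoformat()}"
            for name, start in sorted(start_dates.items())
        )
        raise ValueError(f"Inconsistent submodel start dates ({details})")
```

With the warm-start scenario described above, calling this with the ice model at `date(1, 1, 1)` and the atmosphere and ocean at `date(101, 1, 1)` would raise an error at setup time rather than letting the run crash at year 400.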