gdasstage_ic and gdasfcst_seg0 disagree on staged filenames for ocean restarts #2865

Open
jswhit opened this issue Aug 27, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@jswhit
Contributor

jswhit commented Aug 27, 2024

What is wrong?

For a coupled 3DVar cycling experiment with cold starts for 2021032400, gdasstage_ic stages ocean restarts with m_prefix = 20210323.210000, but gdasfcst_seg0 then looks for restarts with m_prefix = 20210324.000000. Here's the error from gdasfcst_seg0:

/bin/cp -p /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/gdas.20210323/18//model/ocean/restart/20210324.000000.MOM.res.nc /scratch1/NCEPDEV/stmp2/Jeffrey.S.Whitaker/RUNDIRS/C192coupled3dvar_test/gdas.2021032400/gdasfcst.2021032400/fcst.1385029/INPUT/MOM.res.nc
/bin/cp: cannot stat '/scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/gdas.20210323/18//model/ocean/restart/20210324.000000.MOM.res.nc': No such file or directory

and the relevant output from gdasstage_ic:

2024-08-27 15:16:21,927 - INFO     - file_utils  : Created /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/gdas.20210323/18//model/ocean/restart
2024-08-27 15:16:27,068 - INFO     - file_utils  : Copied /scratch2/BMC/gsienkf/whitaker/replayics/C192mx025//gdas.20210323/18/model/ocean/restart/20210323.210000.MOM.res.nc to /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/gdas.20210323/18//model/ocean/restart
2024-08-27 15:16:33,344 - INFO     - file_utils  : Copied /scratch2/BMC/gsienkf/whitaker/replayics/C192mx025//gdas.20210323/18/model/ocean/restart/20210323.210000.MOM.res_1.nc to /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/gdas.20210323/18//model/ocean/restart
2024-08-27 15:16:40,148 - INFO     - file_utils  : Copied /scratch2/BMC/gsienkf/whitaker/replayics/C192mx025//gdas.20210323/18/model/ocean/restart/20210323.210000.MOM.res_2.nc to /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/gdas.20210323/18//model/ocean/restart
2024-08-27 15:16:41,466 - INFO     - file_utils  : Copied /scratch2/BMC/gsienkf/whitaker/replayics/C192mx025//gdas.20210323/18/model/ocean/restart/20210323.210000.MOM.res_3.nc to /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/gdas.20210323/18//model/ocean/restart

What should have happened?

gdasstage_ic should stage ocean restarts with the same filenames that gdasfcst_seg0 expects for cold starts.

What machines are impacted?

Hera

Steps to reproduce

Run a cycled 3DVar coupled atm/ocean experiment. I ran

pslot="C192coupled3dvar_test" HPC_ACCOUNT="gsienkf" RUNTESTS="/scratch2/BMC/gsienkf/whitaker/GWTESTS" ICSDIR_ROOT="/scratch2/BMC/gsienkf/whitaker/replayics/C192mx025/"  ./workflow/create_experiment.py --yaml ci/cases/pr/C192mx025_3DVarAOWCDA.yaml

from /scratch2/BMC/gsienkf/whitaker/global-workflow-jswhit, with the patch from issue #2864 applied so that gdasstage_ic does not fail.

Additional information

N/A

Do you have a proposed solution?

no

@jswhit jswhit added bug Something isn't working triage Issues that are triage labels Aug 27, 2024
@jswhit jswhit changed the title from "gdasstage_ic and gdasfcst_seg0n disagree on staged filenames for ocean restarts" to "gdasstage_ic and gdasfcst_seg0 disagree on staged filenames for ocean restarts" Aug 27, 2024
@KateFriedman-NOAA KateFriedman-NOAA self-assigned this Aug 27, 2024
@KateFriedman-NOAA KateFriedman-NOAA removed the triage Issues that are triage label Aug 27, 2024
@KateFriedman-NOAA
Member

Will work on this, thanks for reporting @jswhit !

@KateFriedman-NOAA
Member

Alrighty, so the gdasstage_ic job picks up your ICs "correctly" using m_prefix=20210323.210000, which is based on model_start_date_current_cycle minus 3hrs because DOIAU=YES. The gdasfcst_seg0 job then initially sets model_start_date_current_cycle to the same time (from a log from my reproduction of the issue):

1453 + forecast_predet.sh[94]: model_start_date_current_cycle=2021032321

...but later on it gets set to the cycle that's running because the experiment is cold-starting, which means IAU is off and the model start date would not be 3hrs earlier:

1907 + forecast_det.sh[27]: model_start_date_current_cycle=2021032400

That happens in forecast_det.sh here: https://github.com/NOAA-EMC/global-workflow/blob/develop/ush/forecast_det.sh#L27

Based on the above, either:

  1. the ocean ICs need to have the non-IAU model start date and the staging job needs an adjustment for cold-start for the ocean ICs
  2. the forecast job needs to be updated to handle the ocean restarts differently (treat them like a warm start while treating the atmosphere ICs for cold start)

Pretty sure option 2 is what is needed. Thoughts?
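
The mismatch described above comes down to date arithmetic on the cycle. A minimal sketch, assuming GNU `date` and a 3-hour IAU offset; `restart_prefix` is a hypothetical helper for illustration only, not a function in global-workflow:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of global-workflow): build the restart
# filename prefix (YYYYMMDD.HH0000) for a cycle, with or without the IAU offset.
# Assumes GNU date and a default offset of 3 hours.
restart_prefix() {
  local cycle="$1"               # cycle as YYYYMMDDHH
  local doiau="$2"               # "YES" or "NO"
  local offset_hours="${3:-3}"   # hours subtracted when IAU is on
  local epoch start
  # Convert the cycle to seconds since the epoch, interpreted as UTC.
  epoch=$(date -u -d "${cycle:0:4}-${cycle:4:2}-${cycle:6:2} ${cycle:8:2}:00:00" +%s)
  if [[ "${doiau}" == "YES" ]]; then
    epoch=$(( epoch - offset_hours * 3600 ))
  fi
  start=$(date -u -d "@${epoch}" +%Y%m%d%H)
  echo "${start:0:8}.${start:8:2}0000"
}

restart_prefix 2021032400 YES   # prints 20210323.210000 (prefix of the staged files)
restart_prefix 2021032400 NO    # prints 20210324.000000 (prefix the forecast looked for)
```

The two calls reproduce the two prefixes seen in the logs: the staged files carry the IAU-shifted date, while the cold-started forecast uses the unshifted cycle date.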

@jswhit
Contributor Author

jswhit commented Aug 27, 2024

I believe option 2 was how things worked before.

@jswhit2
Contributor

jswhit2 commented Aug 27, 2024

FWIW, this fixes my particular case (cold start for atmosphere, warm starts for ocean/ice)

diff --git a/ush/forecast_postdet.sh b/ush/forecast_postdet.sh
index 8af90549..2adf1aa1 100755
--- a/ush/forecast_postdet.sh
+++ b/ush/forecast_postdet.sh
@@ -415,7 +415,8 @@ MOM6_postdet() {
     restart_date="${RERUN_DATE}"
   else  # "${RERUN}" == "NO"
     restart_dir="${COMIN_OCEAN_RESTART_PREV}"
-    restart_date="${model_start_date_current_cycle}"
+    #restart_date="${model_start_date_current_cycle}"
+    restart_date="${current_cycle_begin}"
   fi

   # Copy MOM6 ICs
@@ -565,7 +566,8 @@ CICE_postdet() {
     seconds=$(to_seconds "${restart_date:8:2}0000")  # convert HHMMSS to seconds
     cice_restart_file="${DATArestart}/CICE_RESTART/cice_model.res.${restart_date:0:4}-${restart_date:4:2}-${restart_date:6:2}-${seconds}.nc"
   else  # "${RERUN}" == "NO"
-    restart_date="${model_start_date_current_cycle}"
+    #restart_date="${model_start_date_current_cycle}"
+    restart_date="${current_cycle_begin}"
     cice_restart_file="${COMIN_ICE_RESTART_PREV}/${restart_date:0:8}.${restart_date:8:2}0000.cice_model.res.nc"
     if [[ "${DO_JEDIOCNVAR:-NO}" == "YES" ]]; then
       cice_restart_file="${COMIN_ICE_ANALYSIS}/${restart_date:0:8}.${restart_date:8:2}0000.cice_model_anl.res.nc"

@KateFriedman-NOAA
Member

Good to know, thanks @jswhit ! Didn't get a chance to look deep into this yesterday, will aim to today.

@KateFriedman-NOAA
Member

KateFriedman-NOAA commented Sep 6, 2024

@jswhit I see now that the staging needed adjusting. When I tested it, it worked, but I now see that you had symlinks from the 20210323.210000.MOM.res*.nc files to the correct 20210324.000000.MOM.res*.nc files, so it was a false success for me. I updated the staging yaml files to fix issue #2890, and that seems to have fixed the staging job in this case too.

I just ran the gdasstage_ic and gdasfcst_seg0 jobs for your case and they worked. Would you mind copying the yaml files from my clone on Hera (/scratch1/NCEPDEV/global/Kate.Friedman/git/develop_fork/parm/stage) into your clone's parm/stage folder and trying the staging and fcst jobs for your case? Let me know if it works as anticipated. Thanks!

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Sep 12, 2024
- Create specific variables for atmos, ice, mediator, ocean,
and wave cycle dates and update scripts/staging yamls to use them.
- Also resolve issue with incorrect IC filenames caused by not
turning IAU off for cold-start.

Refs NOAA-EMC#2865
Refs NOAA-EMC#2890
@jswhit
Contributor Author

jswhit commented Sep 12, 2024

Sorry for the late reply, @KateFriedman-NOAA. When I copy your parm/stage directory, the staging job seems to run fine, but I'm getting this error in gdasfcst_seg0.log. I think it's probably unrelated to this issue, but I don't seem to have a sorc/upp.fd/parm/gfs directory (which parm/post/gfs is symlinked to).

+ forecast_predet.sh[544]: /bin/cp -p /scratch2/BMC/gsienkf/whitaker/global-workflow-jswhit2/parm/post/gfs/postxconfig-NT-gfs-two.txt /scratch1/NCEPDEV/stmp2/Jeffrey.S.Whitaker/RUNDIRS/C96coupled3dvar_test/gdas.2021032400/gdasfcst.2021032400/fcst.998793/postxconfig-NT.txt
/bin/cp: cannot stat '/scratch2/BMC/gsienkf/whitaker/global-workflow-jswhit2/parm/post/gfs/postxconfig-NT-gfs-two.txt': No such file or directory

@KateFriedman-NOAA
Member

@jswhit There was an update to the system related to UPP and its parm txt files, so you'll want to either make a fresh clone or run a submodule update (and then the link script) in your clone to remedy the issue.
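
The submodule refresh mentioned above might look like the following. This is a sketch under assumptions: `refresh_clone` is a hypothetical wrapper, and it assumes the link script is `sorc/link_workflow.sh` inside the clone, which the thread does not name. Setting `DRY_RUN=YES` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of global-workflow). Assumes the link script
# is sorc/link_workflow.sh; check your clone if the name differs.
refresh_clone() {
  local clone="$1"   # path to the global-workflow clone
  if [[ "${DRY_RUN:-NO}" == "YES" ]]; then
    # Only show what would be run.
    echo "git -C ${clone} submodule update --init --recursive"
    echo "cd ${clone}/sorc && ./link_workflow.sh"
    return 0
  fi
  # Bring all submodules to the commits recorded by the parent repo.
  git -C "${clone}" submodule update --init --recursive || return 1
  # Re-create the symlinks (for example parm/post/gfs) after the update.
  ( cd "${clone}/sorc" && ./link_workflow.sh )
}
```

For example, `DRY_RUN=YES refresh_clone /path/to/clone` shows the two commands without touching the clone.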

@jswhit
Contributor Author

jswhit commented Sep 13, 2024

Okay, got the submodules updated correctly. Now I get this error

 forecast_postdet.sh[440]: /bin/cp -p /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C96coupled3dvar_test/gdas.20210323/18//model/ocean/restart/20210324.000000.MOM.res.nc /scratch1/NCEPDEV/stmp2/Jeffrey.S.Whitaker/RUNDIRS/C96coupled3dvar_test/gdas.2021032400/gdasfcst.2021032400/fcst.1355321/INPUT/MOM.res.nc
/bin/cp: cannot stat '/scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C96coupled3dvar_test/gdas.20210323/18//model/ocean/restart/20210324.000000.MOM.res.nc': No such file or directory
+ forecast_postdet.sh[441]: echo 'FATAL ERROR: Unable to copy MOM6 IC, ABORT!'
FATAL ERROR: Unable to copy MOM6 IC, ABORT!
+ forecast_postdet.sh[441]: exit 1

in gdasfcst_seg0 (see /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C96coupled3dvar_test/logs/2021032400)

The file that was staged is 20210323.210000.MOM.res.nc, not 20210324.000000.MOM.res.nc
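
A quick way to see this kind of mismatch is to compare the expected filename against what is actually in the restart directory. A minimal sketch, assuming GNU `find`; `check_staged_restart` is a hypothetical helper, not part of global-workflow:

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of global-workflow): report whether the MOM6
# restart the forecast expects is present, and list what was actually staged.
check_staged_restart() {
  local restart_dir="$1"   # the .../model/ocean/restart directory
  local prefix="$2"        # expected prefix, e.g. 20210324.000000
  local expected="${restart_dir}/${prefix}.MOM.res.nc"
  if [[ -f "${expected}" ]]; then
    echo "OK: found ${prefix}.MOM.res.nc"
    return 0
  fi
  echo "MISSING: ${prefix}.MOM.res.nc"
  echo "Staged instead:"
  # List whatever MOM6 restarts exist so the prefix mismatch is visible.
  find "${restart_dir}" -maxdepth 1 -name '*.MOM.res*.nc' -printf '  %f\n' | sort
  return 1
}
```

Called with the restart directory from the error above and the prefix 20210324.000000, it would report the file as missing and list the 20210323.210000 files that were staged.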

@KateFriedman-NOAA
Member

@jswhit Ok, so I traced the real issue back to the IAU not being turned off for the first half cycle, so 3 hrs is subtracted from the date in the filename when it shouldn't be. The PR I have open adds export DOIAU="NO" to an if-block in config.base that changes the other IAU variables to how they should be set when IAU is off. The IAU is on for the experiment, but for the cold-started half cycle it should be turned off. Add the following to your config.base (see line 400 in the following code block):

398 # Check if cycle is cold starting, DOIAU off, or free-forecast mode
399 if [[ "${MODE}" = "cycled" && "${SDATE}" = "${PDY}${cyc}" && ${EXP_WARM_START} = ".false." ]] || [[ "${DOIAU}" = "NO" ]] || [[ "${MODE}" = "forecast-only" && ${EXP_WARM_START} = ".false." ]] ; then
400   export DOIAU="NO"
401   export IAU_OFFSET=0
402   export IAU_FHROT=0
403   export IAUFHRS="6,"
404 fi

Then rerun the staging and fcst jobs again. Let me know how it goes.
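
The condition in the config.base block above can be sketched as a standalone function to see which cycles end up with IAU off. This is an illustration only: `effective_doiau` and its argument order are hypothetical, and the example assumes SDATE is 2021032400 for this experiment.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the condition in the config.base block above.
# Prints the effective DOIAU for a cycle; the argument names are illustrative.
effective_doiau() {
  local mode="$1"            # "cycled" or "forecast-only"
  local sdate="$2"           # experiment start date, YYYYMMDDHH
  local cycle="$3"           # current cycle (PDY + cyc), YYYYMMDDHH
  local exp_warm_start="$4"  # ".true." or ".false."
  local doiau="$5"           # experiment-level setting, "YES" or "NO"
  if [[ "${mode}" == "cycled" && "${sdate}" == "${cycle}" && "${exp_warm_start}" == ".false." ]] \
     || [[ "${doiau}" == "NO" ]] \
     || [[ "${mode}" == "forecast-only" && "${exp_warm_start}" == ".false." ]]; then
    echo "NO"
  else
    echo "${doiau}"
  fi
}

effective_doiau cycled 2021032400 2021032400 .false. YES   # prints NO (cold-started first half cycle)
effective_doiau cycled 2021032400 2021032406 .false. YES   # prints YES (later cycle, IAU on)
```

Only the cycle equal to SDATE is switched to DOIAU="NO" in a cycled cold start; every later cycle keeps the experiment-level setting.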

@jswhit2
Contributor

jswhit2 commented Sep 17, 2024

that fixed it - thanks @KateFriedman-NOAA

@jswhit2
Contributor

jswhit2 commented Sep 17, 2024

however, the first gdasocnanalprep step fails to find 20210324.030000.cice_model.res.nc (20210324.060000.cice_model.res.nc was staged). Logs in /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/logs/2021032406

@jswhit2
Contributor

jswhit2 commented Sep 17, 2024

There is a similar problem with the snow analysis (gdassnowanl): 20210324.060000.sfc_data.tile*nc is staged, but the snow analysis is looking for 20210324.030000.sfc_data.tile*nc

@KateFriedman-NOAA
Member

> however, the first gdasocnanalprep step fails to find 20210324.030000.cice_model.res.nc (20210324.060000.cice_model.res.nc was staged). Logs in /scratch2/BMC/gsienkf/whitaker/GWTESTS/COMROOT/C192coupled3dvar_test/logs/2021032406
> A similar problem with the snow analysis (gdassnowanl), 20210324.060000.sfc_data.tile*nc is staged but the snow analysis is looking for 20210324.030000.sfc_data.tile*nc

@jswhit In both instances, 3 hrs (half the assimilation window) is being subtracted from the date in those filenames when it shouldn't be. I looked at the latest logs for those two jobs and don't see DOIAU="NO" being set. Did you retry those jobs after adding export DOIAU="NO" to that if-block in your config.base?

@jswhit2
Contributor

jswhit2 commented Sep 19, 2024

@KateFriedman-NOAA DOIAU="NO" is only set for the first cycle (which only runs the forecast). The first time the DA runs, DOIAU=YES and it expects restarts at the beginning of the window (not the middle). I think the staging logic still needs some tweaking to make sure that filenames are correct when there is a cold start, but the IAU is turned on.

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Sep 30, 2024
- Move DOIAU="NO" to config.stage_ic
- Add similar condition check to forecast_predet.sh

Refs NOAA-EMC#2865
3 participants