Some atmos_ensstat jobs failing for GEFS C384 runs #2856

Open
EricSinsky-NOAA opened this issue Aug 22, 2024 · 2 comments
Labels
bug Something isn't working

Comments

EricSinsky-NOAA commented Aug 22, 2024

What is wrong?

Some atmos_ensstat jobs fail when running GEFS at C384 with at least 2 members. A log file from a failed atmos_ensstat task can be found on WCOSS2: /lfs/h2/emc/ptmp/eric.sinsky/GEFS/COMROOT/customexp/gw_pr2788/logs/2021090900/atmos_ensstat_f003.log. The error appears to be caused by a file that is missing for some mpmd tasks (there is one mpmd task for each product resolution).
A sample log file from a failed mpmd task: /lfs/h2/emc/stmp/eric.sinsky/RUNDIRS/gw_pr2788/gefs.2021090900/atmos_ensstat.219722/mpmd.1.out
A sample log file from a successful mpmd task: /lfs/h2/emc/stmp/eric.sinsky/RUNDIRS/gw_pr2788/gefs.2021090900/atmos_ensstat.219722/mpmd.0.out

What should have happened?

All atmos_ensstat jobs should succeed when running C384 GEFS.

What machines are impacted?

WCOSS2

Steps to reproduce

  1. Set up a GEFS test case with at least 2 members and with FHOUT_HF_GFS set to 3.
  2. Run the test case. Some, but not all, of the atmos_ensstat jobs will fail.

Additional information

Some (not all) atmos_ensstat jobs failed both with and without replay ICs. The issue does not seem to occur for C48 runs, which would explain why it does not affect the GEFS CI tests. So far it has only been seen in C384 GEFS runs. It has been occurring since the atmos_ensstat task was added to the global workflow. Investigation is ongoing, but the root cause has not been found yet.

Do you have a proposed solution?

No solution yet.

EricSinsky-NOAA added the bug and triage labels Aug 22, 2024
@EricSinsky-NOAA

Update: This bug may be caused by setting FHOUT_HF_GFS to 3 rather than by the model resolution. atmos_ensstat jobs at forecast hours that are divisible by 6 succeed.

EricSinsky-NOAA commented Aug 22, 2024

I found that this issue originates in the atmos_prod task. In parm/config/gefs/config.atmos_products, FHOUT_PGBS is equal to FHOUT_GFS by default. If FHOUT_GFS is 6, then FHOUT_PGBS is also 6, which means that the supplemental gfs pgb files at 1.0 and 0.5 deg are not generated for f003, f009, f015, etc. (when FHOUT_HF_GFS=3). atmos_ensstat depends on the 1.0 and 0.5 deg pgrb files being generated at those forecast hours; without them, atmos_ensstat fails for f003, f009, f015, etc.
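
The mismatch described above can be illustrated with a short Python sketch. It is only a simplified model and not code from the workflow: the function names, the `fhmax` value of 18, and the assumption that both tasks run on plain fixed-interval schedules starting at hour 0 are all hypothetical. It lists the forecast hours at which atmos_ensstat would run (every `ensstat_step` hours) but for which no supplemental products exist (they are produced every `pgbs_step` hours).

```python
def output_hours(fhmax: int, step: int) -> list[int]:
    """Forecast hours 0, step, 2*step, ... up to and including fhmax."""
    return list(range(0, fhmax + 1, step))


def missing_supplemental_hours(fhmax: int, ensstat_step: int, pgbs_step: int) -> list[int]:
    """Hours where ensstat would run but no supplemental product was produced.

    ensstat_step stands in for FHOUT_HF_GFS and pgbs_step for FHOUT_PGBS;
    both mappings are assumptions made for this illustration.
    """
    needed = output_hours(fhmax, ensstat_step)
    produced = set(output_hours(fhmax, pgbs_step))
    return [hour for hour in needed if hour not in produced]


if __name__ == "__main__":
    # Hypothetical 18-hour window, 3-hourly ensstat, 6-hourly supplemental products.
    gaps = missing_supplemental_hours(fhmax=18, ensstat_step=3, pgbs_step=6)
    print([f"f{hour:03d}" for hour in gaps])  # prints ['f003', 'f009', 'f015']
```

Under this simplified model, a `pgbs_step` equal to `ensstat_step` leaves no gaps, which is consistent with the observation that only the hours not divisible by 6 fail.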

WalterKolczynski-NOAA removed the triage label Aug 26, 2024