
C768 gdasfcst runs too slow on WCOSS2 #2891

Closed
DavidHuber-NOAA opened this issue Sep 5, 2024 · 38 comments · Fixed by #2914
Labels: bug (Something isn't working)

Comments

@DavidHuber-NOAA (Contributor) commented Sep 5, 2024

What is wrong?

The C768 gdas forecast takes much longer than expected to run on WCOSS2 (tested on Dogwood). Runtime exceeded 70 minutes with the current configuration, with the bulk of the time spent writing the inline post and atm forecast files. Interestingly, the inline post at odd forecast hours and f000 only took ~30s, while the inline post at even hours took closer to 360s. Increasing WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS from 10 to 15 actually slowed the inline post write times at even hours to ~420s, though the odd hours' inline posts ran faster (~20s).

This is not an issue on Hera. I have not tested on other machines.
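For context, a minimal sketch of the write-component knob exercised in this test, expressed as a shell override. Only the variable name and the 10 -> 15 change come from this report; that it lives in config.ufs and is set via export is an assumption about the workflow layout, not something confirmed here.

  # Hypothetical override sketch -- the file (config.ufs) and export form are assumed.
  export WRTTASK_PER_GROUP_PER_THREAD_PER_TILE_GDAS=15   # default here was 10; 15 made even-hour inline posts slower (~420s)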

What should have happened?

Runtime should be less than 40 minutes.

What machines are impacted?

WCOSS2

Steps to reproduce

  1. Create a cycled experiment (tested on the 2021122018 half cycle)
  2. Run the first gdasfcst

Additional information

Discovered while testing #2819.

Do you have a proposed solution?

Re-test after the UPP update coming in PR #2877.

@DavidHuber-NOAA added the bug and triage labels Sep 5, 2024
@DavidHuber-NOAA self-assigned this Sep 5, 2024
@DavidHuber-NOAA removed the triage label Sep 5, 2024
@NOAA-EMC deleted a comment Sep 5, 2024
@DavidHuber-NOAA (Contributor Author)

Updating to the newest UPP did not resolve this issue. More investigation will be required.

@DavidHuber-NOAA removed their assignment Sep 10, 2024
@DavidHuber-NOAA (Contributor Author)

@WenMeng-NOAA @junwang-noaa While testing #2819 on WCOSS2, I found that the first C768 half-cycle, ATM-only GDAS forecast ran very slowly. When running on Hera, the forecast took a little over 20 minutes, while on Dogwood it took closer to 70 minutes. The slowdown seems to be coming from the inline post. On Dogwood, the inline post runtime was ~20s for the 0-hour and all odd-hour writes, but over 6 minutes for even-hour writes. On Hera, the inline post executed in less than 30s at all write times.

Would you be able to look into this? I have initial conditions available on Dogwood here: /lfs/h2/emc/global/noscrub/David.Huber/keep/global_ICs/768/2021122018.

@CatherineThomas-NOAA (Contributor)

@RuiyuSun has also experienced this slowdown for the HR4 scout runs at C1152. The 16 day forecast does not complete within 10 hours walltime.

@WenMeng-NOAA (Contributor)

> @WenMeng-NOAA @junwang-noaa While testing #2819 on WCOSS2, I found that the first C768 half-cycle, ATM-only GDAS forecast ran very slowly. When running on Hera, the forecast took a little over 20 minutes, while on Dogwood it took closer to 70 minutes. The slowdown seems to be coming from the inline post. On Dogwood, the inline post runtime was ~20s for the 0-hour and all odd-hour writes, but over 6 minutes for even-hour writes. On Hera, the inline post executed in less than 30s at all write times.
>
> Would you be able to look into this? I have initial conditions available on Dogwood here: /lfs/h2/emc/global/noscrub/David.Huber/keep/global_ICs/768/2021122018.

@DavidHuber-NOAA Do you have runtime logs saved?

@DavidHuber-NOAA (Contributor Author)

@WenMeng-NOAA Yes, I have a partial log here: /lfs/h2/emc/global/noscrub/David.Huber/para/COMROOT/dev_768_upp/logs/2021122018/gdasfcst_seg0.log. It is partial because it ran into the walltime limit of 40 minutes.

I also have a complete log here: /lfs/h2/emc/global/noscrub/David.Huber/para/COMROOT/768_768/logs/2021122018/gdasfcst_seg0.log. However, for this test, I increased the number of write tasks by 1.5x, which actually slowed the inline post down further.

Lastly, I have a Hera log here: /scratch1/NCEPDEV/global/David.Huber/para/COMROOT/C768_2/logs/2023021018/gdasfcst.log.

@RuiyuSun

I was able to complete a 120-hour coupled HR4 forecast experiment. The log file is at /lfs/h2/emc/stmp/ruiyu.sun/ROTDIRS/HR47/logs/2020012600 on Dogwood.

@DavidHuber-NOAA (Contributor Author)

I should clarify that this issue was only present for me for the GDAS forecast. The 120-hour ATM-only GFS forecast did not exhibit this issue.

@RussTreadon-NOAA (Contributor)

@DavidHuber-NOAA, I see that the model is now writing both Gaussian grid [atmfxxx, sfcfxxx] and cubed_sphere_grid [atmfxxx, sfcfxxx] files. Writing more output takes more time. Can we reduce I/O time by adjusting WRITE_GROUP or WRTTASK_PER_GROUP_PER_THREAD_PER_TILE?

@DavidHuber-NOAA (Contributor Author)

@RussTreadon-NOAA I did try increasing WRTTASK_PER_GROUP_PER_THREAD_PER_TILE from 10 to 15 (which is now what is in develop), but the posts actually ran slower. Increasing the WRITE_GROUP would be a good choice to look at next.

@WenMeng-NOAA (Contributor)

@DavidHuber-NOAA Could you try modifying the setting of WRTTASK_PER_GROUP? Could you also keep the run directory for @junwang-noaa and me to check the inline post?

@junwang-noaa (Contributor) commented Sep 11, 2024

@RussTreadon-NOAA Thanks for finding the issue! @DavidHuber-NOAA Is it required to write out 2 sets of history files, on the Gaussian grid and on the native grid? What is the native grid output used for? This doubles the memory requirement on the IO side. Also, I want to confirm that this configuration (2 sets of history files) could cause IO issues on all platforms unless the machine has huge memory.

@RuiyuSun

> I should clarify that this issue was only present for me for the GDAS forecast. The 120-hour ATM-only GFS forecast did not exhibit this issue.

@DavidHuber-NOAA The GFS fcst is slow too in the coupled configuration. My HR4 GFS forecast experiment didn't complete within the 10-hour walltime. layout_x_gfs=24 and layout_y_gfs=16 were used in this run.
=>> PBS: job killed: walltime 36058 exceeded limit 36000

The log file is gfsfcst_seg0.log.0 at /lfs/h2/emc/ptmp/ruiyu.sun/ROTDIRS/HR46/logs/2020012600.

@RuiyuSun

FHMAX_GFS=384 in the experiment

@WenMeng-NOAA (Contributor)

@RuiyuSun From the log you provided at /lfs/h2/emc/stmp/ruiyu.sun/ROTDIRS/HR47/logs/2020012600/gfsfcst_seg0.log, I saw the following configurations:

+ parsing_model_configure_FV3.sh[30]: local WRITE_GROUP=4
+ parsing_model_configure_FV3.sh[31]: local WRTTASK_PER_GROUP=120
+ parsing_model_configure_FV3.sh[32]: local ITASKS=1
+ parsing_model_configure_FV3.sh[33]: local OUTPUT_HISTORY=.true.
+ parsing_model_configure_FV3.sh[34]: local HISTORY_FILE_ON_NATIVE_GRID=.true.
+ parsing_model_configure_FV3.sh[35]: local WRITE_DOPOST=.true.
+ parsing_model_configure_FV3.sh[36]: local WRITE_NSFLIP=.true.

@junwang-noaa Is 'HISTORY_FILE_ON_NATIVE_GRID' set for writing out model data files on the native grid?

@RussTreadon-NOAA (Contributor)

g-w PR #2792 changed

local HISTORY_FILE_ON_NATIVE_GRID=".false."

to

local HISTORY_FILE_ON_NATIVE_GRID=".true."

in ush/parsing_model_configure_FV3.sh

At the same time, we retain

local OUTPUT_HISTORY=${OUTPUT_HISTORY:-".true."}

@RussTreadon-NOAA (Contributor)

As a test, can we revert to local HISTORY_FILE_ON_NATIVE_GRID=".false." in a working copy of ush/parsing_model_configure_FV3.sh and rerun a gdasfcst to see if/how the wall time changes?
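One possible way to make that one-line change for the test, assuming a checked-out working copy of the workflow (a sketch only; editing the file by hand works just as well):

  # Flip the flag introduced by PR #2792, then rerun the gdasfcst job and
  # compare wall times against the previous run.
  sed -i 's/HISTORY_FILE_ON_NATIVE_GRID=".true."/HISTORY_FILE_ON_NATIVE_GRID=".false."/' \
      ush/parsing_model_configure_FV3.sh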

@CoryMartin-NOAA (Contributor)

> @RussTreadon-NOAA Thanks for finding the issue! @DavidHuber-NOAA Is it required to write out 2 sets of history files, on the Gaussian grid and on the native grid? What is the native grid output used for? This doubles the memory requirement on the IO side. Also, I want to confirm that this configuration (2 sets of history files) could cause IO issues on all platforms unless the machine has huge memory.

@junwang-noaa we only need native grid history when we will be using JEDI for the atmospheric analysis. We will likely have to write both since the Gaussian grid is presumably used for products/downstream?

@junwang-noaa (Contributor) commented Sep 11, 2024

So now the write grid component will do:

  1. UPP
  2. Gaussian history files
  3. Native history files
  4. Restart files

The last two tasks increase the memory requirement and slow down the write grid component, which could further slow down the forecast integration. So both the write tasks per group and the number of write groups need to increase in order to keep up with the forecast.
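For a rough sense of scale, a minimal arithmetic sketch of the write-component footprint implied by the HR4 settings quoted later in this thread (illustrative only, not a tuning recommendation):

  # Values taken from the model_configure excerpt quoted later in this thread.
  WRITE_GROUP=4
  WRTTASK_PER_GROUP=120
  echo $(( WRITE_GROUP * WRTTASK_PER_GROUP ))   # 480 write tasks in total, before any increase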

@CoryMartin-NOAA (Contributor)

Since we only need native history for GDAS fcst (and enkfgdas fcst) when using JEDI for atm, and we don't need that for GFSv17, perhaps we either:

  • revert to gaussian only in develop
  • add an option to only use native grid output if DO_JEDIATMVAR="YES"

Later on, we may want to just write out native grid and regrid to Gaussian offline as needed?

@junwang-noaa (Contributor)

@CoryMartin-NOAA I want to confirm with you: when you say "we only need native history for GDAS fcst", do you also need post products from the model? If yes, then we still need to have Gaussian grid fields on the write grid component for the inline post, unless there is a plan to interpolate from the cubed-sphere grid to the Gaussian grid and then run offline post. We would still increase the memory, but the writing time of the native history can be reduced.

@DavidHuber-NOAA @RuiyuSun I see you have the following in the GFS forecast log:

quilting: .true.
quilting_restart: .true.
write_groups: 4
write_tasks_per_group: 120
itasks: 1
output_history: .true.
history_file_on_native_grid: .true.

So the model is writing out 2 sets of C1152 history files. Also, since it is a coupled case, quilting_restart can be set to .false. because the atm is waiting while other model components write out restart files. So please set the following:

quilting: .true.
quilting_restart: .false.
write_groups: 4
write_tasks_per_group: 120
itasks: 1
output_history: .true.
history_file_on_native_grid: .false.

@CoryMartin-NOAA (Contributor)

@junwang-noaa I'll have to defer to someone like @WenMeng-NOAA for that. I do think we have some 'GDAS' products but I'm not sure.

@WenMeng-NOAA (Contributor)

The gdas forecast products (e.g. gdas.tCCz.master.f and gdas.tCCz.sfluxgrbf*) are generated from inline post.

@RuiyuSun

@junwang-noaa I see. Thanks for the suggestion.

@DavidHuber-NOAA (Contributor Author)

I ran a test case on WCOSS2 with local HISTORY_FILE_ON_NATIVE_GRID=".false.". The gdasfcst completed in ~21.5 minutes. The log file for this test is available here: /lfs/h2/emc/global/noscrub/David.Huber/keep/gdasfcst_no_native_history.log.

@junwang-noaa (Contributor)

@DavidHuber-NOAA Is it OK to turn off HISTORY_FILE_ON_NATIVE_GRID for the GFSv17 implementation? Is the 21.5-minute runtime within the operational window? Also, would you please send us the run directories on Hera and WCOSS2 so that we can investigate a little more?

@DavidHuber-NOAA (Contributor Author) commented Sep 11, 2024

@junwang-noaa I will defer the operational question to @aerorahul. Based on the discussion, I think turning off HISTORY_FILE_ON_NATIVE_GRID for GFSv17 would be the right way to go, but I will run a full cycle to verify.

Unfortunately, my run directories were removed automatically by the workflow. I don't think I can replicate the Hera run as I have updated my working version of the workflow, but I will regenerate the run directory on WCOSS2 at least and set KEEPDATA="YES" to prevent it from being deleted.

@CatherineThomas-NOAA (Contributor)

@junwang-noaa @DavidHuber-NOAA
We do not need the history files on the cubed sphere for GFSv17. They are only needed for JEDI atmospheric DA, which is not a v17 target at this time.

@junwang-noaa (Contributor)

Thanks, Cathy. Is the gdas fcst 21.5-minute runtime OK for the operational GFSv17?

@DavidNew-NOAA (Contributor) commented Sep 11, 2024

Right now the native grid cubed-sphere history files are used as backgrounds for JEDI DA. Eventually (something I'm working on right now), they will be interpolated to the Gaussian grid during post-processing, and the forecast model will only need to write to the native grid, not both. Until then, I would agree that we should only turn HISTORY_FILE_ON_NATIVE_GRID on when using JEDI in the workflow.

@junwang-noaa (Contributor)

@DavidNew-NOAA Thanks for the explanation. Regarding "they will be interpolated to the Gaussian grid during post-processing": do you mean that the post-processing code will read in the native grid model output fields and interpolate these fields to the Gaussian grid?

@DavidNew-NOAA (Contributor)

@junwang-noaa Yes, that's correct

@CatherineThomas-NOAA (Contributor)

@junwang-noaa: Yes, 21.5 minutes is very reasonable for the gdas forecast.

@junwang-noaa (Contributor)

@DavidHuber-NOAA So setting HISTORY_FILE_ON_NATIVE_GRID to .false. will resolve the slowness issue in the gdas fcst and GFS fcst jobs on WCOSS2 without significantly increasing the number of write tasks and write groups. Some work needs to be done, as @DavidNew-NOAA mentioned, before HISTORY_FILE_ON_NATIVE_GRID can be turned back on. I also noticed the slowness of writing the native history files on WCOSS2 (a run directory from this test case would be helpful). We will look into it on the model side, but this is for future implementations when native model history files are required. Please let me know if there is still any issue. Thanks.

@DavidHuber-NOAA (Contributor Author)

@junwang-noaa Thank you for the summary. I have copied the run directory with HISTORY_FILE_ON_NATIVE_GRID disabled to /lfs/h2/emc/global/noscrub/David.Huber/keep/fcst_rundir_no_native_history. I will repeat this case with that option enabled and save the working directory and log file.

@DavidNew-NOAA @CoryMartin-NOAA Just to confirm, the native grid history files are required for GDASApp analyses, correct? If so, I will add a conditional block around local HISTORY_FILE_ON_NATIVE_GRID=".true." for JEDI-based experiments.
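A minimal sketch of what such a conditional could look like inside the existing function in ush/parsing_model_configure_FV3.sh, assuming DO_JEDIATMVAR is visible to that script (the variable plumbing is an assumption; only the flag name and the DO_JEDIATMVAR switch are named in this thread):

  # Hypothetical sketch -- placement inside the existing model_configure
  # function is assumed so that 'local' remains valid.
  if [[ "${DO_JEDIATMVAR:-NO}" == "YES" ]]; then
    local HISTORY_FILE_ON_NATIVE_GRID=".true."    # native-grid history needed as JEDI backgrounds
  else
    local HISTORY_FILE_ON_NATIVE_GRID=".false."   # Gaussian-only output; avoids the WCOSS2 slowdown
  fi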

@CoryMartin-NOAA (Contributor)

@DavidHuber-NOAA yes, that would be perfect if you could do that.

@DavidHuber-NOAA (Contributor Author)

Alright, sounds good @CoryMartin-NOAA.

@junwang-noaa I apologize. The gdasfcst for which I copied data to keep/ failed. I think I know the reason and will rerun shortly. I will let you know when I have finished running with both native grid outputs on and off.

@aerorahul (Contributor) commented Sep 12, 2024

@DavidHuber-NOAA did the work for PR #2914. I just opened the PR after the GFSv17 meeting discussion to get eyes on it.

@DavidHuber-NOAA (Contributor Author)

@junwang-noaa The run directories and log files have now been copied to /lfs/h2/emc/global/noscrub/David.Huber/keep as follows:

  • Writing both native and Gaussian grids: gdasfcst_w_native_rundir and gdasfcst_w_native.log (58:00 runtime)
  • Writing only the Gaussian grid: gdasfcst_no_native_rundir and gdasfcst_no_native.log (22:23 runtime)
