Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Simplify the way the configuration of the vx is handled #1082

Merged

Conversation

gsketefian
Copy link
Collaborator

@gsketefian gsketefian commented May 9, 2024

DESCRIPTION OF CHANGES:

This PR removes the tasks parse_vx_config_[det|ens] that read in and parse the verification (vx) yaml configuration files (one for deterministic vx and another for ensemble vx). Currently, these tasks are used to call the script decouple_fcst_obs_vx_config.py to separate the "coupled" vx config information in the yaml files into forecast and observation components. For example, a "coupled" field name of "CRAIN%%PRWE" in the deterministic configuration file corresponds to a field name of "CRAIN" for forecasts and "PRWE" for the observations (and the "%%" is the delimiter). After separating each item (field name, level, or threshold) in this way, the parse_vx_config_[det|ens] tasks write this separated information into two new intermediate configuration files (one for deterministic and one for ensemble) as two separate dictionaries (one for forecasts and another for observations). These intermediate config files are then read in by the various vx task ex-scripts.

In this PR, the parse_vx_config_[det|ens] tasks and the decouple_fcst_obs_vx_config.py script are removed (so that the intermediate configuration files are no longer created). The separation into forecast and observation values of the "coupled" information in the vx configuration files is now performed in the jinja2 templates for the METplus configuration files, hiding these details from the user.

This PR was prompted by some of the questions by @mkavulich in PR #1005.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • hercules.intel
  • cheyenne.intel
  • cheyenne.gnu
  • derecho.intel
  • gaea.intel
  • gaeac5.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

The following vx WE2E tests were run on Hera:

MET_ensemble_verification
MET_ensemble_verification_only_vx
MET_ensemble_verification_only_vx_time_lag
MET_ensemble_verification_winter_wx
MET_verification
MET_verification_only_vx
MET_verification_winter_wx

They all completed successfully.

DOCUMENTATION:

No updates necessary.

ISSUE:

None.

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@mkavulich

…recast and observation values) deterministic vx yaml config file directly into ex-script for deterministic GridStat and PointStat tasks to form an overall yaml file from which to generate METplus config files from template.
…recast and observation values) ensemble vx yaml config file directly into ex-script for (ensemble) GenEnsProd tasks to form an overall yaml file from which to generate METplus config files from template.
… purely yaml ensemble vx configuration file.
…th forecast and observation values) ensemble vx yaml config file directly into ex-script for GridStat ensemble mean tasks to form an overall yaml file from which to generate METplus config files from template.
…th forecast and observation values) ensemble vx yaml config file directly into ex-script for GridStat ensemble probabilistic tasks to form an overall yaml file from which to generate METplus config files from template.
…th forecast and observation values) ensemble vx yaml config file directly into the ex-script for PointStat ensemble mean tasks to form an overall yaml file from which to generate METplus config files from the PointStat_ensmean.conf template.
…th forecast and observation values) ensemble vx yaml config file directly into the ex-script for PointStat ensemble probabilistic tasks to form an overall yaml file from which to generate METplus config files from the PointStat_ensprob.conf template.
…nce the parsing (of items in the vx configuration yaml files, where items can be field names, levels, and thresholds, to obtain the forecast and observation values) is now done within the METplus config file templates (i.e. it's done by Jinja).
@RatkoVasic-NOAA
Copy link
Collaborator

@gsketefian I ran tests on Hera. It failed for some tests (that passed for you before):
/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/WE2E_summary_20240509175500.txt

/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/MET_ensemble_verification_only_vx_time_lag/log/get_obs_ndas_2021050500.log

Can you please take a look, in second log fie, it is trying to get from HPSS file that does not exist.

@MichaelLueken
Copy link
Collaborator

@RatkoVasic-NOAA -

I saw similar issues with the MET_ensemble_verification_only_vx_time_lag and MET_verification_winter_wx WE2E tests. I believe that these were unavailable during the HPSS library maintenance today. Once the maintenance concluded, I rocotorewind/rocotoboot the get_verif_obs tasks and they successfully passed. Please try rerunning these tasks/tests now and see if they continue to fail.

@gsketefian
Copy link
Collaborator Author

@MichaelLueken @RatkoVasic-NOAA I just pushed a few more commits that further streamline the METplus config templates. I reran the test MET_ensemble_verification_only_vx, and it worked. This test checks all the latest changes, so the rest of the tests should work fine. Go ahead and retest at your convenience. Thanks.

Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsketefian -

The verification WE2E tests successfully ran through to completion on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
MET_ensemble_verification_only_vx_20240510132442                   COMPLETE               1.32
MET_ensemble_verification_only_vx_time_lag_20240510132445          COMPLETE               3.93
MET_ensemble_verification_winter_wx_20240510132448                 COMPLETE             107.94
MET_verification_20240510132449                                    COMPLETE              10.91
MET_verification_only_vx_20240510132451                            COMPLETE               0.23
MET_verification_winter_wx_20240510132452                          COMPLETE              17.66
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202405  COMPLETE              44.38
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024051013245  COMPLETE              20.15
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             206.52

Approving PR now.

@RatkoVasic-NOAA
Copy link
Collaborator

After two tries, it passed on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
MET_ensemble_verification_20240510145323                           COMPLETE              21.52
MET_ensemble_verification_only_vx_20240510145325                   COMPLETE               1.04
MET_ensemble_verification_only_vx_time_lag_20240510145328          COMPLETE               3.90
MET_ensemble_verification_winter_wx_20240510145331                 COMPLETE             112.10
MET_verification_20240510145332                                    COMPLETE              10.12
MET_verification_only_vx_20240510145334                            COMPLETE               0.23
MET_verification_winter_wx_20240510145335                          COMPLETE              16.54
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             165.45

Detailed summary written to /scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/WE2E_summary_20240510152858.txt

@RatkoVasic-NOAA
Copy link
Collaborator

@gsketefian , can you please check what is going on with this one:

/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/MET_ensemble_verification_only_vx_time_lag/log/run_MET_GenEnsProd_vx_RETOP_2021050500.log
/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs/MET_ensemble_verification_only_vx_time_lag/log/run_MET_GenEnsProd_vx_RETOP_2021050500.log.0

It is failing randomly. Yesterday it failed twice in a row causing rocoto to give up and whole test to fail.
Here is same and additional test that failed more than once:

Hera:/scratch2/NCEPDEV/fv3-cam/Ratko.Vasic/SRW-1082/expt_dirs-old/MET_ensemble_verification_only_vx_time_lag/log>
run_MET_GenEnsProd_vx_REFC_2021050500.log
run_MET_GenEnsProd_vx_REFC_2021050500.log.0
run_MET_GenEnsProd_vx_RETOP_2021050500.log
run_MET_GenEnsProd_vx_RETOP_2021050500.log.0

I wonder if you have an idea what might be causing this random occurrence?

@gsketefian
Copy link
Collaborator Author

@RatkoVasic-NOAA I think I figured out the problem.

What's happening is that during the run_MET_GenEnsProd_vx_REFC task (and also for RETOP, but I'll just talk about REFC for simplicity), in exregional_run_met_genensprod_or_ensemblestat.sh the call to set_vx_fhr_list is failing to find some or all of the obs files because the task get_obs_mrms has not yet generated these files (because it hasn't launched yet, or has not yet completed). But these obs files aren't necessary for running run_MET_GenEnsProd_vx_REFC because GenEnsProd only takes forecast files as inputs. The reason this check exists is that this ex-script is also used for the EnsembleStat tasks, which do take observation files as inputs.

There are a couple of quick-and-dirty (i.e. hard-coded) fixes for this, but I'd rather implement a more thorough fix. Since this is something that predates this PR and is not directly related, I'd like to create an issue and take a look next week. Are you ok with that? Otherwise I can do a one-liner quick fix.

@RatkoVasic-NOAA
Copy link
Collaborator

@gsketefian great! I suggest you do 'dirty' quick fix now (although not related to this PR), and open an issue for nice fix for later. @MichaelLueken what do you think? I'll approve this PR either way.

@MichaelLueken
Copy link
Collaborator

@gsketefian great! I suggest you do 'dirty' quick fix now (although not related to this PR), and open an issue for nice fix for later. @MichaelLueken what do you think? I'll approve this PR either way.

@RatkoVasic-NOAA I'd recommend opening an issue to properly address this issue rather than create a temporary fix. A quick-and-dirty fix could potentially break the way the workflow runs for certain tests, leading to more issues. I'd err on the side of caution and focus on a fix that will work for all verification tests.

@gsketefian Once the parm/metplus/STATAnalysisConfig_skill_score file has been added back and @EdwardSnyder-NOAA has approved, I will kick of the automated Jenkins tests.

…nkins pipeline when running the srw_metric.sh script).
@gsketefian
Copy link
Collaborator Author

@MichaelLueken @RatkoVasic-NOAA @EdwardSnyder-NOAA I added back the file STATAnalysisConfig_skill_score and created an issue at #1084. I think this PR is ready to (re)test. Thanks.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label May 10, 2024
@RatkoVasic-NOAA
Copy link
Collaborator

Except expected failures which are reported in issue #1084, tests passed on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
MET_ensemble_verification_20240510195653                           COMPLETE              24.54
MET_ensemble_verification_only_vx_20240510195655                   COMPLETE               1.04
MET_ensemble_verification_only_vx_time_lag_20240510195657          DEAD                   3.49
MET_ensemble_verification_winter_wx_20240510195700                 COMPLETE             122.83
MET_verification_20240510195702                                    COMPLETE              10.39
MET_verification_only_vx_20240510195703                            COMPLETE               0.23
MET_verification_winter_wx_20240510195705                          COMPLETE              17.69
----------------------------------------------------------------------------------------------------
Total                                                              DEAD                 180.21
MET_ensemble_verification_only_vx_time_lag/log/run_MET_GenEnsProd_vx_REFC_2021050500.log.0
MET_ensemble_verification_only_vx_time_lag/log/run_MET_GenEnsProd_vx_RETOP_2021050500.log.0

Approved.

@MichaelLueken
Copy link
Collaborator

The Jenkins tests successfully passed on all platforms. Moving forward with merging this work now.

@MichaelLueken MichaelLueken merged commit a712ef1 into ufs-community:develop May 13, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants