Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add inactive top cell global ocean test case #164

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

cbegeman
Copy link
Collaborator

This test case is designed to determine the proper functioning of inactive top cells (i.e., minLevelCell > 1). The test follows the configuration of QU240/PHC/performance_test except an additional vertical layer is added and minLevelCell and maxLevelCell are shifted down the column by 1. A pass of this test constitutes bit-for-bit comparison with QU240/PHC/performance_test.

@pep8speaks
Copy link

pep8speaks commented Jun 23, 2021

Hello @cbegeman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-08-03 06:11:07 UTC

@cbegeman
Copy link
Collaborator Author

@xylar At some point, it would be good to get your feedback on this but no rush. Both init and performance_test run, the latter only with tendency terms disabled. performance_test fails validation state and I'm still looking into why this is.

@cbegeman cbegeman force-pushed the add_inactive_top_cell branch 2 times, most recently from 402c058 to 8b5bd8c Compare June 24, 2021 15:22
@xylar
Copy link
Collaborator

xylar commented Jun 24, 2021

Wow, @cbegeman, I'll take a more careful look at this soon but it looks really great so far!

@cbegeman
Copy link
Collaborator Author

Using this test case and a branch with related MPAS-Ocean bugfixes, I have located two terms that lead to a non-bfb solution between minLevelCell = 1 and minLevelCell = 2 on chrysalis, intel-mpi:

  • config_disable_vel_pgrad = .false. for Jacobian_from_TS or Jacobian_from_density
  • config_disable_vel_surface_stress = .false.

I haven't yet found any obvious bugs in the relevant code.

@xylar xylar added ocean python package DEPRECATED: PRs and Issues involving the python package (master branch) labels Jul 7, 2021
@xylar xylar added the in progress This PR is not ready for review or merging label Jul 7, 2021
@cbegeman
Copy link
Collaborator Author

cbegeman commented Jul 7, 2021

@xylar I think this test case is working as expected. Do you want to give it a look and let me know if other changes or testing are needed? Still no rush on this.

@cbegeman cbegeman requested a review from xylar July 7, 2021 14:34
@xylar
Copy link
Collaborator

xylar commented Jul 7, 2021

Yep, I'll take a look soon. I was hoping to look at this today but didn't get to it yet.

@xylar
Copy link
Collaborator

xylar commented Jul 17, 2021

@cbegeman, this is really great!

I tested a merge of this branch with #171, together with your branch https://github.com/cbegeman/E3SM/tree/ocn/bugfix-inactive-top-cells merged with E3SM/master with gnu compilers on Ubuntu. Everything ran fine with QU240.

For the QUwISC240 test case, I got NaNs in layerThickness during the SSH adjustment phase. I didn't look into it too thoroughly but I suspect that there's some debugging still to be done there. I ran the version without inactive top cells just to make sure it worked okay, and it did.

I have a suggested addition. I would be really nice to have another test case under global ocean for each configuration with inactive top cells. It would only work if you had run the init and performance tests both with and without inactive top cells, so it would be helpful to have test suites defined for this check. (The QU240 version could also be included in the nightly test suite.)
This test case would only have one step. First, it would use xarray to remove the top layer from both init/initial_state/initial_state.nc and performance_test/forward/output.nc and save the results in new files local to the step. Then, it would compare to make sure the cropped files are identical to the results form the run without inactive top cells using the usual validation method (compass.validate.compare_variables()).

If results are not identical when comparing initial_state.nc in this way, we may have to see if the 1D vertical coordinates are identical except for the top layer. It could be that they are only close.

My other request would be to clean up the PEP8 issues that are mentioned here #164 (comment) and which will update automatically as you push changes to this PR branch.

@xylar
Copy link
Collaborator

xylar commented Jul 17, 2021

When I run in debug mode, I'm seeing errors in initial_state. Maybe we can include a fix for this in https://github.com/cbegeman/E3SM/tree/ocn/bugfix-inactive-top-cells?

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fd554d2220f in ???
#1  0x563c7dc8277e in __ocn_init_vertical_grids_MOD_ocn_compute_haney_number
	at /home/xylar/code/compass/add_inactive_top_cell/bugfix-inactive-top-cells-debug/components/mpas-ocean/src/mode_init/mpas_ocn_init_vertical_grids.F:795
#2  0x563c7d970552 in __ocn_init_global_ocean_MOD_ocn_init_setup_global_ocean
	at /home/xylar/code/compass/add_inactive_top_cell/bugfix-inactive-top-cells-debug/components/mpas-ocean/src/mode_init/mpas_ocn_init_global_ocean.F:511
#3  0x563c7d94b84a in __ocn_init_mode_MOD_ocn_init_mode_run
	at /home/xylar/code/compass/add_inactive_top_cell/bugfix-inactive-top-cells-debug/components/mpas-ocean/src/mode_init/mpas_ocn_init_mode.F:307
#4  0x563c7d7f9a24 in __ocn_core_MOD_ocn_core_run
	at /home/xylar/code/compass/add_inactive_top_cell/bugfix-inactive-top-cells-debug/components/mpas-ocean/src/driver/mpas_ocn_core.F:116
#5  0x563c7d257990 in __mpas_subdriver_MOD_mpas_run
	at /home/xylar/code/compass/add_inactive_top_cell/bugfix-inactive-top-cells-debug/components/mpas-framework/src/driver/mpas_subdriver.F:358
#6  0x563c7d256689 in mpas
	at /home/xylar/code/compass/add_inactive_top_cell/bugfix-inactive-top-cells-debug/components/mpas-framework/src/driver/mpas.F:20
#7  0x563c7d2566fa in main
	at /home/xylar/code/compass/add_inactive_top_cell/bugfix-inactive-top-cells-debug/components/mpas-framework/src/driver/mpas.F:10

@xylar
Copy link
Collaborator

xylar commented Jul 17, 2021

Actually, regarding my last comment, I think the issue is that the top layer has zero thickness. So the issue isn't to do with init mode in MPAS-Ocean at all, but rather that the 1D vertical coordinate it's being given isn't valid. Could we give the 1D vertical coordinate a non-zero, positive value for the first entry so it doesn't cause divide-by-zero issues? Or does that cause trouble with computations elsewhere?

@cbegeman
Copy link
Collaborator Author

@xylar Thanks for all your feedback and suggestions.

Is there a reason that you propose adding an additional test case that does the xarray trim and then validate, rather than adding a new method or function crop_output (or remove_inactive_top_cells_output) and adding

if self.init.with_inactive_top_cells:
     crop_output(test_case=self, variables=variables, filename1=forward/output.nc)

to either performance_test/__init__.py (called from validate, new method in PerformanceTest class) or to validate.py (called from compare_variables, new function in compass.validate). I'm thinking that it would be desirable to just run the inactive cells init and performance tests and have the performance test do the comparison with baseline automatically if the baseline is provided.

@xylar
Copy link
Collaborator

xylar commented Jul 19, 2021

@cbegeman, that would be a good way to go. I hadn't thought about it. I would suggest adding a validate to the Init class as well that does the same thing. So it would make sense to make remove_inactive_top_cells_output() (the longer name is better in my view) live somewhere outside of any individual test case or step in the global_ocean infrastructure.

Presumably, you will just need to check whether the files PHC/init/initial_state/initial_state.nc and PHC/performance_test/forward/output.nc are present, and decide whether to perform each validation only if so? I don't think there's currently a good way to know whether those steps got run except by whether their output exists.

@xylar
Copy link
Collaborator

xylar commented Jul 19, 2021

@cbegeman, I thought about this a bit further. We don't usually put any output files at the test-case level, just within steps. So it would be better if the output gets cropped as part of the init/initial_state and performance_test/forward steps, respectively, and the cropped output ends up in the directories for those steps. This will become more important when we implement task parallelism, as it is an important aspect of determining data dependencies. It would also be important that the cropped files are included as outputs of the steps.

@cbegeman
Copy link
Collaborator Author

@xylar Thanks for giving it some thought. I think I understand how to implement it as you suggested.

Do you mind if I throw the remove_inactive_top_cells_output() function in validate.py and I'll just import it for the two steps you mentioned?

@xylar
Copy link
Collaborator

xylar commented Jul 19, 2021

Do you mind if I throw the remove_inactive_top_cells_output() function in validate.py and I'll just import it for the two steps you mentioned?

Hmm, that doesn't really seem like the right place or it to me. validate.py is general infrastructure for all of compass, whereas what you are proposing belongs within the ocean package for sure and probably within the global_ocean test group unless it would conceivably get used somewhere else. I would suggest creating a new module either in ocean or global_ocean for this function. I don't think there is an existing module you can use without it seeming oddly out of place.

@cbegeman
Copy link
Collaborator Author

@xylar You can ignore this while on vacation.

I didn't know the best way to set up a comparison with the baseline because the baseline has a different test name than the inactive cell run (e.g., PHC vs. PHC_inactive_top). Right now, the comparison requires that you copy the baseline files into the output directories for PHC_inactive_top tests, which isn't ideal.

Other than that I think the test is good to go, and I added the check for zero layer thickness in init mode to that bugfix branch.

@xylar
Copy link
Collaborator

xylar commented Jul 21, 2021

@cbegeman, thanks for working on this. I think I see the dilemma: validation was only designed either for comparison with a baseline or within a test case, not between test cases. I'll give it some thought and try to come up with a better way when I come back.

@xylar
Copy link
Collaborator

xylar commented Jul 29, 2021

@cbegeman, I'm getting back to thinking about the validation problem here. I'm going to try to come up with a general solution outside of this branch/PR so we can rebase this onto that work.

@cbegeman
Copy link
Collaborator Author

Thanks, @xylar!

@xylar xylar self-assigned this Mar 29, 2023
@cbegeman cbegeman closed this Aug 23, 2023
@cbegeman cbegeman deleted the add_inactive_top_cell branch August 23, 2023 20:27
@cbegeman cbegeman restored the add_inactive_top_cell branch August 24, 2023 15:29
@cbegeman cbegeman reopened this Aug 24, 2023
@cbegeman
Copy link
Collaborator Author

cbegeman commented Aug 24, 2023

I have located two terms that lead to a non-bfb solution between minLevelCell = 1 and minLevelCell = 2 on chrysalis, intel-mpi:

  • config_disable_vel_pgrad = .false. for Jacobian_from_TS or Jacobian_from_density
  • config_disable_vel_surface_stress = .false.

@xylar Ok, let's merge this PR soon with the two terms that cause the non-BFB result disabled so it is BFB. I'll re-test and hopefully push that change shortly.

@xylar
Copy link
Collaborator

xylar commented Aug 24, 2023

Sounds good!

@xylar
Copy link
Collaborator

xylar commented Aug 24, 2023

@cbegeman, the rebase may be challenging at this point. If you want me to do it (or parts of it), let me know.

cbegeman and others added 13 commits August 24, 2023 16:28
Use `minLevelCell` instead of an argument to determine the number
of inactive top levels.
Store the `inactive_top_comp_subdir` so we know what files to
compare with during validation.

Use `ds.load()` when modifying `minLevelCell` in the initial state
to prevent issues with reading from and writing to the same file.

Crop inactive top cells after writing out the modified initial
state (because we now use the modified `minLevelCell`).

Set the default to `inactive_top_cells = 0` in a new `init.cfg`
instead of the config file for the QU240 mesh.
Move cropping of inactive top cells to ForwardStep so it is
explicitly included as an output of that step.

Give `init.nc` as the mesh file for cropping.

For validation, point to the output file from the version of the
performance test without inactive top cells for `filename2`.
@cbegeman
Copy link
Collaborator Author

@xylar I've finished the rebase and made a few changes to how the case works. I think it makes more sense to have performance_test_inactive_cells and init_inactive_cells which are compared to performance_test and init if those are run. Formerly, these tests were set up at {initial_condition}_inactive_cells. The consequence of this is that it isn't convenient to have different namelist settings for the inactive cells test because they have to be modified at both performance_test and performance_test_inactive_cells to make the comparison valid. Should we wait to merge this PR until we can get PASS for the comparison with the default namelist settings?

There is one lingering issue I spotted but didn't get around to fixing. The initial condition ends up with real values at k=1:minLevelCell-1. It doesn't affect the test outcome because those values are not used, but I think we should set them to the fill value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in progress This PR is not ready for review or merging ocean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants