Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spherical padding and faster tests #45

Merged
merged 8 commits into from
Sep 20, 2024
Merged

Spherical padding and faster tests #45

merged 8 commits into from
Sep 20, 2024

Conversation

slevang
Copy link
Collaborator

@slevang slevang commented Sep 6, 2024

Spherical padding

This makes an initial attempt at building in some automatic logic for better handling the boundaries of a spherical domain with appropriate padding, as well as shifting the longitude source grid to match the target as needed.

Faster tests

Made some modifications and general cleanup of the existing tests. Mainly making the datasets dask-backed so the regridding functions are lazy for all the attribute checks. Also got rid of some unneeded copies, compute calls, etc. Tests run in about 17 seconds now.

With the addition of the spherical padding logic I was also able to remove some aspects of the tests where we avoided checking for good matches at the boundaries.

Conservative NaN tracking

I did some benchmarking and the version of NaN tracking where we track independently across non-grid dimensions as well doesn't actually have much of a run-time penalty. Therefore I went ahead and made this small change. The only issue is that it can induce a very large memory penalty if the input data isn't chunked, because the weights arrays can become huge if they are broadcast with all the other input dimensions. I added a note to the docstring about this but perhaps we want a more formal warning, or to explicitly add some chunking in certain cases?

TODO / open questions

  • tests added
  • fix typing
  • add examples to docstrings
  • add args to make the padding behavior configurable?

src/xarray_regrid/methods/conservative.py Show resolved Hide resolved
src/xarray_regrid/utils.py Show resolved Hide resolved
src/xarray_regrid/utils.py Outdated Show resolved Hide resolved
tests/test_regrid.py Show resolved Hide resolved
tests/test_regrid.py Show resolved Hide resolved
tests/test_regrid.py Outdated Show resolved Hide resolved
src/xarray_regrid/utils.py Show resolved Hide resolved
src/xarray_regrid/utils.py Show resolved Hide resolved
tests/test_format.py Show resolved Hide resolved
tests/test_regrid.py Show resolved Hide resolved
src/xarray_regrid/utils.py Outdated Show resolved Hide resolved
BSchilperoort
BSchilperoort previously approved these changes Sep 17, 2024
Copy link
Contributor

@BSchilperoort BSchilperoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these improvements!

It would probably be good to add a single workflow runner with micromamba, so that the comparison with ESMF will run on the CI. I could do this in a separate PR.

The only issue is that it can induce a very large memory penalty if the input data isn't chunked, because the weights arrays can become huge if they are broadcast with all the other input dimensions. I added a note to the docstring about this but perhaps we want a more formal warning, or to explicitly add some chunking in certain cases?

This probably is quite unintuitive to users, was the reason for removing this tracking? mostly just simplifying the code? If it does allow for a much lower memory footprint (without users needing to improve their chunking) it could be interesting to leave the non-grid-dims tracking in.

Made some modifications and general cleanup of the existing tests. Mainly making the datasets dask-backed so the regridding functions are lazy for all the attribute checks. Also got rid of some unneeded copies, compute calls, etc. Tests run in about 17 seconds now.

Awesome!

@slevang
Copy link
Collaborator Author

slevang commented Sep 19, 2024

This probably is quite unintuitive to users, was the reason for removing this tracking? mostly just simplifying the code? If it does allow for a much lower memory footprint (without users needing to improve their chunking) it could be interesting to leave the non-grid-dims tracking in.

It does simplify the code. But primarily, getting rid of this aggregation of the valid frac over non grid dims allows us to truly track isolated NaN fractions, so that you can have NaN in just a single time slice for example and get the correct output. I think what I have now is the most sensible baseline option, but I have some other ideas for improving conservative performance so I'll follow up in a different branch to avoid adding too much to this PR.

@slevang
Copy link
Collaborator Author

slevang commented Sep 19, 2024

I made a few small changes/refactors which I guess cancelled out your previous review.

Copy link
Contributor

@BSchilperoort BSchilperoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements! 😄

@BSchilperoort BSchilperoort linked an issue Sep 20, 2024 that may be closed by this pull request
@slevang slevang merged commit 9d71962 into main Sep 20, 2024
11 checks passed
@BSchilperoort BSchilperoort deleted the boundary-padding branch September 20, 2024 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants