Comparison with xESMF #48

Open
maresb opened this issue Sep 12, 2024 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@maresb

maresb commented Sep 12, 2024

Hey, this popped up on my GitHub feed and it looks interesting.

I'm already using xESMF, which seems to have been around for much longer. I'm wondering:

  1. Is there any reason for me to prefer xarray-regrid to xESMF?
  2. If so, how can I migrate my xESMF code to xarray-regrid?

Generalizing my personal request to an actionable feature request, it would be helpful if the docs compared xarray-regrid with existing regridders.

Thanks so much for publishing this project!

@BSchilperoort
Contributor

Hi Ben,

Thanks for the feedback!

If xESMF works for you, there is no reason to move over. However, if you

  • are running into memory issues (the transformation matrix has to fit in memory completely),
  • are interested in running your code on other platforms (where ESMF is not available, or can be cumbersome for others to install), or
  • find xESMF a bit too slow,

then this package could be for you. Note that your regridding has to be from rectilinear -> rectilinear, and not between different Coordinate Reference Systems.
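
To make the rectilinear requirement concrete, here is a minimal sketch of a check you could run on your own coordinates. `is_rectilinear` is a hypothetical helper written for illustration, not part of xarray-regrid or xESMF, and it assumes the latitude and longitude values are available as plain NumPy arrays. It treats any 2-D coordinate array as non-rectilinear, which is a simplification.

```python
import numpy as np


def is_rectilinear(lat, lon):
    """Hypothetical helper: True if lat/lon are 1-D, strictly monotonic axes."""
    lat, lon = np.asarray(lat), np.asarray(lon)
    if lat.ndim != 1 or lon.ndim != 1:
        # This simple check treats 2-D coordinate arrays as not rectilinear.
        return False

    def monotonic(axis):
        steps = np.diff(axis)
        return bool(np.all(steps > 0) or np.all(steps < 0))

    return monotonic(lat) and monotonic(lon)


# A regular 1-degree grid: each axis is described by a 1-D vector.
lat_1d = np.arange(-89.5, 90.0, 1.0)
lon_1d = np.arange(-179.5, 180.0, 1.0)
rectilinear_ok = is_rectilinear(lat_1d, lon_1d)

# A sheared (curvilinear) grid needs a full 2-D array per coordinate.
lon_2d, lat_2d = np.meshgrid(lon_1d, lat_1d)
curvilinear_ok = is_rectilinear(lat_2d + 0.1 * lon_2d, lon_2d)
```

If a check like this fails for your data, the 1-D axis assumption does not hold and xESMF is likely the better fit.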


Notebooks comparing xarray-regrid to CDO and xESMF are available in the docs. For example: https://xarray-regrid.readthedocs.io/en/latest/notebooks/benchmarks/benchmarking_conservative.html
That should show you how the xESMF and xarray-regrid methods differ (they're quite similar).

If you do try xarray-regrid on your workflow it would be great to hear how it runs better/worse than xESMF (CPU time as well as memory use).

@BSchilperoort added the documentation label Sep 12, 2024
@maresb
Author

maresb commented Sep 12, 2024

Thanks so much @BSchilperoort for the prompt response!!!

Your points are indeed pretty compelling. I'm not sure exactly when, but I'll probably give this a try at some point, and I'll make sure to report back. Thanks again!

@slevang
Collaborator

slevang commented Sep 26, 2024

Adding my 2c on the advantages this package offers:

  1. xESMF always has to generate a large sparse array of weights in serial, which scales with the number of grid points and is a killer for small jobs. Spending 30s to generate weights on a 1/4° grid, only to regrid a small array in a few ms, is a bummer. Since xarray-regrid is limited to rectilinear grids, where we can treat each dimension separately, this step usually feels near-instantaneous across the different methods.
  2. Packaging for xESMF has gotten better but is still a hassle due to the ESMF dependency. You need conda.
  3. Everything here is built on the modern Pangeo stack, so it is easily modifiable and extensible.
  4. Even ignoring the weight generation bottleneck for small data, performance on large and/or chunked datasets ranges from on par to 10x+ faster across different benchmarks after recent enhancements.

The obvious limitation is non-rectilinear grids, where the flexibility of ESMF is hard to beat.
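
Point 1 can be illustrated with a small NumPy sketch. This is not xarray-regrid's actual implementation, only an illustration of why separating the dimensions keeps weight generation cheap. `overlap_weights` is a hypothetical helper, the grids are toy examples, and the weights are planar (no cos(latitude) area factor, no NaN handling).

```python
import numpy as np


def overlap_weights(src_edges, dst_edges):
    """Fraction of each destination cell covered by each source cell (1-D).

    Returns an array of shape (n_dst, n_src).
    """
    src_lo, src_hi = src_edges[:-1], src_edges[1:]
    dst_lo, dst_hi = dst_edges[:-1], dst_edges[1:]
    # Pairwise overlap length between every destination and source interval.
    overlap = np.minimum(dst_hi[:, None], src_hi[None, :]) - np.maximum(
        dst_lo[:, None], src_lo[None, :]
    )
    overlap = np.clip(overlap, 0.0, None)
    return overlap / (dst_hi - dst_lo)[:, None]


# Toy grids: 0.25-degree source cells, 1-degree target cells.
src_lat_edges = np.linspace(-90, 90, 721)
src_lon_edges = np.linspace(-180, 180, 1441)
dst_lat_edges = np.linspace(-90, 90, 181)
dst_lon_edges = np.linspace(-180, 180, 361)

w_lat = overlap_weights(src_lat_edges, dst_lat_edges)  # shape (180, 720)
w_lon = overlap_weights(src_lon_edges, dst_lon_edges)  # shape (360, 1440)

rng = np.random.default_rng(0)
data = rng.random((720, 1440))

# One small matrix per dimension, instead of a single operator that maps
# all 720*1440 source cells to all 180*360 target cells at once.
regridded = w_lat @ data @ w_lon.T

# Reference: on these aligned grids, each target cell is the plain mean
# of its 4x4 block of source cells.
block_mean = data.reshape(180, 4, 360, 4).mean(axis=(1, 3))
```

The two 1-D weight matrices depend only on the axis lengths, whereas a general 2-D weight matrix needs at least one entry per overlapping (source cell, target cell) pair.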

@slevang
Collaborator

slevang commented Sep 26, 2024

A nice example for point 1: trying to regrid a large fixed land surface dataset. Here's the 30 arc-second ETOPO geoid, which is 21600x43200:

```python
import xarray as xr
import xarray_regrid  # registers the .regrid accessor on xarray objects

# Open lazily over OPeNDAP; chunks={} returns dask-backed arrays.
ds = xr.open_dataset(
    "https://www.ngdc.noaa.gov/thredds/dodsC/global/ETOPO2022/30s/30s_geoid_netcdf/ETOPO_2022_v1_30s_N90W180_geoid.nc",
    chunks={},
)

# Rename the coordinates to match the target grid and drop the crs variable.
ds = ds.rename(lon="longitude", lat="latitude").drop_vars("crs")

bounds = dict(south=-90, north=90, west=-180, east=180)

# Global 1-degree target grid.
target = xarray_regrid.Grid(
    resolution_lat=1,
    resolution_lon=1,
    **bounds,
).create_regridding_dataset()

# IPython magic
%timeit ds.regrid.conservative(target);
```
257 ms ± 2.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This is basically an intractable problem for xESMF. I tried using their chunked parallel weight generation scheme, and it still ran for 20 minutes and then crashed.

@dcherian

To be clear, this benchmarks the weight generation and graph creation, correct? Does it compute smoothly too?

@slevang
Collaborator

slevang commented Sep 26, 2024

Then I have to actually download the file 😆. But yes, I'll try that.

@slevang
Collaborator

slevang commented Sep 26, 2024

ETA 1hr; the NCEI server is having a bad day, I guess. I used xr.ones_like to shortcut.

With -1 chunks (a single chunk), regridding takes about 9s and uses ~15GB of memory. With 1000x1000 chunks, it takes 4s and uses ~2GB of memory. Pretty good, since the data itself is 3.5GB.
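
For reference, the sizes quoted above can be reproduced with some quick arithmetic. This sketch assumes 4-byte (float32) values, which the thread does not state explicitly; the grid shape and the 1000x1000 chunking are taken from the comments above.

```python
import math

n_lat, n_lon = 21600, 43200  # ETOPO 30 arc-second grid from the thread
bytes_per_value = 4          # assumption: float32 values

total_bytes = n_lat * n_lon * bytes_per_value
total_gib = total_bytes / 2**30

chunk = 1000  # chunk edge length used in the second test
n_chunks = math.ceil(n_lat / chunk) * math.ceil(n_lon / chunk)
chunk_mib = chunk * chunk * bytes_per_value / 2**20

# Source cells aggregated into each 1-degree target cell, per dimension.
cells_per_degree = n_lat // 180

print(f"{total_gib:.2f} GiB total, {n_chunks} chunks of up to {chunk_mib:.1f} MiB")
print(f"{cells_per_degree} x {cells_per_degree} source cells per target cell")
```

Under that assumption, the array comes out at roughly 3.5 GiB, which matches the quoted data size, and each 1000x1000 chunk is only a few MiB.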
