Sparse weights in conservative method #49

Merged: 8 commits merged into main from sparse-weights on Sep 24, 2024
Conversation

@slevang (Collaborator) commented on Sep 20, 2024

Potential improvement for #42.

This package's focus on rectilinear grids, and the factorization of regridding along dimensions, make generating and using dense weights feasible. However, the level of sparsity in the weights matrix is still extremely high for any reasonably sized grid. I did some experiments converting the weights to a sparse matrix after creation, and am seeing nice improvements in both compute time and memory footprint.

On the example in #42 (comment) I get close to a 4x speedup (and better than xesmf):

CPU times: user 42.5 s, sys: 6.01 s, total: 48.5 s
Wall time: 11.6 s
CPU times: user 6min 9s, sys: 41.6 s, total: 6min 51s
Wall time: 59.2 s
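
For anyone curious, here's a minimal numpy-level sketch of the idea (hypothetical toy weights, not this package's actual code path): conservative weights are overwhelmingly zeros, so converting them to a pydata/sparse COO array after construction shrinks the memory footprint and lets the contraction skip the zero entries.

```python
import numpy as np
import sparse  # pydata/sparse

# Hypothetical 1D conservative weights: 3600 source cells -> 1800 target cells.
# Each target cell only overlaps two source cells, so the matrix is ~99.9% zeros.
n_src, n_tgt = 3600, 1800
dense_weights = np.zeros((n_tgt, n_src))
for i in range(n_tgt):
    dense_weights[i, 2 * i : 2 * i + 2] = 0.5

sparse_weights = sparse.COO.from_numpy(dense_weights)
print(f"dense:  {dense_weights.nbytes / 1e6:6.1f} MB")   # ~51.8 MB
print(f"sparse: {sparse_weights.nbytes / 1e6:6.1f} MB")  # ~0.1 MB

# Applying the weights: the matmul dispatches to sparse's dot,
# which only touches the nonzero entries; the product comes back dense.
data = np.random.default_rng(0).standard_normal((n_src, 720))
regridded = sparse_weights @ data  # shape (1800, 720)
```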

@BSchilperoort (Contributor) commented on Sep 20, 2024

> get close to a 4x speedup (and better than xesmf):

Awesome! I did not bother with this originally for the reasons you mentioned. But it's great to see that it's a (relatively easy) way to gain a lot of performance.

Edit:
I don't see any significant performance gain compared to the benchmark I ran from #42...

@slevang (Collaborator, Author) commented on Sep 21, 2024

That was a VM with a pretty old CPU architecture, and I guess it's just really slow on these particular calculations. I'm also seeing much faster results on my 8-core M1 Mac: about 12 s for the skipna=False case on current main, dropping to 5 s with sparse weights. The improvement is even bigger with skipna=True, I think because the sparsity limits the size of the weight array as we track NaNs over each dim.

Mixed results switching between the threaded and distributed schedulers, sometimes a bit faster, sometimes slower.
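
To illustrate the skipna idea (a toy sketch with made-up weights, not the implementation in this PR): fill NaNs with zeros before applying the weights, push the same weights through the valid-data mask, and renormalize by the valid weight at the end. Sparsity keeps both contractions cheap.

```python
import numpy as np
import sparse

rng = np.random.default_rng(0)
n_src, n_tgt = 1000, 500

# Toy weights: each target cell averages two adjacent source cells.
rows = np.repeat(np.arange(n_tgt), 2)
cols = np.arange(n_src)
weights = sparse.COO(np.stack([rows, cols]), np.full(n_src, 0.5), shape=(n_tgt, n_src))

data = rng.standard_normal((n_src, 50))
data[rng.random(data.shape) < 0.1] = np.nan  # sprinkle in missing values

valid = ~np.isnan(data)
weighted_sum = weights @ np.where(valid, data, 0.0)  # NaNs contribute nothing
valid_weight = weights @ valid.astype(float)         # valid weight per target cell

with np.errstate(invalid="ignore", divide="ignore"):
    regridded = weighted_sum / valid_weight  # NaN only where no valid source cells
```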

@slevang (Collaborator, Author) commented on Sep 21, 2024

I ran the benchmarking test from #42 across several configurations on my 8-core i7 Linux desktop. Runtimes are to the nearest second:

chunks={"time": 1}, ~4MB

| skipna=False | threads | distributed |
| --- | --- | --- |
| sparse | 8 | 14 |
| dense | 28 | 17 |
| xesmf | 30 | 37 |

| skipna=True | threads | distributed |
| --- | --- | --- |
| sparse | 67 | 82 |
| dense | 327 | 335 |
| xesmf | 55 | 71 |

chunks={"time": 10}, ~40MB

| skipna=False | threads | distributed |
| --- | --- | --- |
| sparse | 6 | 7 |
| dense | 13 | 12 |
| xesmf | 7 | 6 |

| skipna=True | threads | distributed |
| --- | --- | --- |
| sparse | 59 | 72 |
| dense | OOM | OOM |
| xesmf | 10 | 12 |

Lots of interesting variation. My takeaways:

  • sparse weights seem to uniformly benefit run time
  • sparse weights make the NaN tracking scheme over dimensions feasible; otherwise, for larger chunks, the weights matrix size blows up
  • the only case where the distributed scheduler won was dense weights with small chunks. Keep in mind this is just the defaults, so 2 workers and 4 threads on my machine.
  • with sparse weights we're on par with or better than xesmf, except for skipna=True, where xesmf's scheme of simultaneously computing the NaN fraction over all dimensions is much more efficient. This gets washed out with small chunk sizes but is more apparent for larger ones.

@BSchilperoort (Contributor) previously approved these changes on Sep 24, 2024 and left a comment:
Thanks for the benchmarks and optimizations! Feel free to merge once you've updated the changelog 🚀

@slevang merged commit bc7be5b into main on Sep 24, 2024 (11 checks passed).
@BSchilperoort deleted the sparse-weights branch on September 24, 2024 at 15:48.