Performance benchmarks #42

Open
jhamman opened this issue Nov 18, 2021 · 6 comments

jhamman (Contributor) commented Nov 18, 2021

Xbatcher is meant to make it easy to generate batches from Xarray datasets and feed them into machine learning libraries. As we wrote in the roadmap, we are also considering various options to improve batch generation performance. I think it's clear to everyone that naively looping through arbitrary xarray datasets will not be sufficiently performant for most applications (see #37 for examples / discussion). We need tools/models/etc. to handle things like caching, shuffling, and parallel loading, and we need a framework to evaluate the performance benefits of added features.

Proposal

Before we start optimizing xbatcher, we should develop a framework for evaluating performance benefits. I propose we set up ASV and develop a handful of basic batch generation benchmarks. ASV is used by Xarray and a number of other related projects. It allows writing custom benchmarks like this:

example 1:

import os

import numpy as np
import xarray as xr


class HugeAxisSmallSliceIndexing:
    # https://github.com/pydata/xarray/pull/4560
    def setup(self):
        self.filepath = "test_indexing_huge_axis_small_slice.nc"
        if not os.path.isfile(self.filepath):
            xr.Dataset(
                {"a": ("x", np.arange(10_000_000))},
                coords={"x": np.arange(10_000_000)},
            ).to_netcdf(self.filepath, format="NETCDF4")

        self.ds = xr.open_dataset(self.filepath)

    def time_indexing(self):
        # ASV times any method prefixed with ``time_``.
        self.ds.isel(x=slice(100))

    def cleanup(self):
        self.ds.close()

We could do the same here, but with a focus on batch generation. As we talk about adding performance optimizations, I think this is the only way we can begin to evaluate their benefits.
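For example, a batch-generation benchmark in the same ASV style might look roughly like the sketch below; it assumes xbatcher's BatchGenerator API, and the dataset shape and input_dims values are arbitrary placeholders rather than a finalized benchmark:

import numpy as np
import xarray as xr
from xbatcher import BatchGenerator


class BatchGeneration:
    def setup(self):
        # Small synthetic dataset; the size is an illustrative placeholder.
        self.ds = xr.Dataset(
            {"a": (("time", "x"), np.random.rand(1_000, 100))},
            coords={"time": np.arange(1_000), "x": np.arange(100)},
        )

    def time_iterate_batches(self):
        # ASV times any method prefixed with ``time_``: here, one full pass
        # over every batch produced by the generator.
        bgen = BatchGenerator(self.ds, input_dims={"time": 10})
        for batch in bgen:
            batch.load()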

weiji14 (Member) commented Sep 1, 2022

Is there a way to have a public record of the benchmarks? I'm thinking of something like what https://codecov.io is to pytest-cov. I found airspeed-velocity/asv#796, which is a GitHub Actions-based solution, but I was wondering if there's a nicer way to track performance over time on a line chart, updated with each merged PR.

maxrjones (Member)

There's no current public record. I didn't prioritize publishing results because it seemed the lack of dedicated, consistent hardware would be a barrier to producing useful records. But https://labs.quansight.org/blog/2021/08/github-actions-benchmarks suggests that GitHub Actions could be sufficient to identify performance changes >50%.

weiji14 (Member) commented Sep 1, 2022

That's a really nice blog post, thanks for sharing! The GitHub Actions approach doesn't look trivial to set up though 😅 I did find https://github.com/benchmark-action/github-action-benchmark, but they don't support asv (yet). Maybe we should find a way to piggyback onto https://pandas.pydata.org/speed/xarray?

maxrjones (Member)

After #168 we'll have a pretty good suite of benchmarks.

The following two tasks remain for closing out this issue:

  • Periodically run benchmarks in CI to identify any issues with the asv setup or performance regressions
  • Configure asv to compare performance across new Xarray releases, since xbatcher's performance is so closely tied to Xarray's

weiji14 (Member) commented Jan 1, 2024

> There's no current public record. I didn't prioritize publishing results because it seemed the lack of dedicated, consistent hardware would be a barrier to producing useful records. But https://labs.quansight.org/blog/2021/08/github-actions-benchmarks suggests that GitHub Actions could be sufficient to identify performance changes >50%.

We're starting to experiment with using pytest-codspeed at PyGMT for benchmarking (see GenericMappingTools/pygmt#2910 and GenericMappingTools/pygmt#2908). CodSpeed seems to solve the problem of inconsistency by measuring CPU cycles and memory accesses instead of execution time, but this can be less intuitive in some cases, since more CPU cycles used doesn't always mean slower execution time.

If there's interest, I can help with setting up the CI infrastructure for CodSpeed this year. This would require some refactoring of the current benchmarks from ASV to pytest-benchmark, but it would allow us to track performance benchmarks publicly like CodeCov (see https://codspeed.io/explore), rather than having to compare runs locally. Thoughts anyone?
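As a rough illustration only, such a refactored benchmark might look something like the sketch below; it assumes the pytest-benchmark ``benchmark`` fixture (which pytest-codspeed also accepts), and the test name, dataset shape, and input_dims are arbitrary placeholders rather than xbatcher's actual benchmark code:

import numpy as np
import xarray as xr
from xbatcher import BatchGenerator


def test_batch_generation(benchmark):
    # ``benchmark`` is the pytest-benchmark fixture; pytest-codspeed measures
    # the wrapped callable on its instrumented CI runners.
    ds = xr.Dataset(
        {"a": (("time", "x"), np.random.rand(1_000, 100))},
        coords={"time": np.arange(1_000), "x": np.arange(100)},
    )

    def run():
        # One full pass over every batch so the whole generation cost is measured.
        for batch in BatchGenerator(ds, input_dims={"time": 10}):
            batch.load()

    benchmark(run)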

maxrjones (Member)

> We're starting to experiment with using pytest-codspeed at PyGMT for benchmarking (see GenericMappingTools/pygmt#2910 and GenericMappingTools/pygmt#2908). CodSpeed seems to solve the problem of inconsistency by measuring CPU cycles and memory accesses instead of execution time, but this can be less intuitive in some cases, since more CPU cycles used doesn't always mean slower execution time.
>
> If there's interest, I can help with setting up the CI infrastructure for CodSpeed this year. This would require some refactoring of the current benchmarks from ASV to pytest-benchmark, but it would allow us to track performance benchmarks publicly like CodeCov (see https://codspeed.io/explore), rather than having to compare runs locally. Thoughts anyone?

I recently started using CodSpeed for ndpyramid after reading your comment and it seems really neat! I agree that it could work well for xbatcher, since the need to run benchmarks locally is a barrier to use. It's also nice that the same code can be used for both tests and benchmarks. Fully support you trying it for xbatcher!
