Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype For Fast Bulk Sampling Of Pixels #5102

Merged
merged 20 commits into from
Jun 21, 2023

Conversation

danieldresser-ie
Copy link
Contributor

I was doing some experimenting with some of our more expensive image operations, and found that Sampler.sample() is a big bottleneck. In cases where we need to sample every pixel in a region, we can do much better by doing the iteration within Sampler, where we can iterate through every pixel in the region within a particular tile with a single call to cachedData.

I was a bit unsure how to demonstrate the benefit from this ... Resample with a non-separable filter should benefit because it needs to do an enormous amount of sampling ... but the only case we have currently of a non-separable filter is a filter which is incredibly slow to evaluate, so it doesn't really demonstrate the benefit.

I've ended up demonstrating this on RankFilter - this is a little silly, because RankFilter doesn't need to do all this repeated sampling of the same pixels, but for our current naive implementation, it does show a noticeable speedup, 4X for Erode/Dilate, and 2X for Median.

A more practical demonstration of benefit would be optimizing Blur, but that feels a little bit more contentious, because doing that efficiently would probably require adding a special case to Resample for when the matrix is an identity and the filter weights are identical for every pixel ... I think this makes a lot of sense, but perhaps should be a separate PR.

I think this is looking pretty good, though I wish I had slightly more thorough tests for the rank filters in order to validate this sort of work, and ideally I would double check that visitPixels is fully getting inlined and optimized away properly - the goal is that copying a scanline into a vector should be optimized to basically a memcpy per tile.

Copy link
Member

@johnhaddon johnhaddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel, I like it! Few rushed comments inline - trying to get it posted before our meeting which started 2 minutes ago :)

include/GafferImage/Sampler.inl Outdated Show resolved Hide resolved
@@ -95,6 +95,13 @@ class GAFFERIMAGE_API Sampler
/// 0.5, 0.5.
float sample( float x, float y );

/// Call a functor for all pixels in the region.
/// The signature of the functor must be `F( float value, int x, int y )`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you have any objection to F( const V2i &p, float value )?

include/GafferImage/Sampler.inl Outdated Show resolved Hide resolved
include/GafferImage/Sampler.inl Outdated Show resolved Hide resolved
src/GafferImage/RankFilter.cpp Outdated Show resolved Hide resolved
@danieldresser-ie danieldresser-ie force-pushed the visitPixels branch 2 times, most recently from 14a0392 to 13f4ea0 Compare May 16, 2023 07:14
@danieldresser-ie danieldresser-ie marked this pull request as ready for review May 16, 2023 07:15
@danieldresser-ie
Copy link
Contributor Author

I guess there's enough similarity to the previous version of this that there's no need for a separate PR, even though just about everything has been rewritten.

visitPixels does now guarantee a scanline order. Bugs are fixed, and it's being used more broadly ( along with a few other miscellany perf improvements, like "bilinear" mode for Warp, which just calls sampler.sample() ... allowing us to test the perf of that, and giving a faster option for Warp when we're not picky about quality ).

Trying to look at all the performance impacts this has is a lot of data ... I threw together some quick and dirty scripts in the final commit to help produce a graph ... it's not pretty, and is a little untested ( ImageEngine doesn't currently have matplotlib installed, so I just used some random online site to execute the last part ), but it ends up with a graph something like this, for the last 11 commits of this PR:
visitPerf

I collected a whole bunch of data, and basically kept anything that is plausibly interesting ... including a couple of things that I think are just noise, but I didn't want to cherry-pick and risk confirmation bias. I think this pretty clearly indicates the benefit of all these changes.

The one actual regression is a 15% performance loss in testSimplestUpscalePerf, which uses a box filter to upscale so that every output pixel visits a region that is just a single pixel in the input ( meaning you get the overhead of setting up visitPixels, but don't get the benefit of visiting more than one pixel ). This is a pretty rare case, and probably not that concerning.

Copy link
Member

@johnhaddon johnhaddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! I've made a few very minor comments inline, but this is looking great, and the results speak for themselves.

Note to self : you will need to update Changes.md after merging.

python/GafferImageTest/DilateTest.py Show resolved Hide resolved
python/GafferImageUI/WarpUI.py Outdated Show resolved Hide resolved
src/GafferImage/Warp.cpp Outdated Show resolved Hide resolved
src/GafferImage/Warp.cpp Outdated Show resolved Hide resolved
include/GafferImage/Sampler.h Outdated Show resolved Hide resolved
include/GafferImage/Sampler.inl Outdated Show resolved Hide resolved
include/GafferImage/Sampler.inl Outdated Show resolved Hide resolved
include/GafferImage/Sampler.inl Outdated Show resolved Hide resolved
include/GafferImage/Sampler.inl Show resolved Hide resolved
This is a very low quality, but fast approach ( it also offer a way of profiling the performance of Sampler::sample( float, float ), which previously wasn't used anywhere critical )
Might as well make it easy for people who want to do fast things with images?
Seems to give a 15% performance improvement in common cases
This is a 70% performance improvement when using a VectorWarp with useDerivatives on to downsample an image.
@danieldresser-ie
Copy link
Contributor Author

OK, I think that's all feedback addressed. I've squashed it in, because it's a lot of small changes to different commits. The only new commit is: "ImageTransformTest : Pre-cache input for performance tests". I also reran the perf tests just in case, and made a graph that is no less ugly, but slightly more informative:
visitPerf2

@johnhaddon johnhaddon merged commit 875c795 into GafferHQ:main Jun 21, 2023
3 of 4 checks passed
johnhaddon added a commit that referenced this pull request Jun 21, 2023
@johnhaddon
Copy link
Member

Thanks Daniel! While updating Changes.md, I tested with a large-radius Blur and was very pleased to see that it was nearly 6x faster, and feels waay speedier in interactive use. Nice one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants