
Fast Blur #2302

Merged: 1 commit, Oct 11, 2024

Conversation

@torfmaster (Contributor) commented Aug 2, 2024

PR

This is an attempt to solve #986, as the current solution (using fastblur) is fragile and requires consumers to access raw buffers of the image, which is undesirable.
The implementation is inspired by this article, which is in turn based on Kovesi, P.: "Fast Almost-Gaussian Filtering", The Australian Pattern Recognition Society Conference: DICTA 2010, December 2010, Sydney.

The short story is that the real Gaussian blur takes about 30 seconds for real-world photos, which is undesirable. The algorithm here is quick and gives pretty good results.
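For context, the core of that approach is to replace the Gaussian with a few successive box blurs whose widths are derived from the target sigma. A rough sketch of that box-size computation (an illustration of Kovesi's scheme, not the code in this PR):

/// Illustrative sketch: choose `n` box-filter widths whose successive
/// application approximates a Gaussian blur with standard deviation `sigma`,
/// following Kovesi's "Fast Almost-Gaussian Filtering" (DICTA 2010).
fn boxes_for_gauss(sigma: f32, n: usize) -> Vec<i32> {
    // Ideal averaging-filter width for `n` passes.
    let w_ideal = (12.0 * sigma * sigma / n as f32 + 1.0).sqrt();
    let mut wl = w_ideal.floor() as i32;
    if wl % 2 == 0 {
        wl -= 1; // box widths must be odd
    }
    let wl = wl.max(1);
    let wu = wl + 2;

    // How many of the `n` passes should use the smaller width `wl`.
    let n_f = n as f32;
    let m_ideal = (12.0 * sigma * sigma
        - n_f * (wl * wl) as f32
        - 4.0 * n_f * wl as f32
        - 3.0 * n_f)
        / (-4.0 * wl as f32 - 4.0);
    let m = m_ideal.round() as i32;

    (0..n as i32).map(|i| if i < m { wl } else { wu }).collect()
}

fn main() {
    // Three box blurs of these widths approximate a Gaussian with sigma = 10.
    println!("{:?}", boxes_for_gauss(10.0, 3)); // [19, 19, 21]
}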

Questions

Is this PR ready?

No, but I want to have some feedback on the implementation so I can move on (in the right direction).

Are you aware of #2238?

Yes, but I don't know how to deal with it. blur is not yet deprecated, and users of the library will search for a fast alternative in this crate, not in imageproc. I am open to moving my code there now or later.
The advantage of having it here would be that I don't run into the non_exhaustive attribute on DynamicImage and can implement blur for this type (if desired).

Why didn't you simply c&p the code from the blurslice crate (or the article)?

The blurslice crate is distributed under the MIT license, and re-licensing the code under MIT/Apache-2.0 would require the approval of all contributors to that crate.
On the other hand, it doesn't integrate well with the principal structures DynamicImage and ImageBuffer. That's why I decided to re-implement the algorithm. I have to admit there are some details in both implementations that I don't understand and for which I couldn't find an explanation. Hence I just wrote my own implementation that I think I understand.

What about the performance?

I didn't check the performance (and didn't optimize it).
But as opposed to the blur implementation, the blurring finishes in a reasonable amount of time.
As soon as the interface is fixed (i.e. whether to accept GenericImageView or DynamicImage) I can start on performance tweaks.

What about tests?

I can't really see how to write meaningful functional tests for this. However, I plan to

  • write benchmarks as soon as I dig into performance tuning
  • write trivial tests (sigma = 1 is a no-op); see the sketch after this list
  • write kani tests that make sure the potentially panicking slice accesses don't panic
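As an illustration of what such a trivial test could look like, here is a sketch; the exact path and name of fast_blur, and the property asserted, are assumptions rather than what the PR actually contains:

use image::{ImageBuffer, Rgba};

// Sketch of a trivial property test: blurring a constant-color image should
// return an image of the same size that is still (almost) constant.
#[test]
fn blur_of_uniform_image_stays_uniform() {
    let color = Rgba([200u8, 100, 50, 255]);
    let input: ImageBuffer<Rgba<u8>, Vec<u8>> = ImageBuffer::from_pixel(16, 16, color);

    let output = image::imageops::fast_blur(&input, 1.0);

    assert_eq!(output.dimensions(), input.dimensions());
    // Allow one unit of rounding slack per channel.
    assert!(output
        .pixels()
        .all(|p| p.0.iter().zip(color.0).all(|(a, b)| a.abs_diff(b) <= 1)));
}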

Is the interface fixed?

Currently, I accept an ImageBuffer, simply because it lets me operate on slices rather than going through another interface to access pixels. However, it might make sense to also accept a DynamicImage (which can easily be accomplished) or a GenericImageView (which would require a complete rewrite of the algorithm).
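For concreteness, the ImageBuffer-based variant could have a signature roughly like the following (illustrative only; the final name and bounds are exactly what is up for discussion here):

use image::{ImageBuffer, Pixel};

// Illustrative signature only, not the final API: accepting an `ImageBuffer`
// gives the algorithm direct access to the underlying sample slice, which a
// `GenericImageView` bound would not.
pub fn fast_blur<P: Pixel>(
    image: &ImageBuffer<P, Vec<P::Subpixel>>,
    sigma: f32,
) -> ImageBuffer<P, Vec<P::Subpixel>> {
    let _ = (image, sigma);
    // ... three box-blur passes over `image.as_raw()` would go here ...
    unimplemented!()
}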

Will the "old" blur stay?

The fast blur algorithm is inaccurate, but good enough for many applications. I would recommend that the "real" blur stays. It is not much code anyway, as it is implemented as a general convolution with a Gaussian kernel.

License

I license past and future contributions under the dual MIT/Apache-2.0 license,
allowing licensees to choose either at their option.

Todos

  • add reference to fast_blur in docs for blur
  • implement fast_blur for DynamicImage
  • work on clippy failures
  • fix public_private_dependencies job
  • write tests for index handling (kani or quickcheck)
  • write benchmarks and optimize performance
  • rebase interactively to create atomic commits

@ripytide (Member) commented Aug 2, 2024

Wow great work!

Are you aware of #2238?
Yes, but I don't know how to deal with it. blur is not yet deprecated, and users of the library will search for a fast alternative in this crate, not in imageproc. I am open to moving my code there now or later.
The advantage of having it here would be that I don't run into the non_exhaustive attribute on DynamicImage and can implement blur for this type (if desired).

Long-term, I think this function should belong in imageproc, since it is an image processing operation. Short-term, since a lot of the imageops are yet to be deprecated, it could be added temporarily to image and then copied across later, but that seems like extra work to me.

Maybe we could add a note in the docs of the "slow" blur method in this crate that points to the faster method in the imageproc crate?

Regarding DynamicImage, you can use the dynamic_map macro (not sure if it's currently exported from the crate, but it's easy enough to redefine yourself):

use image::{DynamicImage, ImageBuffer, Pixel};

macro_rules! dynamic_map(
        ($dynimage: expr, $image: pat => $action: expr) => ({
            use DynamicImage::*;
            match $dynimage {
                ImageLuma8($image) => ImageLuma8($action),
                ImageLumaA8($image) => ImageLumaA8($action),
                ImageRgb8($image) => ImageRgb8($action),
                ImageRgba8($image) => ImageRgba8($action),
                ImageLuma16($image) => ImageLuma16($action),
                ImageLumaA16($image) => ImageLumaA16($action),
                ImageRgb16($image) => ImageRgb16($action),
                ImageRgba16($image) => ImageRgba16($action),
                ImageRgb32F($image) => ImageRgb32F($action),
                ImageRgba32F($image) => ImageRgba32F($action),
                _ => unreachable!(),
            }
        });

        ($dynimage: expr, $image:pat_param, $action: expr) => (
            match $dynimage {
                DynamicImage::ImageLuma8($image) => $action,
                DynamicImage::ImageLumaA8($image) => $action,
                DynamicImage::ImageRgb8($image) => $action,
                DynamicImage::ImageRgba8($image) => $action,
                DynamicImage::ImageLuma16($image) => $action,
                DynamicImage::ImageLumaA16($image) => $action,
                DynamicImage::ImageRgb16($image) => $action,
                DynamicImage::ImageRgba16($image) => $action,
                DynamicImage::ImageRgb32F($image) => $action,
                DynamicImage::ImageRgba32F($image) => $action,
                _ => unreachable!(),
            }
        );
);

fn main() {
    let image1 = DynamicImage::new_luma8(3, 3);
    let image2 = DynamicImage::new_rgb8(3, 3);

    let image1 = dynamic_map!(image1, i => none(i));
    let image2 = dynamic_map!(image2, i => none(i));
}

fn none<P: Pixel, Container>(
    _image: ImageBuffer<P, Vec<Container>>,
) -> ImageBuffer<P, Vec<Container>> {
    todo!()
}

@torfmaster (Contributor, Author):

Thank you for your comments. I added tasks for your (and my) points so we can track progress more easily.

@torfmaster force-pushed the feature/fast-blur branch 7 times, most recently from 7ac2d69 to 2339f81, on August 3, 2024 at 13:06
@torfmaster (Contributor, Author):

I have pushed a rather coarse fix for the private-dependency checks. However, I don't know how to deal with these checks properly; I found no easy way of fixing this systematically. The main problem is that I can't use the public-dependency property in the regular Cargo.toml, as the feature is unstable and can't be used in stable Rust. Hence, to use this feature we would have to

  • add the cargo feature in the github action (as it is currently done) using a shell script
  • specify the rayon crate as public using a shell script (very ugly)

To make this PR mergeable I hacked an "allow" for this lint into buffer_par, knowing that clippy won't complain about allowing an unknown lint. However, this solution is likely to fail in the future.
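For illustration, the workaround described above boils down to something like the following attribute on the module; the lint name is an assumption based on this comment, not taken from the actual commit:

// Sketch of the workaround: silence the private-dependency lint on the module
// whose public API mentions rayon types. `unknown_lints` is allowed first so
// that toolchains where the lint does not exist don't turn the unknown name
// into a new warning, which is why this "hack" passes CI today but may break.
#[allow(unknown_lints)]
#[allow(exported_private_dependencies)]
pub mod buffer_par {
    // ... iterator types whose signatures expose rayon ...
}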

@torfmaster (Contributor, Author):

I did some initial benchmarking: on my Intel i9 machine the bench (1024x768 RGB) takes 680 ms with the true Gaussian blur, whereas fast_blur takes 57 ms. I checked in both benchmarks.
Next I will try to find some bottlenecks in the implementation.
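For anyone wanting to reproduce such numbers, a criterion benchmark along these lines would do; the names and setup are illustrative, only the 1024x768 RGB image size is taken from the comment above, and the sigma value is arbitrary:

use criterion::{criterion_group, criterion_main, Criterion};
use image::{ImageBuffer, Rgb};

// Sketch of a criterion benchmark comparing the exact Gaussian blur with the
// new fast_blur on a 1024x768 RGB image. Not necessarily the PR's bench code.
fn blur_benches(c: &mut Criterion) {
    let img: ImageBuffer<Rgb<u8>, Vec<u8>> =
        ImageBuffer::from_fn(1024, 768, |x, y| Rgb([x as u8, y as u8, (x ^ y) as u8]));

    c.bench_function("gaussian blur", |b| {
        b.iter(|| image::imageops::blur(&img, 10.0))
    });
    c.bench_function("fast blur", |b| {
        b.iter(|| image::imageops::fast_blur(&img, 10.0))
    });
}

criterion_group!(benches, blur_benches);
criterion_main!(benches);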

@etemesi254 commented Aug 4, 2024

Hi, some ways to improve are noted in my comments on zune-imageprocs (I'm the author of that library):

https://github.com/etemesi254/zune-image/blob/734bdc7bfcd58860b0a92dde0a98055a1f94b422/crates/zune-imageprocs/src/box_blur.rs#L197-L220

I never did the loop unrolling to save ILP, but the others were implemented. They give really nice properties, like no bounds checks in the inner loop, and may be worth investigating (see https://godbolt.org/z/zexPEd976)

See https://fgiesen.wordpress.com/2012/07/30/fast-blurs-1/ and https://fgiesen.wordpress.com/2012/08/01/fast-blurs-2/ for the theories behind it

@torfmaster (Contributor, Author):

I invested some time into performance optimization:

  • simplify the leftmost value to r * f(0)
  • get rid of bounds checks by asserting the length of the samples (see the sketch below)
  • get rid of the conditionals checking for the lower and upper bounds

None of these measures produced a measurable improvement. I am leaving the code as it is for now (unless someone comes up with another idea, of course).
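For illustration, the "assert the length of the samples" idea from the list above is the usual trick of giving the optimizer one up-front check so it can often drop the per-element ones. A sketch, not the PR's actual inner loop:

// Running box sum along one row; normalization and edge handling are omitted
// for brevity. With the asserts, the compiler can usually prove `samples[..]`
// and `out[..]` accesses are in range and elide the per-iteration checks.
fn horizontal_box_sum(samples: &[f32], out: &mut [f32], radius: usize) {
    assert_eq!(samples.len(), out.len());
    assert!(samples.len() > 2 * radius);

    let mut acc: f32 = samples[..=2 * radius].iter().sum();
    out[radius] = acc;
    for i in radius + 1..samples.len() - radius {
        acc += samples[i + radius] - samples[i - radius - 1];
        out[i] = acc;
    }
}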

@torfmaster (Contributor, Author) commented Aug 4, 2024

Furthermore, I see no need for kani/quickcheck tests. First, the bounds are obviously correct. Second, there is no unsafe code, so securing the code with kani or quickcheck is unnecessary.
Regarding the other tests, I added tests for the edge cases of sigma. If you have more ideas for reasonable tests, I am happy to implement them.
I feel the PR is close to "ready for merge" (apart from squashing and reordering the commits, and your comments, of course).

@torfmaster (Contributor, Author):

@etemesi254:
I have now implemented the performance improvements I was able to understand (that is, I used transposing to improve the usage of the CPU caches). The algorithmic tweaks were already in place, though. In total I could speed up the blurring by 15 per cent. I don't feel able to tackle the more complicated improvements, as I lack deeper assembly-language knowledge.
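The cache trick mentioned here boils down to only ever blurring along rows and transposing in between, so the vertical pass also walks memory contiguously. A rough sketch, not the PR's code:

// Instead of a strided vertical pass, blur rows, transpose, blur rows again
// (which are the original columns), and transpose back. Every pass then reads
// memory contiguously.
fn transpose(src: &[f32], dst: &mut [f32], width: usize, height: usize) {
    assert_eq!(src.len(), width * height);
    assert_eq!(dst.len(), width * height);
    for y in 0..height {
        for x in 0..width {
            dst[x * height + y] = src[y * width + x];
        }
    }
}

fn blur_2d(buf: &mut [f32], width: usize, height: usize) {
    let mut scratch = vec![0.0f32; buf.len()];
    blur_rows(buf, width, height);          // horizontal pass, contiguous
    transpose(buf, &mut scratch, width, height);
    blur_rows(&mut scratch, height, width); // "vertical" pass, also contiguous
    transpose(&scratch, buf, height, width);
}

// Stand-in for the per-row box blur; the real kernel lives in the PR.
fn blur_rows(_buf: &mut [f32], _width: usize, _height: usize) {}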

@etemesi254 commented Aug 5, 2024

Nice.

FYI, these were the benchmark results for Gaussian blur on big images before this commit (image-rs, libvips, zune-image):

https://etemesi254.github.io/assets/criterion/imageprocs_%20gaussian%20blur/report/index.html

@torfmaster force-pushed the feature/fast-blur branch 2 times, most recently from fc50c64 to cdd617d, on August 5, 2024 at 20:29
@torfmaster (Contributor, Author):

My work here is done for now, and I would be happy to get a code review.

@torfmaster (Contributor, Author):

@ripytide sorry to bother you (but you already commented on the issue): do you know what the next steps would be to merge this PR (and get it released)?
Thank you!

@ripytide (Member):

You'll need sign-off from one of the maintainers, of which I am not one.

Also, I don't think this function belongs in this library; it should instead be in the imageproc library.

@torfmaster (Contributor, Author):

Thank you for your feedback. I created a PR in imageproc as well (image-rs/imageproc#683).

@HeroicKatora (Member) left a comment

Code review time :)

Review threads (since resolved) on:
  • src/imageops/fast_blur.rs
  • examples/fast_blur/lenna.png
  • src/imageops/mod.rs
@torfmaster force-pushed the feature/fast-blur branch 3 times, most recently from 25902b9 to 4737c29, on September 4, 2024 at 12:35
@torfmaster (Contributor, Author):

I just wanted to give a short status report (I am also cross-posting this comment to image-rs/imageproc#683):

  • I added a functional test showing (on an example image) that the effect of fast_blur is close to the Gaussian blur implementation in this library
  • I added a benchmark that shows that fast blur is much faster than the current blur implementation
  • I think I addressed all code review comments

What would be the next steps?

@torfmaster (Contributor, Author):

@HeroicKatora sorry for pinging you directly, but you already started a review. Is there anything I can do to support the review of this PR? Thank you a lot!

Review thread (since resolved) on src/buffer_par.rs
@Shnatsel (Contributor) commented Sep 12, 2024

I've written the following example code for blur - feel free to include it in the PR under examples/:

use std::env;
use std::error::Error;
use std::path::Path;
use std::process;

fn main() -> Result<(), Box<dyn Error>> {
    // Collect command-line arguments
    let args: Vec<String> = env::args().collect();

    // Ensure that we have 2 CLI arguments
    if args.len() != 3 {
        eprintln!("Usage: {} <path> <radius>", args[0]);
        process::exit(1);
    }

    let radius: f32 = args[2].parse()?;

    // Load the input image
    let path_str = &args[1];
    let path = Path::new(path_str);
    let input = image::open(path)?;

    // Correct but slow Gaussian blur
    let mut out_path_gauss = path.to_owned();
    out_path_gauss.set_extension("gaussian.png");

    let correct_but_slow = input.blur(radius);
    correct_but_slow.save(out_path_gauss)?;

    // Approximate but fast blur
    let mut out_path_fast = path.to_owned();
    out_path_fast.set_extension("fast.png");

    let approximate_but_fast = input.fast_blur(radius);
    approximate_but_fast.save(out_path_fast)?;

    Ok(())
}

With this test image, both blur implementations show artifacts:

[image: blur-test-input]

It's just a white rectangle in the middle, surrounded by fully transparent pixels.

Both Gaussian blur in image and the fast blur implementation bleed color from fully transparent pixels into the white rectangle, resulting in the following image:

[image: blur-test-input, fast blur output]

The fully transparent pixels have R set to 255 while all the other channels are set to 0, and that red color bleeds into the white rectangle.

The expected result, as produced by GIMP, is a blurred white rectangle without any red color:

[image: blurred-gimp-20px]

To avoid such artifacts, the pixel's contribution to the color change should be weighted by its alpha channel value. Fully transparent pixels should get multiplied by 0 and not contribute to the changes in non-alpha channels at all.

But since both blur implementations are incorrect, I'll leave deciding how to handle this up to the maintainers.
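A common way to implement the weighting described above is to premultiply the color channels by alpha before blurring and to divide the alpha back out afterwards. A minimal per-pixel sketch (illustration only; neither implementation currently does this):

// Color channels are scaled by alpha before blurring, so fully transparent
// pixels contribute nothing; the alpha is divided back out afterwards.
fn premultiply(rgba: [f32; 4]) -> [f32; 4] {
    let [r, g, b, a] = rgba;
    [r * a, g * a, b * a, a]
}

fn unpremultiply(rgba: [f32; 4]) -> [f32; 4] {
    let [r, g, b, a] = rgba;
    if a == 0.0 {
        [0.0, 0.0, 0.0, 0.0]
    } else {
        [r / a, g / a, b / a, a]
    }
}

fn main() {
    // A fully transparent "red" pixel contributes nothing after premultiplying.
    assert_eq!(premultiply([1.0, 0.0, 0.0, 0.0]), [0.0, 0.0, 0.0, 0.0]);
    // Blur the premultiplied values, then unpremultiply the blurred result.
}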

@HeroicKatora (Member) left a comment

I think it's fine to keep the incorrect behavior for now, given that both implementations share it. I think we should bunch it with the known color-space issues; after all, it's not clear whether to treat the channels as pre-multiplied or not. Given that ambiguity, this just ignores the alpha channel (edit: or assumes pre-multiplied alpha, in contrast to the other parts of the library). I think it'd be best to introduce separate alpha-aware versions for both the pre-multiplied and non-multiplied channel interpretations at a later point, and to keep things consistent for now.

@HeroicKatora merged commit d4a0d26 into image-rs:main on Oct 11, 2024. 30 of 32 checks passed.