Fixes to winding number accumulation #391

raphlinus · 2023-10-24T16:33:54Z

This patch fixes rendering artifacts related to winding number accumulation. The first problem is that 4 bits per pixel is not adequate for all test scenes (the cardioid one in particular exercises this). The second problem is that we didn't have a case for even-odd.

This patch is expected to regress performance for the nonzero fill rule, but improve it for even-odd. Some of the performance loss is decreased occupancy because of a larger shared memory footprint for the accumulation, some is because of increased atomic traffic. We should do some performance work as a followup, first characterizing the performance impact, then doing some mitigation. One promising idea is to specialize further based on the number of segments in the slice, using 4 bit accumulation when we can guarantee that won't overflow.

This will have a (small) merge conflict with #382. If that's landed first, I'll happily update this one.

Fixes #380 and #390.

Changes the winding number accumulation in msaa8 mode to 8 bits per sample. This is prototype code and currently breaks the msaa16 mode; it is intended to diagnose whether the artifacts are strictly due to overflow, and to point the way to a real implementation. Prefix sums in both x and y direction are a little cleaner, avoiding a race (not UB because it's atomics).

This patch makes the msaa16 mode work again, using 8 bit accumulation of winding numbers. It could be merged to fix the artifacts in the cardioid example. Also, it's worth doing some evaluation to see how much performance slowdown there is. As future work, we probably want to be adaptive and use 8 bit accumulation when needed. If the performance hit from reduced occupancy due to the increased shared memory usage is significant, then we could consider other mitigations, including downgrading to msaa8 when overflow is possible.

Create a specialized version of the fill function for the even-odd fill rule. The logic is simpler (and faster) because winding number accumulation can happen in one bit. There's a bunch of code duplication which can be cleaned up. It's expected this will have a merge conflict with #382. If that's merged first, I'll happily fix this one.

armansito

Thank you for the fixes. I think the new approach makes sense but as we discussed today, a lot of the logic is very dense/involved and could use some clarification.

LGTM otherwise.

armansito · 2023-10-26T17:10:56Z

shader/fine.wgsl

    let even_odd = (fill.size_and_rule & 1u) != 0u;
+    if even_odd {


Please document your reasoning around divergence here: that PTCL commands are expected to get processed uniformly.

armansito · 2023-10-26T17:24:31Z

shader/fine.wgsl

-            atomicStore(&sh_winding_y[th_ix], 0x88888888u);
+    if th_ix < 64u {
+        if th_ix < 4u {
+            atomicStore(&sh_winding_y[th_ix], 0x80808080u);


This makes sense to me but maybe briefly document the SWAR packing here for posterity, i.e. this stores 4 8-bit winding numbers.

IWBN to document the counting scheme and representation here (based on our conversation today, this is using a biased offset binary representation but this wasn't obvious to me, as I initially assumed 2's complement and couldn't make sense of the initial values).

armansito · 2023-10-26T22:53:14Z

shader/fine.wgsl

-        let expected_zero = (((packed_w >> (minor * 4u)) + wind_y) & 0xfu) - u32(fill.backdrop);
-        if expected_zero >= 16u {
+        let minor = i; // assumes PIXELS_PER_THREAD == 4
+        let expected_zero = (((packed_w >> (minor * 8u)) + wind_y) & 0xffu) - u32(fill.backdrop);


Please document and explain the logic behind expected_zero.

armansito · 2023-10-26T23:30:39Z

shader/fine.wgsl

-// This is 8 winding numbers packed to a u32, 4 bits per sample
-var<workgroup> sh_winding: array<atomic<u32>, 32u>;
+// This is 4 winding numbers packed to a u32, 8 bits per sample
+var<workgroup> sh_winding: array<atomic<u32>, 64u>;


As we discussed today, please document not only the packing pattern but what each shared buffer semantically represents in the hierarchy of counts.

armansito · 2023-10-26T23:32:35Z

shader/fine.wgsl

 #endif
+#ifdef msaa16


Please try to document your invariants around the SWAR math as best as you can. Perhaps even add a note that refers the reader to the even-odd version as it's easier to understand.

armansito · 2023-10-26T23:36:49Z

src/lib.rs

@@ -73,7 +73,7 @@ enum AaConfig {

 /// Configuration of antialiasing. Currently this is static, but could be switched to
 /// a launch option or even finer-grained.
-const ANTIALIASING: AaConfig = AaConfig::Area;
+const ANTIALIASING: AaConfig = AaConfig::Msaa16;


Do we want to make Msaa16 the default yet? Perhaps this should remain as set to area for now as we may not want to get hit by the performance cost of increased workgroup memory in Msaa16 mode.

That said, I think I would like to make this configurable at runtime. I think a reasonable approach is to introduce two levels of settings: a RendererOptions setting that tells the library which pipeline variants to compile/enable and a RenderParams setting that selects a particular AA mode (which can eventually have a "dynamic" option if we decide to support that). That way we can run the test suite in various modes without having to re-compile the code.

If you agree with that abstraction, I'd be happy to wire that up in a follow-up.

I think it's good making Msaa16 the default at least temporarily, if for no other reason than to exercise it. I also agree that it should be configurable, but the mechanism to do that is nontrivial (thanks for agreeing to take it on!) and seems like it can be done separately from this PR.

I did my best to document some of the strange bit magic used in the algorithm. I also did just a bit of renaming to make things simpler, and for the mask expansion replaced `|` with `^` because it's easier to understand in terms of carry-less multiplication (and I expect performance to be identical).

armansito

Thank you very much for the documentation!

armansito · 2023-10-27T22:46:37Z

shader/fine.wgsl

+var<workgroup> sh_winding: array<atomic<u32>, 64u>;
+// This array contains winding numbers of multiple sample points within
+// a pixel, relative to the winding number of the top left corner of the
+// pixels. The encoding and packing is the same as `sh_winding_y`.


pixels. The encoding...

of the pixel (singular) or is this relative to a group of pixels?

I went back and forth between singular and plural. Fixed.

raphlinus added 6 commits October 17, 2023 15:19

Prepare for merge

39fbbb2

Merge branch 'main' into swar8

b67759e

Merge branch 'main' into swar8

f4119ea

armansito reviewed Oct 26, 2023

View reviewed changes

raphlinus added 2 commits October 27, 2023 14:11

Merge branch 'main' into swar8

303764a

armansito approved these changes Oct 27, 2023

View reviewed changes

Typo

7a80c89

raphlinus merged commit 3fab38a into main Oct 30, 2023
4 checks passed

raphlinus deleted the swar8 branch October 30, 2023 16:13

raphlinus mentioned this pull request Aug 22, 2024

Sparse strip path rendering #670

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes to winding number accumulation #391

Fixes to winding number accumulation #391

raphlinus commented Oct 24, 2023

armansito left a comment

armansito Oct 26, 2023

armansito Oct 26, 2023 •

edited

Loading

armansito Oct 26, 2023

armansito Oct 26, 2023

armansito Oct 26, 2023

armansito Oct 26, 2023

raphlinus Oct 27, 2023

armansito left a comment

armansito Oct 27, 2023

raphlinus Oct 27, 2023

		let even_odd = (fill.size_and_rule & 1u) != 0u;
		if even_odd {

Fixes to winding number accumulation #391

Fixes to winding number accumulation #391

Conversation

raphlinus commented Oct 24, 2023

armansito left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

armansito Oct 26, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

armansito left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

armansito Oct 26, 2023 •

edited

Loading