Atomic coherency test #11
base: main
Conversation
This overwrites the compute-shader-hello example to be a test of atomic coherency. My understanding is that with the barriers it should run with 0 failures, even in strategy 0. With strategy 1 (atomicOr) as a workaround, it seems to be working.
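For readers coming to this cold, here is a minimal sketch of what the two load strategies amount to for the data word. This is current WGSL syntax, and the buffer name and layout are illustrative rather than the exact code in the example:

```wgsl
// Illustrative only; DataBuf / data_buf are assumed names, not the example's.
struct DataBuf {
    data: array<atomic<u32>>,
}

@group(0) @binding(0) var<storage, read_write> data_buf: DataBuf;

// Strategy 0: a plain (relaxed) atomic load of the data word.
fn load_strategy0(ix: u32) -> u32 {
    return atomicLoad(&data_buf.data[ix]);
}

// Strategy 1: atomicOr with 0, a read-modify-write that returns the old
// value. It should be semantically equivalent to atomicLoad, but in
// practice it works around the failures observed here.
fn load_strategy1(ix: u32) -> u32 {
    return atomicOr(&data_buf.data[ix], 0u);
}
```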
Force-pushed from a70dd7f to 0594d07.
A bit of explanation: this is an attempt to make a clean, isolated test for some atomic coherence failures I've been observing. It's written in WGSL and uses WebGPU, but on the system I've been exploring (AMD 5700XT), I am seeing similar failures in a Vulkan-only context. One advantage to having the test case be in WGSL is that there is one concept of what the WGSL memory model should be, even if it isn't written down yet. By contrast, in Vulkan-land, the ideas are spread across Vulkan, SPIR-V, and (to a lesser extent) GLSL. Further, there is considerable diversity in how much the memory model is "turned on," in particular whether SPV 1.5 turns on the VulkanMemoryModel capability and whether the corresponding feature is enabled in device creation; the sense I get is that this is fairly rare, and it's far more common to be running in some kind of hybrid compatibility mode.

In any case, it's my feeling that the test in this PR should always pass. It fails on the 5700XT. I find that there are two workarounds, which I think shed some light on the issue.

The first is to use ordinary loads and stores rather than atomics, and to enable the coherent qualifier on the buffer. This is basically what the current piet-gpu code does, as those shaders are written in GLSL with an eye towards compatibility, i.e. not requiring an explicit Vulkan memory model feature. However, this is not consistent with the current approach for translating WGSL. It's possible that the recommended WGSL translation should change to be more like this, for better compatibility. One way of thinking about this is that there's no way (that I know of) to get the current WGSL translation output from GLSL, so making it resemble existing GLSL translation output might reduce the number of exciting and fun bugs (such as this one).

The second is to use atomicOr(, 0) instead of atomicLoad for the load of the data word. By my reading, the semantics of the two should be identical (with respect to both the yet-to-be-written WGSL memory model and the actual Vulkan one), but the former conveys more of an intent of "actually go to memory rather than cache". I can see doing that to unblock my work in getting a webgpu prefix sum implementation out there.

I've done a number of experiments to validate that this is downstream of wgpu's shader translation (naga). One is to use tint to generate spv. Another is to manually add OpMemberDecorate Coherent to the data buffer (which I now understand is not expected to make any difference for purely atomic operations; it affects the handling of ordinary loads and stores). Finally, I've done an experiment of fully enabling the Vulkan memory model (on Vk 1.2 device creation), and adding MakeAvailable+Release / MakeVisible+Acquire memory semantics flags to the relevant atomics (this would not be viable from WGSL sources, as only relaxed semantics are currently spec'ed). None of these make a difference.

I'd like to get some form of this into the webgpu test suite. That should hopefully have the effect of shaking out bugs in implementations.
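On the first workaround, note that WGSL itself has no coherent qualifier, so at the WGSL source level that variant just looks like plain non-atomic storage accesses; the Coherent decoration would come from the GLSL source or be added during translation to SPIR-V. A minimal sketch, with assumed names:

```wgsl
// Illustrative only; in the GLSL/SPIR-V form of this workaround the buffer
// itself carries the Coherent decoration, which has no WGSL surface syntax.
struct PlainBuf {
    data: array<u32>,
}

@group(0) @binding(0) var<storage, read_write> plain_buf: PlainBuf;

// Ordinary (non-atomic) store and load of the data word.
fn store_data(ix: u32, v: u32) {
    plain_buf.data[ix] = v;
}

fn load_data(ix: u32) -> u32 {
    return plain_buf.data[ix];
}
```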
Another fascinating datapoint: if I fully enable the Vulkan memory model and change the load of the data word to […]

Also note, I have to hand-edit the spv for this to pass validation. If I simply add these flags and use glslangValidator to compile, it produces output that's missing […]
This is probably the version of the test I want to promote.
Force-pushed from 8602338 to fc7dc5d.
The test is both simpler and more reliable in catching problems. There's no looping in the shader, so it's basically the classic "message passing" litmus test, run in parallel and with a simple permutation (multiplication by primes mod 64k) to mix up memory accesses. It also runs 1000 dispatches. This reliably catches thousands of failures on my 5700XT.
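To spell out the permutation trick (the primes below are placeholders, not necessarily the ones the test uses): multiplying an index by an odd prime is a bijection mod 64k, so every slot is still written and read exactly once, but neighboring invocations end up touching widely separated slots.

```wgsl
// Hypothetical permutation helpers; the specific primes are illustrative.
// Any odd multiplier is invertible mod 65536 (a power of two), so each of
// these maps [0, 64k) onto itself without collisions.
fn permute_flag_ix(i: u32) -> u32 {
    return (i * 419u) & 0xffffu;
}

fn permute_data_ix(i: u32) -> u32 {
    return (i * 4099u) & 0xffffu;
}
```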
Slightly cleaner setup of shader source.
The previous version of the test had a race between a nonatomic store and a nonatomic load, because the load was not conditional on the atomic flag. Adding an "if" guard makes the weak behavior observed on the AMD 5700 XT go away. If those memory accesses are changed to atomics, then it's no longer a race, so the observed reorderings seem to be actual failures of the GPU relative to the Vulkan memory model. A typical run has 64M measurements (64k per dispatch, 1000 iterations), of which about 5000 observe flag = 1 but data = 0.
I just pushed a new version of the test which uses atomics for both data and flag. The previous version was broken, as there was a race between the nonatomic load and store of the data field. I believe the new version unambiguously demonstrates failures with respect to the Vulkan memory model. Since all memory access is atomic, it doesn't implicate the availability/visibility issues discussed in gpuweb/gpuweb#2228, and I also believe both the tint and naga translations of this test into SPIR-V are correct.
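For concreteness, here is a rough sketch of the shape of such an all-atomic message-passing test, in current WGSL syntax. The buffer layout, bindings, primes, barrier placement, and the pairing of writer and reader invocations are all assumptions for illustration, not the actual shader in this PR.

```wgsl
// Illustrative sketch, not the shader from this PR.
struct Buf {
    vals: array<atomic<u32>>,
}

@group(0) @binding(0) var<storage, read_write> data_buf: Buf;
@group(0) @binding(1) var<storage, read_write> flag_buf: Buf;
@group(0) @binding(2) var<storage, read_write> fail_count: Buf;

// 64k slots per dispatch (e.g. 256 workgroups of 256 invocations).
const N: u32 = 65536u;

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let ix = gid.x;

    // Writer side: publish the data word, then raise the flag, both as
    // (relaxed) atomic stores, at permuted slots.
    atomicStore(&data_buf.vals[(ix * 4099u) % N], 1u);
    storageBarrier(); // stands in for "the barriers" in the PR description
    atomicStore(&flag_buf.vals[(ix * 419u) % N], 1u);

    // Reader side: observe another invocation's flag, then its data.
    // With atomics throughout, the unguarded data read is not a data race.
    let other = (N - 1u) - ix; // hypothetical writer/reader pairing
    let flag = atomicLoad(&flag_buf.vals[(other * 419u) % N]);
    storageBarrier();
    let data = atomicLoad(&data_buf.vals[(other * 4099u) % N]);

    // Seeing flag = 1 but data = 0 is the weak outcome being counted; per
    // the PR description, with the barriers this should never be observed.
    if (flag == 1u && data == 0u) {
        atomicAdd(&fail_count.vals[0], 1u);
    }
}
```

A harness along these lines would run 1000 dispatches and read back the failure count.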