
Atomic coherency test #11

Open · raphlinus wants to merge 5 commits into main
Conversation

raphlinus (Contributor)

This overwrites the compute-shader-hello example to be a test of atomic
coherency. My understanding is that with the barriers it should run with
0 failures, even in strategy 0. With strategy 1 (atomicOr) as a
workaround, it seems to be working.

raphlinus (Contributor, Author) commented Oct 29, 2021

A bit of explanation: this is an attempt to make a clean, isolated test for some atomic coherence failures I've been observing. It's written in WGSL and uses WebGPU, but on the system I've been exploring (AMD 5700XT), I am seeing similar failures in a Vulkan-only context.

One advantage to having the test case be in WGSL is that there is one concept of what the WGSL memory model should be, even if it isn't written down yet. By contrast, in Vulkan-land, the ideas are spread across Vulkan, SPIR-V, and (to a lesser extent) GLSL. Further, there is considerable diversity in how much the memory model is "turned on," in particular whether SPV 1.5 turns on the VulkanMemoryModel capability and whether the corresponding feature is enabled in device creation; the sense I get is that this is fairly rare, and it's far more common to be running in some kind of hybrid compatibility mode.

In any case, it's my feeling that the test in this PR should always pass. It fails on 5700XT. I find that there are two workarounds, which I think shed some light on the issue.

The first is to use ordinary loads and stores rather than atomics, and to enable the coherent qualifier on the buffer. This is basically what the current piet-gpu code does, as those shaders are written in GLSL with an eye towards compatibility, i.e. not requiring an explicit Vulkan memory model feature. However, this is not consistent with the current approach for translating WGSL. It's possible that the recommended WGSL translation should change to be more like this, for better compatibility. One way of thinking about this is that there's no way (that I know of) to get the current WGSL translation output from GLSL, so making the WGSL output resemble existing GLSL translation output might reduce the number of exciting and fun bugs (such as this one).

The second is to use atomicOr(, 0) instead of atomicLoad for the load of the data word. By my reading, the semantics of the two should be identical (with respect to both the yet-to-be-written WGSL memory model and the actual Vulkan memory model), but atomicOr conveys more of an intent of "actually go to memory rather than cache". I can see doing that to unblock my work in getting a WebGPU prefix sum implementation out there.
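
For concreteness, here's a minimal WGSL sketch of the two read strategies. The binding layout, buffer names, and workgroup size are placeholders of mine, not the actual code in this PR:

```wgsl
@group(0) @binding(0)
var<storage, read_write> data_buf: array<atomic<u32>>;

@group(0) @binding(1)
var<storage, read_write> out_buf: array<u32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    // Strategy 0: a plain (relaxed) atomic load of the data word.
    let d0 = atomicLoad(&data_buf[gid.x]);

    // Strategy 1: atomicOr with 0, a read-modify-write that leaves the value
    // unchanged. On paper this should be equivalent to the load above, but in
    // practice it seems to force the read all the way to memory.
    let d1 = atomicOr(&data_buf[gid.x], 0u);

    out_buf[gid.x] = d0 + d1;
}
```

Both forms return the current value of the atomic; the only difference on paper is that atomicOr is a read-modify-write rather than a pure load.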

I've done a number of experiments to validate that this is downstream of wgpu's shader translation (naga). One is to use tint to generate the SPIR-V. Another is to manually add OpMemberDecorate Coherent to the data buffer (which I now understand is not expected to make any difference for purely atomic operations; it affects the handling of ordinary loads and stores). Finally, I've done an experiment of fully enabling the Vulkan memory model (at Vulkan 1.2 device creation) and adding MakeAvailable+Release / MakeVisible+Acquire memory semantics flags to the relevant atomics (this would not be viable from WGSL sources, as only relaxed semantics are currently spec'ed). None of these makes a difference.

I'd like to get some form of this into the WebGPU test suite. That should hopefully have the effect of shaking out bugs in implementations.

raphlinus (Contributor, Author) commented Oct 29, 2021

Another fascinating datapoint: if I fully enable the Vulkan memory model and change the load of the data word to gl_SemanticsAcquire | gl_SemanticsMakeVisible, then it passes. I believe that for correctness acquire/release is only needed on the flag load/store, and the data can be relaxed, but intuitively I can see how these would signal "actually go to memory instead of cache".

Also note, I have to hand-edit the SPIR-V for this to pass validation. If I simply add these flags and use glslangValidator to compile, it produces output that's missing OpCapability VulkanMemoryModelDeviceScopeKHR. If I add that by hand, it passes validation.

This is probably the version of the test I want to promote.
The test is both simpler and more reliable in catching problems. There's
no looping in the shader, so it's basically just the classic "message
passing" litmus test, just run in parallel and with a simple permutation
(multiplication by primes mod 64k) to mix up memory accesses.

It also runs 1000 dispatches. This reliably catches thousands of
failures on my 5700XT.
Slightly cleaner setup of shader source.
The previous version of the test had a race between a nonatomic store
and a nonatomic load, because the load was not conditional on the atomic
flag. Adding an "if" guard makes the weak behavior observed on AMD
5700 XT go away. If those memory accesses are changed to atomics, then
it's no longer a race, so the observed reorderings seem to be actual
failures of the GPU relative to the Vulkan memory model.

A typical run has 64M measurements (64k per dispatch, 1000 iterations),
of which about 5000 have a flag = 1 but data = 0.
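
To make the shape of that test concrete, here is an illustrative WGSL sketch of the message-passing litmus test, with storageBarrier() standing in for "the barriers" mentioned in the PR description. The buffer names, workgroup size, and prime multipliers are placeholders, not copied from the PR:

```wgsl
// Illustrative sketch only. The host zeroes data_buf and flag_buf between
// dispatches, repeats the dispatch many times, and counts entries where the
// flag was seen but the data word was still 0.

@group(0) @binding(0)
var<storage, read_write> data_buf: array<atomic<u32>, 65536>;

@group(0) @binding(1)
var<storage, read_write> flag_buf: array<atomic<u32>, 65536>;

@group(0) @binding(2)
var<storage, read_write> output: array<u32, 65536>;

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let ix = gid.x;
    // Permute the write and read locations by multiplying by odd primes
    // mod 64k, so the two halves of each message-passing pair land in
    // distant, unrelated slots.
    let wr_ix = (ix * 4099u) & 0xffffu;
    let rd_ix = (ix * 8191u) & 0xffffu;

    // Writer half: publish the data word, then raise the flag.
    atomicStore(&data_buf[wr_ix], 1u);
    storageBarrier();
    atomicStore(&flag_buf[wr_ix], 1u);

    // Reader half: read the flag, then the data word. Strategy 0 is the
    // atomicLoad shown here; strategy 1 substitutes atomicOr(..., 0u).
    let flag = atomicLoad(&flag_buf[rd_ix]);
    storageBarrier();
    let data = atomicLoad(&data_buf[rd_ix]);

    // One measurement per invocation: 3 = flag and data both seen (ok),
    // 2 = flag seen but data still 0 (the failure being counted),
    // 0 or 1 = flag not yet set, so no measurement for this slot.
    output[ix] = flag * 2u + data;
}
```

Multiplying by an odd constant is a bijection mod 2^16, so every slot gets exactly one writer and one reader, and the reader of a given slot generally sits in a different workgroup than its writer.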
raphlinus (Contributor, Author)

I just pushed a new version of the test which uses atomics for both data and flag. The previous version was broken, as there was a race between nonatomic load and store of the data field.

I believe the new version unambiguously demonstrates failures with respect to the Vulkan memory model. Since all memory access is atomic, it doesn't implicate the availability/visibility issues as discussed in gpuweb/gpuweb#2228, and also I believe both the tint and naga translations of this test into SPIR-V are correct.
