
Blocking fetching AMOs are unaffected by fence/quiet #518

Closed

Conversation

davidozog
Collaborator

Summary of changes

This change clarifies that blocking fetching AMOs are unaffected by fence/quiet, while blocking non-fetching AMOs are affected by fence/quiet. Previously, Table 14 in OpenSHMEM v1.5 left this ambiguous.
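For concreteness, here is a minimal sketch of my reading of the distinction (the variable names are illustrative, not from the proposal text):

#include <shmem.h>

static int counter = 0;   /* symmetric */

void example(int pe) {
  shmem_atomic_inc(&counter, pe);           /* blocking non-fetching AMO: its remote
                                               completion is ordered/ensured by a
                                               later fence or quiet */
  shmem_quiet();                            /* completes the increment above */

  int v = shmem_atomic_fetch(&counter, pe); /* blocking fetching AMO: already complete
                                               when it returns, so fence/quiet have no
                                               additional effect on it */
  (void)v;
}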

@davidozog davidozog added this to the OpenSHMEM 1.6 milestone Jul 11, 2024
@davidozog davidozog self-assigned this Jul 11, 2024
@jdinan
Collaborator

jdinan commented Aug 22, 2024

There is actually an unresolved memory model issue with this proposal. Specifically, given a litmus test like this:

#include <shmem.h>
#include <assert.h>

void foo(int pe) {
  static int x = 0;  /* symmetric variable targeted by the AMOs */
  int y = shmem_atomic_fetch_inc(&x, pe);
  int z = shmem_atomic_fetch_inc(&x, pe);
  assert(z > y);     /* must the second fetch observe the first increment? */
}

and assuming no overflow, do we require the assertion to be true? Relaxed order atomics (e.g. std::memory_order_relaxed or GPU native atomics) don't guarantee this. So we may want to (or maybe we already did) define fence/quiet as enforcing this ordering for blocking atomics. At the moment, the memory order semantics of OpenSHMEM AMOs remains unspecified.

@davidozog
Collaborator Author

... do we require the assertion to be true? Relaxed order atomics (e.g. std::memory_order_relaxed or GPU native atomics) don't guarantee this. So we may want to (or maybe we already did) define fence/quiet as enforcing this ordering for blocking atomics. At the moment, the memory order semantics of OpenSHMEM AMOs remains unspecified.

Yeah, great question. I might agree that we do not require the assertion to be true in this example (at least if relaxed atomics are allowed), but this proposal says that a fence here does not fix it regardless. So if we do want a fence to force the ordering, we should reconsider this proposal and/or fix up the memory model first.
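To make that concrete, here is my sketch of the fenced variant (not from the proposal text); under the proposed table entry, the fence does not order the two blocking fetching AMOs, so it would not by itself make the assertion safe:

#include <shmem.h>
#include <assert.h>

void foo_fenced(int pe) {
  static int x = 0;
  int y = shmem_atomic_fetch_inc(&x, pe);
  shmem_fence();   /* per this proposal: no effect on blocking fetching AMOs */
  int z = shmem_atomic_fetch_inc(&x, pe);
  assert(z > y);   /* still not guaranteed if the AMOs are relaxed */
}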

I think a related source of confusion is this sentence in Section 9.7:

The [fetching AMO] routines return after the data has been fetched from the target PE and delivered to the calling PE

so a naive user might think the second call isn't even initiated until y is set; this feels more like "acquire-release" than "relaxed" semantics to me...

Also, just to clarify/simplify this example, can we assume that there are no other atomics happening on pe at &x during foo? Is there a synchronization with the target pe before foo is called?

Pinging @wrrobin and @stewartl318 who might be interested and will have to cover for me at the ballot (I'll be chairing the HOTI tutorials ;).

@jdinan
Collaborator

jdinan commented Aug 22, 2024

Given that this is a minor change that treads into the space of clarifying unspecified behavior, I'm tempted to ask that we table it for OpenSHMEM 1.6 so that we can try to address the issue in the memory model holistically. I suspect that we'll eventually want to define blocking fetching AMOs as ordered (this will be painful on GPUs) and introduce new AMOs that allow users to choose the ordering model (i.e. can choose relaxed for performance). What do you think?
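Purely as a strawman, something like the following is what I have in mind; none of these names or types exist in OpenSHMEM today, and they are invented only for illustration:

/* Hypothetical sketch only; not part of any OpenSHMEM specification. */
typedef enum {
  SHMEM_AMO_ORDER_RELAXED,  /* no ordering among AMOs issued by this PE */
  SHMEM_AMO_ORDER_ORDERED   /* program order preserved, as in the litmus test */
} shmem_amo_order_t;

int shmem_atomic_fetch_inc_explicit(int *dest, int pe, shmem_amo_order_t order);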

@wrrobin
Collaborator

wrrobin commented Aug 22, 2024

@jdinan - AMOs with user options for choosing ordering and scope are definitely a good idea. But even for those, the default option needs to be clarified. Are you suggesting using fence/quiet for both blocking and non-blocking fetching AMOs to ensure ordering in the default case?

@stewartl318
Collaborator

I am firmly on the side of having the default behavior be relaxed.

First of all, Jim's example could be implemented with a single fetch_and_add(2), and then the caller can decide the ordering.
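Something like this (my reading of the suggestion; one round trip, and the caller assigns the two values locally):

#include <shmem.h>

void foo_combined(int pe) {
  static int x = 0;
  int y = shmem_atomic_fetch_add(&x, 2, pe);  /* one AMO instead of two */
  int z = y + 1;                              /* caller decides which value is "first" */
  /* z > y holds trivially, with no reliance on AMO ordering */
  (void)z;
}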

Second, my (dim) understanding of the memory models is that the atomics with ordering (seq_cst, release, acquire, acquire-release) have to do with making sure that data buffers are written before an associated flag (release) and that a flag is checked before a buffer is read (acquire). They only really apply to the sequencing of operations from a particular source. We already have a flag API (put with signal) and that one should be "release". We also have shmem_fence to enforce order of delivery.
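As a sketch of that pattern (the buffer size, flag, and PE choice are illustrative):

#include <shmem.h>
#include <stdint.h>

static long buf[8];        /* symmetric data buffer */
static uint64_t flag = 0;  /* symmetric signal word */

void producer(const long *src, int target_pe) {
  /* The data is deposited before the signal becomes visible ("release"-like). */
  shmem_put_signal(buf, src, 8, &flag, 1, SHMEM_SIGNAL_SET, target_pe);
}

void consumer(void) {
  shmem_signal_wait_until(&flag, SHMEM_CMP_EQ, 1);  /* "acquire"-like wait */
  /* buf is now safe to read */
}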

My suspicion is that most individual non-fetching AMOs are there to support lock-free modifications of remote buffers, and the lock freedom is against other PEs, hardly ever against yourself. Relaxed is the most performant choice.

For fetching atomics, it seems likely to me that again the use case is almost always contention with other PEs, not with yourself, so again the ordering doesn't matter.

When ordering of atomics from the same PE matters, it will usually involve different addresses, where you want the target to see the updates in a particular order. shmem_fence does that.
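For instance (a sketch under that assumption, with illustrative symmetric variables):

#include <shmem.h>

static int first = 0, second = 0;  /* symmetric, two different addresses */

void ordered_updates(int pe) {
  shmem_atomic_set(&first, 1, pe);
  shmem_fence();                     /* deliver the update to 'first' ... */
  shmem_atomic_set(&second, 1, pe);  /* ... before the update to 'second' */
  /* A target that observes second == 1 can rely on seeing first == 1. */
}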

I am perfectly comfortable with the idea that shmem doesn't enforce ordering of fetching atomics and I certainly wouldn't want to slow down the common cases in support of such a corner case.

@wrrobin
Collaborator

wrrobin commented Aug 23, 2024

Thanks @stewartl318 for your thoughts. This sounds good to me as well. But how do we clarify it in the spec? Should we then update the AMO Section 9.7 text as Dave pointed out above?
Also, from the user's point of view, how would they expect different behavior from blocking and non-blocking versions of fetching AMOs? It looks like we are in favor of replacing the existing routines with ones that give users options to specify the behavior. Let me know if I am missing anything in this regard.

@jdinan
Collaborator

jdinan commented Aug 23, 2024

@stewartl318 I think that's a very reasonable proposal. It may break backwards compatibility, and we'll need to wrestle with that. Regarding this proposal, my understanding is that it's being put forward as a simple bugfix to a table that's believed to be inconsistent. IIUC, the goal is not to clarify the memory model. However, any change we make here will have memory model implications/complications. So my suggestion is that we take some additional time to craft a proper clarification to the memory model and delay this change to OpenSHMEM 1.7.
