RMA WG 03 01 2018
Nick Park edited this page Mar 1, 2018 · 7 revisions
- Collect opens, assign note taker
- Bob C. (Cray): Cray put with signal #206
- Nick (DoD): Sketch of Wait-and-then-reset / SHMEM-IFTTT #208
- Dave O. (Intel): SHMEM Wait/test-all/some/any #207
- Cray: Bob C., Krishna ?., Pat (?)
- Nick P. (DoD)
- Megan G. (ARM)
- John L.
- Swaroop
- ORNL (who?)
- Get feedback/input from users on `shmem_put_signal` w.r.t. the type of the signal word (e.g., `size_t`, `uint64_t`, something else?) and the type of the signal operation (e.g., atomic write, atomic add).
- Put with signal (presentation and discussion)
- Put-with-signal can be a lighter-weight operation than put+fence+put idiom, since `shmem_fence()` would fence all preceding operations (on the context).
- In Cray's implementation, the signal word is always a `uint64_t`.
- Jim: Does the user decide the location of the signal word? Bob: The signal is specified just by the address.
- Pasha: What happens if different initiators overwrite the same flag? Bob: This is a race condition; whoever wins will update the word first. Pasha: Can it be partially updated? Bob: No, the update is atomic.
- Bob: DMAPP only currently supports atomic-store, not atomic-increment for the update word.
- Bob: Supported in hardware under FMA path; under BTE, path is put+quiet+AMO. For large transfers, one loses most of the non-blocking behavior.
- Jim: Put-with-signal does not affect ordering of any other operations.
- Bob: One might have to use an asynchronous progress thread to make the put-with-signal implementation nonblocking without hardware support (e.g., under put+fence+AMO implementation).
- Jim: Next steps?
- Jim: Users get most benefit when hardware support is available for these additional operations. (discussion of overheads)
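
The semantics discussed above (payload delivered first, then an atomic update of the signal word) can be sketched as a single-process model. This is an illustration only, not OpenSHMEM code and not Cray's implementation: `model_put_signal`, `model_signal_fetch`, and the `MODEL_SIGNAL_*` names are hypothetical, and C11 atomics with `memcpy` stand in for the network. It also contrasts the two signal operations under consideration (atomic write vs. atomic add) when two initiators target the same signal word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signal operations, mirroring the two options raised in the
 * notes (atomic write vs. atomic add). Not taken from any SHMEM header. */
typedef enum { MODEL_SIGNAL_SET, MODEL_SIGNAL_ADD } model_sig_op;

/* Single-process model of put-with-signal: copy the payload, then update
 * the signal word atomically. The release ordering means a reader that
 * observes the signal with an acquire load also observes the payload. */
void model_put_signal(void *dest, const void *src, size_t nbytes,
                      _Atomic uint64_t *sig_addr, uint64_t signal,
                      model_sig_op op)
{
    assert(dest != NULL && src != NULL && sig_addr != NULL);
    memcpy(dest, src, nbytes);
    if (op == MODEL_SIGNAL_SET)
        atomic_store_explicit(sig_addr, signal, memory_order_release);
    else
        atomic_fetch_add_explicit(sig_addr, signal, memory_order_release);
}

/* Target-side read of the signal word (acquire pairs with the release). */
uint64_t model_signal_fetch(_Atomic uint64_t *sig_addr)
{
    return atomic_load_explicit(sig_addr, memory_order_acquire);
}
```

In this model, two initiators that each signal with value 1 using `MODEL_SIGNAL_SET` leave the word at 1, so the target cannot tell how many puts arrived; with `MODEL_SIGNAL_ADD` the word reaches 2. Either way the update is never partial, which matches Bob's answer about atomicity.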
- Discussion of `shmem_wait_then_set`
- Manju: Is there an implementation of it? Nick: Just the one-line version. Manju: What about a threaded environment? Are there issues with a thread (or PE) acquiring a lock, then getting descheduled? Nick: Not that I've seen, but I don't typically work in an oversubscribed environment.
- Jim: I've been thinking of how to implement this without a network polling compare-swap loop. One might use a request-response pattern where the target PE is told which condition to look for and notifies the requester once it is met, but updates could come from the network or local memory updates. This seems like hard polling might be necessary.
- Nick: An earlier use case had a three-state "lock".
- Jim: I think one might have to implement this with a compare-swap loop. If that's all this is, then is there enough value to add this to the API?
- Bob: If updates were restricted to SHMEM operations, then a request-response pattern may be possible. Jim: Would that be a limitation? Nick: I don't think so; I just used AMOs in this case for general safety.
- Jim: If wait-then-set operations were only satisfied by other wait-then-set operations, then networks that support destination-side event generation could be used instead of hard polling. This might be analogous to how one implements nonblocking collectives.
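
The compare-swap loop that Jim describes can be sketched as a single-process model using C11 atomics. This is an assumption-laden illustration, not the proposed `shmem_wait_then_set` API and not Nick's one-line version: the function names, the argument order, and the `max_attempts` bound are all hypothetical. In a real implementation each compare-swap would be a network AMO against the target PE, which is the hard polling the discussion is trying to avoid.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded form: try up to max_attempts times to replace
 * cmp_value with new_value. Returns true if the swap happened. */
bool model_try_wait_then_set(_Atomic uint64_t *ivar, uint64_t cmp_value,
                             uint64_t new_value, unsigned max_attempts)
{
    assert(ivar != NULL);
    for (unsigned i = 0; i < max_attempts; i++) {
        /* Reset each time: a failed compare-exchange overwrites
         * 'expected' with the value it actually observed. */
        uint64_t expected = cmp_value;
        if (atomic_compare_exchange_strong_explicit(
                ivar, &expected, new_value,
                memory_order_acq_rel, memory_order_acquire))
            return true;
    }
    return false;
}

/* Hypothetical unbounded form: poll until the condition is met, then set.
 * Blocks forever if no other thread ever stores cmp_value. */
void model_wait_then_set(_Atomic uint64_t *ivar, uint64_t cmp_value,
                         uint64_t new_value)
{
    while (!model_try_wait_then_set(ivar, cmp_value, new_value, 1))
        ; /* hard polling */
}
```

A lock-like use would be to wait for an "unlocked" value and set a "locked" value in one step, then store the "unlocked" value again on release. The wait and the set are a single atomic step, so no other updater can slip in between them.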