RMA WG 06 20 2019
Agenda:
- Anshuman's get ordering benchmark
- Naveen's recent changes to shmem_fetch_signal
- Single-element get from every PE (in rounds).
- No ordering between rounds; fences are placed between rounds in NVSHMEM.
- Unclear (according to the v1.4 spec) whether gets can be issued in any order.
- In NVSHMEM, a single-element get is a weak load (can be issued in any order), which is probably not spec conformant.
- The performance experiment possibly motivates relaxing the ordering requirement.
- The benchmark has 2 versions:
  - 1 fence between rounds (~6 us w/ 4 PEs)
  - 1 fence per get (~13 us w/ 4 PEs)
- Increasing the number of PEs exacerbates this effect.
- Anshuman will try to share the benchmark code, but must check with legal compliance.
- Q: Could a read force ordering within shmem_g?
  A: Valid point.
- Q: So what's the difference between blocking and non-blocking get?
  A: Non-blocking get has no guarantee of completion after return. With blocking, the value can be read from the dest buffer after returning.
- Q: Isn't this unique to NVSHMEM?
  A: No - expect similar behavior on Power9. This affects architectures with load/store-based gets.
- Users kinda expect program order to be maintained.
- If not using the dest buffer, can you reorder? Should be allowable, but compilers can't really do shmem-aware function reordering.
- Fortran can reorder functions, for example.
- Does blocking say anything about preserving program order?
- Discussed 3 relevant examples:
————————————————————————————
Process 0:
A = shmem_fetch(x, 0)
B = shmem_fetch(x, 0)
C = B
Process 1:
shmem_inc(x, 0)
Possible outcomes:
A = 0, B = 0
A = 0, B = 1
A = 1, B = 1
A = 1, B = 0 (??)
————————————————————————————
Process 0:
Initially: x = 0, y = 0
A = shmem_fetch(x, 0)
B = shmem_fetch(y, 0)
Process 1:
shmem_inc(x, 0)
shmem_fence()
shmem_inc(y, 0)
Possible outcomes:
A = 0, B = 0
A = 0, B = 1 (??)
A = 1, B = 1
A = 1, B = 0
————————————————————————————
Process 0:
Initially: x = 0, y = 0
while (! shmem_fetch(y, 0) ) ;
B = shmem_g(x, 0)
Process 1:
shmem_p(x, 1, 0)
shmem_fence()
shmem_inc(y, 0)
Possible outcomes:
A = 0, B = 0
A = 0, B = 1 (??)
A = 1, B = 1
A = 1, B = 0
- Does the OpenSHMEM spec force a read-fence between the fetch operations on PE 0?
- Weakly ordered models need fences on both sides.
- Need wait/test to order on consumer side.
- Spec seems underspecified - if so, should we specify? Should gets be issuable in any order?
- We could propose a property on memory fences that are specified on a context.
- The behavior of conflicting shmem_g and shmem_p isn't well defined.
- Atomic operations must be ordered in the spec.
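
The first example above can be checked by brute force. The following is a minimal sketch in plain C, not OpenSHMEM code: it models x as an ordinary integer that is assumed to start at 0, treats each fetch and the increment as a single atomic step, and enumerates every interleaving. The names `outcome_bit` and `reachable_outcomes` are hypothetical helpers introduced for this illustration only.

```c
#include <assert.h>

/* Bit index for the outcome (A = a, B = b), with a and b each 0 or 1. */
unsigned outcome_bit(int a, int b)
{
    return 1u << (2 * a + b);
}

/*
 * Model of example 1 (assumes x is initially 0):
 *   Process 0: A = fetch(x); B = fetch(x)
 *   Process 1: inc(x)
 *
 * Returns a bitmask of reachable (A, B) outcomes. If allow_reorder is 0,
 * the two fetches execute in program order. If it is nonzero, the fetch
 * for B may also be issued before the fetch for A.
 */
unsigned reachable_outcomes(int allow_reorder)
{
    unsigned mask = 0;
    int n_orders = allow_reorder ? 2 : 1;

    for (int order = 0; order < n_orders; order++) {
        /* inc_pos = number of Process 0 fetches that run before inc(x). */
        for (int inc_pos = 0; inc_pos <= 2; inc_pos++) {
            int x = 0;
            int result[2] = {0, 0}; /* result[0] is A, result[1] is B */

            for (int step = 0; step < 2; step++) {
                if (step == inc_pos)
                    x++; /* Process 1's increment lands here */
                /* order 0: A then B; order 1: B then A */
                int target = (order == 0) ? step : 1 - step;
                result[target] = x;
            }
            /* inc_pos == 2 means the increment came after both fetches. */
            assert(result[0] <= 1 && result[1] <= 1);
            mask |= outcome_bit(result[0], result[1]);
        }
    }
    return mask;
}
```

In this model, `reachable_outcomes(0)` covers (0,0), (0,1) and (1,1) only, while `reachable_outcomes(1)` also reaches A = 1, B = 0. That is the outcome marked (??) in the notes: under these assumptions it appears only when the two fetches may be issued out of program order.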