
RMA WG 06 20 2019

David Ozog edited this page Jun 20, 2019 · 1 revision

Agenda:

  1. Anshuman's get ordering benchmark
  2. Naveen's recent changes to shmem_fetch_signal

Anshuman wrote a 'get ordering' benchmark:

  • Single element get from every PE (in rounds).

  • No ordering between rounds, fences between rounds in NVSHMEM.

  • Unclear (according to v1.4 spec) if gets can be issued in any order.

  • In NVSHMEM, a single-element get is a weak load (can be issued in any order), which is probably not spec conformant.

  • Performance experiment possibly motivates relaxing the ordering requirement.

  • Benchmark has 2 versions:

    • 1 fence between rounds (~6 us w/ 4 PEs)

    • 1 fence per get (~13 us w/ 4 PEs)

  • Increasing the number of PEs exacerbates this effect.

  • Anshuman will try to share the benchmark code, but must check with legal compliance.

  • Q: could a read force ordering within shmem_g?

    A: Valid point.

  • Q: So what's the difference between a blocking and a non-blocking get?

    A: Nonblocking get has no guarantee of completion after return. With blocking, value can be read from dest buffer after returning.

  • Q: Isn't this unique to NVSHMEM?

    A: No - expect similar behavior on Power9. This affects architectures with load/store based gets.

  • Users kinda expect program order to be maintained.

  • If not using dest buffer, can you reorder? Should be allowable, but compilers can't really do shmem-aware function reordering.

  • Fortran can reorder functions, for example.

  • Does Blocking say anything about preserving program order?

  • Discussed 3 relevant examples:

————————————————————————————

Process 0:
 
A = shmem_fetch(x, 0)
B = shmem_fetch(x, 0)
 
C = B
 
Process 1:
 
shmem_inc(x, 0)
 
Possible outcomes:
 
A = 0, B = 0
A = 0, B = 1
A = 1, B = 1
 
A = 1, B = 0 (??)
 
————————————————————————————
 
Process 0:
 
Initially: x = 0, y = 0
 
A = shmem_fetch(x, 0)
B = shmem_fetch(y, 0)
 
Process 1:
 
shmem_inc(x, 0)
shmem_fence()
shmem_inc(y, 0)
 
Possible outcomes:
 
A = 0, B = 0
A = 0, B = 1 (??)
A = 1, B = 1
A = 1, B = 0
 
————————————————————————————
 
Process 0:
 
Initially: x = 0, y = 0
 
while (! shmem_fetch(y, 0) ) ;
B = shmem_g(x, 0)
 
Process 1:
 
shmem_p(x, 1, 0)
shmem_fence()
shmem_inc(y, 0)
 
Possible outcomes:
 
A = 0, B = 0
A = 0, B = 1 (??)
A = 1, B = 1
A = 1, B = 0
 
————————————————————————————

  • Does the OpenSHMEM spec force a read-fence between the fetch operations on PE 0?
  • Weakly ordered models need fences on both sides.
  • Need wait/test to order on consumer side.
  • Spec seems underspecified - if so, should we specify? Should gets be issuable in any order?
    • We could propose a property on memory fences that are specified on a context.
  • The behavior of a conflicting shmem_g and shmem_p isn't well defined.
  • Atomic operations must be ordered in the spec.
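
The outcome lists in the first and third examples above can be checked by brute force. The following is a minimal Python sketch, not OpenSHMEM code. It assumes a toy model in which every operation takes effect atomically on a single shared memory, PE 1's updates stay in program order (trivially in the first example, via shmem_fence in the third), and PE 0's reads are either kept in program order or allowed to issue in any order. The helpers `interleavings`, `outcomes`, and `spin_then_get` are hypothetical names introduced here. The sketch says nothing about what the spec actually permits; it only shows which outcomes each assumption would allow.

```python
from itertools import permutations


def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(reads, writes, reorder_reads):
    """Return the set of value tuples PE 0 can observe (toy model).

    reads:  variables PE 0 reads, in program order.
    writes: variables PE 1 increments by 1, in program order (assumed kept).
    reorder_reads: if True, PE 0's reads may be issued in any order.
    """
    n = len(reads)
    issue_orders = permutations(range(n)) if reorder_reads else [tuple(range(n))]
    observed = set()
    for order in issue_orders:
        read_ops = [("read", i) for i in order]
        write_ops = [("write", var) for var in writes]
        for schedule in interleavings(read_ops, write_ops):
            mem = {"x": 0, "y": 0}  # variables on PE 0, initially zero
            seen = [None] * n
            for kind, arg in schedule:
                if kind == "read":
                    seen[arg] = mem[reads[arg]]
                else:
                    mem[arg] += 1
            observed.add(tuple(seen))
    return observed


def spin_then_get(reorder_reads):
    """Third example: values of B, given that the last fetch of y returned 1."""
    # Assumption: shmem_p(x, 1, 0) on x == 0 is modelled as an increment of x.
    results = outcomes(["y", "x"], ["x", "y"], reorder_reads)
    return {b for flag, b in results if flag == 1}


# First example: (A, B) for two fetches of x racing with one increment.
print(sorted(outcomes(["x", "x"], ["x"], reorder_reads=False)))
# [(0, 0), (0, 1), (1, 1)]
print(sorted(outcomes(["x", "x"], ["x"], reorder_reads=True)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(spin_then_get(False)))  # [1]
print(sorted(spin_then_get(True)))   # [0, 1]
```

Under this model, the questioned outcome of the first example (A = 1, B = 0) and a stale B = 0 in the third example appear only when PE 0's reads may be reordered.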

Not enough time for Naveen's topic, so it is at the top of the list for next week.
