RMA WG 06 20 2019

Agenda:

Anshuman's get ordering benchmark
Naveen's recent changes to shmem_fetch_signal

Anshuman wrote a 'get ordering' benchmark:

Single element get from every PE (in rounds).
No ordering between rounds, fences between rounds in NVSHMEM.
Unclear (according to v1.4 spec) if gets can be issued in any order.
In NVSHMEM single element get is a weak load (can be issued in any order) - probably not spec conformant.
Performance experiment possibly motivates relaxing the ordering requirement.
Benchmark has 2 versions:
- 1 fence between rounds (~6 us w/ 4 PEs)
- 1 fence per get (~13 us w/ 4 PEs)
Increasing PEs exacerbates this effect.
Anshuman will try to share the benchmark code, but must check with legal compliance.
Q: Could a read force ordering within shmem_g?

A: Valid point.

Q: So what's the difference between blocking and non-blocking get?

A: Non-blocking get has no guarantee of completion after return. With blocking get, the value can be read from the dest buffer after returning.

Q: Isn't this unique to NVSHMEM?

A: No - expect similar behavior on Power9. This affects architectures with load/store based gets.
Users kinda expect program order to be maintained.
If not using dest buffer, can you reorder? Should be allowable, but compilers can't really do shmem-aware function reordering.
Fortran can reorder functions, for example.
Does "blocking" say anything about preserving program order?
Discussed 3 relevant examples:

————————————————————————————

Process 0:
 
A = shmem_fetch(x, 0)
B = shmem_fetch(x, 0)
 
C = B
 
Process 1:
 
shmem_inc(x, 0)
 
Possible outcomes:
 
A = 0, B = 0
A = 0, B = 1
A = 1, B = 1
 
A = 1, B = 0 (??)
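
The outcome list for this first example can be checked mechanically. The following is a minimal sketch in plain C, not OpenSHMEM code: it models PE 1's increment as landing before, between, or after PE 0's two fetches, and optionally lets the two fetches be performed in swapped order. The names `fetch_fetch_outcomes` and `print_outcomes` are hypothetical helpers introduced only for illustration, and the model assumes each fetch and the increment take effect atomically at a single point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Model of example 1: PE 0 does A = fetch(x); B = fetch(x); PE 1 does inc(x).
 * x starts at 0. Returns a bitmask in which bit (2*A + B) is set when the
 * outcome (A, B) is reachable.
 */
unsigned fetch_fetch_outcomes(bool allow_get_reordering)
{
    unsigned mask = 0;
    /* order 0: A's fetch is performed first (program order).
     * order 1: B's fetch is performed first (fetches reordered). */
    int orders = allow_get_reordering ? 2 : 1;

    for (int order = 0; order < orders; order++) {
        /* inc_pos = number of fetches performed before the increment lands. */
        for (int inc_pos = 0; inc_pos <= 2; inc_pos++) {
            int first  = (inc_pos < 1) ? 1 : 0; /* seen by the fetch performed first  */
            int second = (inc_pos < 2) ? 1 : 0; /* seen by the fetch performed second */
            int a = (order == 0) ? first : second;
            int b = (order == 0) ? second : first;
            assert(a >= 0 && a <= 1 && b >= 0 && b <= 1);
            mask |= 1u << (2 * a + b);
        }
    }
    return mask;
}

/* Print every outcome whose bit is set in the mask. */
void print_outcomes(unsigned mask)
{
    for (int i = 0; i < 4; i++) {
        if (mask & (1u << i)) {
            printf("A = %d, B = %d\n", i / 2, i % 2);
        }
    }
}
```

Under this model, `fetch_fetch_outcomes(false)` reaches only the first three outcomes listed above, while `fetch_fetch_outcomes(true)` also reaches A = 1, B = 0. That is why the last outcome is the one in question: it needs the two fetches to be performed out of program order.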
 
————————————————————————————
 
Process 0:
 
Initially: x = 0, y = 0
 
A = shmem_fetch(x, 0)
B = shmem_fetch(y, 0)
 
Process 1:
 
shmem_inc(x, 0)
shmem_fence()
shmem_inc(y, 0)
 
Possible outcomes:
 
A = 0, B = 0
A = 0, B = 1 (??)
A = 1, B = 1
A = 1, B = 0
 
————————————————————————————
 
Process 0:
 
Initially: x = 0, y = 0
 
while (! shmem_fetch(y, 0) ) ;
B = shmem_g(x, 0)
 
Process 1:
 
shmem_p(x, 1, 0)
shmem_fence()
shmem_inc(y, 0)
 
Possible outcomes:
 
A = 0, B = 0
A = 0, B = 1 (??)
A = 1, B = 1
A = 1, B = 0

Does the OpenSHMEM spec force a read-fence between the fetch operations on PE 0?
Weakly ordered models need fences on both sides.
Need wait/test to order on consumer side.
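
As a shared-memory analogy for "fences on both sides", here is a minimal sketch of the third example using C11 atomics and POSIX threads rather than OpenSHMEM. It says nothing about what the OpenSHMEM spec requires. The function name `run_message_passing` is hypothetical; the release fence stands in for `shmem_fence()` on the producer, and the acquire fence is the consumer-side ordering discussed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x; /* payload, plays the role of x */
static atomic_int y; /* flag, plays the role of y */

/* Producer: analogue of shmem_p(x, 1, 0); shmem_fence(); shmem_inc(y, 0). */
static void *producer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Producer-side fence: orders the store to x before the update of y. */
    atomic_thread_fence(memory_order_release);
    atomic_fetch_add_explicit(&y, 1, memory_order_relaxed);
    return NULL;
}

/*
 * Consumer: analogue of while (!shmem_fetch(y, 0)); B = shmem_g(x, 0).
 * Returns the value read from x.
 */
int run_message_passing(void)
{
    pthread_t tid;
    atomic_store(&x, 0);
    atomic_store(&y, 0);

    int rc = pthread_create(&tid, NULL, producer, NULL);
    assert(rc == 0 && "pthread_create failed");
    (void)rc;

    while (atomic_load_explicit(&y, memory_order_relaxed) == 0) {
        /* spin until the flag is set */
    }
    /* Consumer-side fence: without it, the read of x below may be
     * satisfied before the flag read on a weakly ordered machine. */
    atomic_thread_fence(memory_order_acquire);
    int b = atomic_load_explicit(&x, memory_order_relaxed);

    pthread_join(tid, NULL);
    return b;
}
```

With both fences in place, the C11 memory model guarantees that the consumer reads 1 from `x`. Dropping either fence removes that guarantee, which mirrors the concern that a fence on PE 1 alone may not be enough.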
Spec seems underspecified - if so, should we specify? Should gets be issuable in any order?
- We could propose a property on memory fences that are specified on a context.
shmem_g and shmem_p conflicting isn't well defined.
Atomic operations must be ordered in the spec.

Not enough time for Naveen's topic, so that is at the top of the list for next week.
