RMA WG 05 10 2018
James Dinan edited this page May 10, 2018
- Collect opens, assign note taker
- Updates from proposal leads
- Did not take roll
* Naveen: Still no performance numbers; may be able to get some by the next meeting.
Do we need performance numbers? We've already removed the non-fetching variants, but would users benefit from non-blocking fetching AMOs?
Jim: The argument is pretty good without supporting data.
* Manju's questions:
Is there a handle for each NB AMO? No.
How does fetching work? Separate API, context argument, needs a quiet.
Where is the value stored? Similar to an NB get with an implicit handle; a destination argument has been added.
Hardware support? Aries has some support, 8 bytes only.
* Advantages:
Naveen: Overlap is achievable with NB AMOs; the API is good for chaining multiple AMOs together
Jim: higher issue rate, better pipelining
* Manju: We might not need NB AMOs; if there's no real hardware support, the current API doesn't need to change much.
Jim: All of libfabric is implicitly non-blocking; for a blocking call we wait right after issuing, and the result is returned to a location on the stack.
Manju: Could move the fetched value into an argument instead of the return value.
Jim: The user does supply the argument, but the operation must be completed with a quiet before the user can read it.
Manju: In the NB case, the buffer can't be reused; need to wait for quiet.
Jim: The NBE (non-blocking explicit) approach could work too, but not with the 1.4 spec.
* Manju: Not convinced there will be a performance difference with non-blocking AMOs... internally they do the same thing.
Jim: Definitely expect it in our implementation
Manju: Where is the performance benefit vs current API?
Jim: Issuing all NB AMOs in a loop then waiting would be far better than waiting on each round-trip
Manju: Doing a copy: in the NB case the buffer cannot be reused until quiet, so it has to wait anyway.
Jim: A blocking fetching AMO depends on round-trip latency; NB doesn't. Blocking ops are locally complete upon return (but may not yet be visible at the target). A simple experiment could show the benefit.
* Plan for this PR:
Naveen: DMAPP has a few difficulties, for instance passing by reference vs. by value.
Will look at libfabric as well.
Could do an informal reading at the next RMA WG (May 24th).
* Need to update the semantics of the return value in the current draft
* Have a new example that does all-to-all task processing
* Completion "status" argument array will be 0/1 of type _Bool
* Will email Manju about an informal reading at the 21st meeting
* Still in the concept stage; need Nick for more details
* Naveen: Bob is working on it and plans to have a reading in the WG soon
Are both put-with-signal and put-with-increment requested?
Jim: Put with signal as an atomic write
Could put-with-increment be added to the proposal in the future?
* Pass signal operator as an input argument?
* Signal is an atomic op, and wait should block on it. Does the spec allow that? Yes, in Jim's proposal.
* Manju: Does wait need a flush on every read?
Jim: No, but it does need a read fence before returning from wait
Manju: Potential problem operating in 2 different atomic domains
Jim: One place is the consumer, the other is the processor (CPU/NIC)
Manju: Consistency between different domains could be an issue
Jim: Not a coherence problem but a consistency issue (the write needs to be visible to the read). If the atomics cache is not write-through, shmem_wait needs to do something to update the NIC.
Manju: The specification's memory model may not specify this. There's also an ordering assumption: put *then* signal.
Jim: Worst case: put/fence/blocking AMO. Better case: register both operations simultaneously, which makes put_with_signal non-blocking.
* Naveen: do we need the new shmem_wait semantic on atomics?
Jim: Hopefully what we need is posted in proposal #204
Naveen: What about totally removing Fortran? Should discuss at 21st meeting.