RMA WG 02 28 2019

James Dinan edited this page Mar 4, 2019 · 2 revisions

Agenda

  1. Discuss global/local visibility semantics of the put-with-signal (Naveen)
  2. Revisit wait semantics and discuss next steps (Jim/Anshuman)

Attendees

  • Not recorded

Open Action Items

  • None

New Action Items

  • None

Notes

Global and local visibility requirements for put with signal

  • In the current text, we do not specify whether the put buffer is visible to a remote PE. Do we need to add text specifying that global visibility is guaranteed upon completion of the put?
  • Semantics: The current spec uses "completion"; is this fence or quiet? "Completion" does not describe visibility. Requiring global visibility on "completion" would be a performance hit.
  • If we can get a clearly defined signal operation, then we can use these semantics and know how to poll for signal completion.

A global visibility example was given: PE A issues a put w/s to PE B while PE C polls on the signal location. What happens?

Question: Can one call shmem_wait or shmem_test on an address from shmem_ptr? It is currently not prevented. Hard to catch this in implementations.

PCIe discussion related to visibility and performance:

  • There are PCIe issues with the ordering of atomics and writes under some conditions. If there are two writes to two different PCIe endpoints, there is no ordering guarantee.
  • For a put with signal that is guaranteed to be globally visible to a third party, you have to do a write from A to B, followed by a read 0, to make sure that the data is pushed all the way through PCIe and is therefore visible to C.
  • So, global visibility of the signal is expensive because you have to do a read after the write of the signal to push it all the way through PCIe.

Review semantics of quiet and fence in current spec:

  • In quiet semantics, it explicitly says everything is globally visible after quiet returns.
  • In fence semantics, it only requires ordering of the arrival of operations before and after the fence, which is a much lighter-weight requirement.
  • To force ordering of atomics w.r.t. writes on PCIe, you don't need a read just to enforce ordering, so fence should be lighter weight than quiet in practice.

General consensus on put with signal semantics:

  • For put w/s, we really only care about local completion order: the put completes before the signal at the target PE. Signal completion on the target PE does not promise global visibility of the preceding write or signal.
  • If the user wants global visibility, they have to use quiet explicitly.
  • Put w/s should be semantically equivalent to: put to X, a fence affecting only the put to X, then signal to Y.
  • This is weaker than requiring put, shmem_fence, signal, since put w/s does not order previous put operations the way a shmem_fence would.
  • In order to clean up the language, Naveen will send out updates and get feedback to find the right phrasing to capture this.
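
The ordering described above (put to X completes before the signal to Y is observed at the target) can be illustrated with a shared-memory analogy. The sketch below is not OpenSHMEM code: it uses C11 atomics within one process, and `struct mailbox`, `put_with_signal`, and `wait_for_signal` are hypothetical names chosen for illustration. Note that a C11 release store orders all earlier writes by the caller, which is closer to put, shmem_fence, signal than to the weaker semantics the group converged on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MAILBOX_BYTES 64

/* Hypothetical stand-in for memory on the target PE. */
struct mailbox {
    char data[MAILBOX_BYTES]; /* X: payload written by the put */
    atomic_ulong signal;      /* Y: signal word written after the payload */
};

/*
 * Analogy for put w/s: write the payload, then publish the signal.
 * The release store guarantees that a reader who observes the new
 * signal value (with an acquire load) also observes the payload.
 * Assumption: release ordering is stronger than the proposed put w/s,
 * because it also orders every earlier write by this thread.
 */
static void put_with_signal(struct mailbox *target, const void *src,
                            size_t nbytes, unsigned long sig_value)
{
    assert(nbytes <= MAILBOX_BYTES);
    memcpy(target->data, src, nbytes);
    atomic_store_explicit(&target->signal, sig_value, memory_order_release);
}

/* Target side: spin until the signal word equals the expected value. */
static void wait_for_signal(struct mailbox *target, unsigned long expected)
{
    while (atomic_load_explicit(&target->signal,
                                memory_order_acquire) != expected) {
        /* busy-wait; a real implementation would also make progress */
    }
}
```

Nothing in this sketch says anything about a third observer (PE C in the example above); per the consensus, a user who needs that would call quiet explicitly.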

Do we need progress guarantees for put w/s?

  • There is some sense that there must be some progress guarantee for synchronizing operations, but no overall consensus here.

Put with increment (or other atomics)

  • Users may want an argument that specifies the atomic operation applied to the signal value (set, add, etc.) instead of a plain put w/s. So, you have, say, put w/ increment.
  • Since put w/s has been relaxed away from atomic, this put w/ increment functionality can be proposed as a separate operation. Follow up on this for next week.
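
As a rough illustration of what an operation argument on the signal could look like, here is a shared-memory sketch using C11 atomics. It is not a proposed API: `enum sig_op`, `SIG_OP_SET`, `SIG_OP_ADD`, `struct slot`, and `put_with_signal_op` are hypothetical names, and a single process stands in for the initiator and target PEs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SLOT_BYTES 64

/* Hypothetical selector for how the signal word is updated. */
enum sig_op { SIG_OP_SET, SIG_OP_ADD };

/* Hypothetical stand-in for memory on the target PE. */
struct slot {
    char data[SLOT_BYTES]; /* payload written by the put */
    atomic_ulong signal;   /* signal word updated after the payload */
};

/*
 * Write the payload, then update the signal word with the chosen
 * operation. SIG_OP_ADD gives the "put w/ increment" behavior:
 * each completed put bumps a counter the target can poll on.
 */
static void put_with_signal_op(struct slot *target, const void *src,
                               size_t nbytes, unsigned long sig_value,
                               enum sig_op op)
{
    assert(nbytes <= SLOT_BYTES);
    memcpy(target->data, src, nbytes);
    if (op == SIG_OP_ADD) {
        atomic_fetch_add_explicit(&target->signal, sig_value,
                                  memory_order_release);
    } else {
        atomic_store_explicit(&target->signal, sig_value,
                              memory_order_release);
    }
}
```

With the add variant, a target expecting N messages can wait for the counter to reach N rather than tracking one flag per sender.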

Which operations are compatible with wait and test APIs?

Conclusions over the past month:

  • AMOs and put w/s are compatible with wait/test, and no other updates are compatible with wait/test.
  • wait cannot observe partial updates. This will be moved from the Notes into the main text and cleaned up to make sure it is very clear.
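
The difference between the test and wait styles of polling, and the "no partial updates" property, can be sketched with C11 atomics. This is an analogy only: `signal_test_eq` and `signal_wait_eq` are hypothetical names, not the shmem_test or shmem_wait interfaces, and the atomic whole-word load is what models a wait that never sees a half-written value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Non-blocking check (test style): returns true if the signal word
 * currently equals the expected value. The load is atomic on the whole
 * word, so the caller sees either the old or the new value, never a mix.
 */
static bool signal_test_eq(atomic_ulong *sig, unsigned long expected)
{
    assert(sig != NULL);
    return atomic_load_explicit(sig, memory_order_acquire) == expected;
}

/* Blocking check (wait style): returns only once the condition holds. */
static void signal_wait_eq(atomic_ulong *sig, unsigned long expected)
{
    while (!signal_test_eq(sig, expected)) {
        /* busy-wait until an atomic update makes the condition true */
    }
}
```

This only works because the update side is also atomic on the word, which mirrors the conclusion that AMOs and put w/s are the updates compatible with wait/test.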

Open items:

  • Introduce a shmem_signal API? We already have put w/s with zero data.