
RMA WG 03 29 2018

Md edited this page Mar 29, 2018 · 2 revisions

Agenda

  1. Collect opens, assign note taker
  2. Naveen (Cray): Nonblocking AMOs

Attendees

  • Naveen, Bob (Cray), Jim, Dave, Wasi (Intel), Tony (SBU), Nick (DOD), Megan Grodowitz (ARM), Min Si (ANL)

Actions

  • Follow-up from 3/1: Get feedback/input from users on shmem_put_signal w.r.t. the type of the signal word (e.g., size_t, uint64_t, something else?) and the type of the signal operation (e.g., atomic write, atomic add).
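To make the two candidate signal operations concrete, here is a minimal single-process sketch using C11 atomics. It is not the OpenSHMEM API: `toy_put_signal`, `toy_sig_op`, and `toy_signal_after` are hypothetical names, and the signal word is assumed to be `uint64_t` purely for illustration (its type is still the open question in the action item above).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signal operations; the names are illustrative only. */
enum toy_sig_op { TOY_SIG_SET, TOY_SIG_ADD };

/* Toy, single-process stand-in for a put-with-signal: copy the payload,
 * then update the signal word atomically. Release ordering makes the
 * payload visible to a reader that observes the signal with acquire. */
static void toy_put_signal(void *dest, const void *src, size_t nbytes,
                           _Atomic uint64_t *sig, uint64_t value,
                           enum toy_sig_op op)
{
    assert(dest != NULL && src != NULL && sig != NULL);
    memcpy(dest, src, nbytes);
    if (op == TOY_SIG_SET)
        atomic_store_explicit(sig, value, memory_order_release);
    else
        atomic_fetch_add_explicit(sig, value, memory_order_release);
}

/* Deliver n puts that each signal the value 1; return the final signal word. */
static uint64_t toy_signal_after(int n, enum toy_sig_op op)
{
    _Atomic uint64_t sig = 0;
    char dst[8];
    const char src[8] = "payload";

    for (int i = 0; i < n; i++)
        toy_put_signal(dst, src, sizeof src, &sig, 1, op);
    return atomic_load_explicit(&sig, memory_order_acquire);
}
```

In this model, three puts with an atomic write leave the signal word at 1, while three puts with an atomic add leave it at 3. An atomic add therefore lets the receiver count arrivals, whereas an atomic write only records the most recent one.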

Minutes

  • Naveen (Cray) presented nonblocking (NB) AMOs.

  • Questions raised: Is there interest in adding this to the spec? Do we need explicit NB? Do all of the possible APIs need to be implemented?

  • Discussion

    • (Nick) What is the benefit of an NB non-fetching operation, in terms of both performance and semantics?
    • (Naveen) Performance-wise, it can break the chain when a number of nonblocking operations are used with one blocking operation.
    • (Jim) Some implementations need an allocation for pass-by-reference types, which might increase the number of bytes injected.
    • (Bob) Most current implementations pass by value; the proposal passes by reference, following the same semantics as NB put. The open question is whether the APIs should pass by value or by reference.
    • (Nick) Pass by reference is essentially the same, since the pointer still has to be copied.
    • (Jim) To re-transmit data, the provider needs to store the value in a buffer; this is handled in a lower layer.
    • (Naveen) Is there an upper limit on the number of injections? Running out of slots requires waiting for some of the buffer space to be freed.
    • (Tony) UCX is proposing a post-and-fetch interface that would be NB. Would these values be buffered? There are possible issues that need to be checked. Fetch and swap depend on future variables (values to be obtained later). Checked: UCX supports pass by value.
    • (Bob) More in favor of explicit handles? Handles are not fully implemented yet.
    • (Nick) Would it be OK to call shmem_wait to wait for an object to receive the fetched value at a later time? Completion ordering might not be guaranteed.
    • (Jim) The OPA 100 NIC RDMA stores received data at the destination before the CRC check, so re-transmission might occur in this case. Explicit handles have value for RMA operations other than AMOs. The NBI proposal can go ahead in its current state; the NBE proposal can be kept around for an upcoming spec.
    • (Naveen) Only basic AMOs or all of them?
    • (Jim) FI_INJECT can be handled more efficiently than FI_WRITE. However, that is not universally true; for sockets it can be the reverse. There is potential for an NBI version of non-fetching AMOs with pass by reference. For fetching AMOs the benefit is obvious; for non-fetching AMOs it is unclear.
  • Concluding Remarks

    • Develop this into a proposal. What is a good starting point: everything, or a narrowed-down subset? There is value in NB fetching atomics, so start with those only. The latency of a fetching operation is high, so hiding it is beneficial.
    • Is it confusing for users if the full API is not provided? Perhaps not.
    • Create a set of micro-benchmarks that showcase the benefits. Explore the apps directory in the SOS tests; code that can be used for this may already exist there. Contact Jim if it is not found in the SOS repo.