
RMA WG 04 12 2018

James Dinan edited this page Apr 16, 2018 · 4 revisions

Agenda

  1. Collect opens, assign note taker
  2. Dave O.: Test/Wait Some Proposal Review

Attendees

  • No roll this meeting

Actions

  • Follow-up from 3/1: Get feedback/input from users on shmem_put_signal w.r.t. the type of the signal word (e.g., size_t, uint64_t, something else?) and the type of the signal operation (e.g., atomic write, atomic add).
    • [Nick] Unfortunately, I'm unable to join the WG this week, but I wanted to provide feedback for the WG's consideration:

      For the signal word, I think uint64_t is the preferred type. For the signal operation, I think an atomic write is the primary form of interest, but there is also interest in atomic add. If the WG is open to considering addition of both operations but wants to minimize API expansion, we could consider using a signal_op argument to specify the operation; e.g.,

      void shmem_put_signal(TYPE *dest, const TYPE *source, size_t nelems,
                            int signal_op, uint64_t *signal_word, int pe);

      where signal_op may be SHMEM_SIGNAL_WRITE or SHMEM_SIGNAL_ADD.

Minutes

Regarding put w/ signal

The operation of greatest interest is put with signal, i.e., a put followed by an atomic write. Keeping the same API but adding an argument for the operation would also allow a put with atomic increment, which would be very useful.

Adding the operation argument would add a branch instruction to the put call. Is this acceptable overhead?

  • Since put is slow and unlikely to be called in a tight loop, the branch overhead should be acceptable
  • The total number of supportable operations will be limited based on network support
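The branch under discussion can be illustrated with a minimal single-process sketch. It is not an OpenSHMEM implementation: C11 atomics stand in for the network operations, the `sketch_` names and constants are hypothetical, and the `signal_value` parameter is an assumption, since the signature proposed above does not say where the signal value comes from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants; the values are arbitrary. */
enum { SKETCH_SIGNAL_WRITE = 0, SKETCH_SIGNAL_ADD = 1 };

/* Local stand-in for a put with signal: copy the payload, then update
 * the signal word with the requested atomic operation.
 * signal_value is an assumed parameter, not part of the proposal. */
void sketch_put_signal(void *dest, const void *source, size_t nbytes,
                       int signal_op, uint64_t signal_value,
                       _Atomic uint64_t *signal_word)
{
    assert(signal_op == SKETCH_SIGNAL_WRITE || signal_op == SKETCH_SIGNAL_ADD);
    memcpy(dest, source, nbytes);

    /* The single extra branch per call that the minutes refer to. */
    if (signal_op == SKETCH_SIGNAL_WRITE)
        atomic_store_explicit(signal_word, signal_value, memory_order_release);
    else
        atomic_fetch_add_explicit(signal_word, signal_value, memory_order_release);
}
```

In this sketch the branch is taken once per call, after the payload copy, so its cost is fixed regardless of payload size.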

Open Question: Since OpenSHMEM defines atomic operations to be atomic only in relation to other atomic operations, does the receiving side need to use an atomic get to read the signal word?

  • On one hand, a regular read of the signal word may return garbage if the signal is updated with an atomic operation but read with a non-atomic one.
  • On the other hand, put with signal itself is not defined as an atomic operation, so a regular read should potentially be fine.

Discussion did not reach a satisfactory answer, so this will be taken up again at the next meeting.

Regarding supported types for wait_until

Table 7 in section 9.9 lists both size_t and ptrdiff_t as supported types for wait_until*. Does anyone implement this? Should we drop support?

  • Cons: size_t and ptrdiff_t are more loosely defined types and might be high overhead to implement
  • Pros: In reality though, these are probably limited to 64-bit or unsigned 64-bit, and people do tend to use these (?)

Cray does not currently implement these as supported types in wait_until, but is willing to add them if others do as well.

Regarding wait_until_some

Do we want an array of values to replace the single cmp_value so that each value in ivars can be tested against a separate condition?

  • Cons: memory consumption, more complicated user code to set up the array
  • Pros: API may not be useful without separate values if users seldom need to test many values with a single comparison

How can we consider the tradeoff between looping over many compare values vs. fencing between multiple calls to wait_until?

  • In-cache vs. out-of-cache will matter, and this depends on actual use cases
  • Need feedback about the use cases for wait_until to determine if the API needs multiple compare values and if so, what kind of performance hit will it take to add them
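The difference between the two forms can be sketched locally as follows. This is an illustration only, under stated assumptions: the function name is hypothetical, the comparison is fixed to "greater than or equal", and plain loads stand in for whatever an implementation would do to observe remote updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the entries of ivars that satisfy ivars[i] >= cmp.
 * If cmp_values is NULL, every entry is compared against cmp_value
 * (single-value form); otherwise entry i is compared against
 * cmp_values[i] (per-element form). */
size_t sketch_count_ge(const uint64_t *ivars, size_t nelems,
                       const uint64_t *cmp_values, uint64_t cmp_value)
{
    assert(ivars != NULL || nelems == 0);
    size_t satisfied = 0;
    for (size_t i = 0; i < nelems; i++) {
        /* Per-element form adds one extra array load per entry. */
        uint64_t cmp = cmp_values ? cmp_values[i] : cmp_value;
        if (ivars[i] >= cmp)
            satisfied++;
    }
    return satisfied;
}
```

The per-element form touches a second array of the same length as ivars, which is where the memory and in-cache versus out-of-cache concerns come from.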

What should be done about the odd initialization and output rules for indices?

  • Currently, indices must be initialized to anything > nelems
  • The out value is not strictly defined: is it left as the in value, or set to exactly nelems?

General consensus seems to be to change this to a mask array with values of 0/1. The original rationale for the in/out values no longer applies, so this can be changed.
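One possible reading of the mask approach is sketched below. The minutes do not define the mask semantics, so this is an assumption: a mask entry of 0 means "still to be tested", and the call sets it to 1 once the corresponding variable satisfies the comparison. The function name and the fixed "greater than or equal" comparison are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tests every entry whose mask value is 0 and marks each satisfied
 * entry with 1. Entries already marked 1 are skipped.
 * Returns the number of entries newly marked by this call. */
size_t sketch_test_some_mask(const uint64_t *ivars, size_t nelems,
                             int *mask, uint64_t cmp_value)
{
    size_t newly_done = 0;
    for (size_t i = 0; i < nelems; i++) {
        assert(mask[i] == 0 || mask[i] == 1);
        if (mask[i] == 0 && ivars[i] >= cmp_value) {
            mask[i] = 1;
            newly_done++;
        }
    }
    return newly_done;
}
```

Under this reading no special initial value relative to nelems is needed, and the state of each entry after the call is unambiguous.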
