Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Block-Strided RMA Operations #448

Merged
merged 8 commits into from
Oct 4, 2023
Merged

Conversation

jdinan
Copy link
Collaborator

@jdinan jdinan commented Jul 23, 2020

Extend the SHMEM strided APIs (e.g. shmem_iput) to include a block size, e.g.:

void shmem_ibput(TYPE *dest, const TYPE *source, ptrdiff_t dst, ptrdiff_t sst,
                 size_t bsize, size_t nblocks, int pe);

See #365 for details.

@jamesaross
Copy link
Contributor

The proposed block-strided RMA interfaces ibput/ibget are close to being useful to support movement of arbitrary data structures. They're close enough that I would expect the interfaces would be abused in order to make it work by padding structs and casting pointers to an appropriate data type. So why not support a single interface for arbitrary data structures and block sizes and drop the typed interfaces?

The proposal requires dst and sst strides to have a value greater than or equal to 1, not just for the new block-strided RMA operations, but for the existing strided interfaces. It appears this requirement was placed on the previous strided interfaces iput but not iget (the current specification seems to be inconsistent here). Having negative strides enabled applications to perform things like inverting data order. What is the reason for requiring the stride to be >= 1? Do we deprecate the current iget strided interface in this case because this proposal reduces functionality? I believe the >= 1 stride requirement isn't necessary for any of the interfaces.

@jdinan
Copy link
Collaborator Author

jdinan commented Jul 24, 2020

@jamesaross Appreciate your feedback on the proposal. Interested to hear your thoughts on the other options under discussion in #365. The working group felt block-strided would be an obvious extension of what we already have, so we decided to bring it forward to start a discussion.

It looks like OpenSHMEM added the stride greater or equal to 1 restriction in OpenSHMEM 1.1.

@jamesaross
Copy link
Contributor

@jdinan The possible solution of shmem_iputmem/shmem_igetmem in #365 appears to generalize the datatype so that it would support arbitrary data structures and all of the typed routines in this proposal.

There doesn't appear to be a requirement within for strides greater or equal to 1 for shmem_iget. This is possibly an oversight, but adding it now would hobble prior capability.

@davidozog
Copy link
Collaborator

FWIW, I do like the idea of adding a shmem_ibputmem/ibgetmem... it only adds 2 more symbols to the spec and quite a bit of flexibility.

I'm not so sure about supporting negative strides - iget looks like an oversight to me. Is it useful? I'm not sure I understand how it would work. If the source array is {1, 2, 3, 4, ..., 10}, and you ask for sst=-2, would it it send {1, 9, 7, 5, 3} or something else like {10, 8, 6, 4, 2}? Seems a little strange, but maybe there's a use case I'm missing.

In SOS, it looks like we assume dst and sst are positive and return an error if not (if the user does "error checking", but the operation goes out of bounds otherwise...).

@jamesaross
Copy link
Contributor

@davidozog

In the case of sst=-2, you would need to send it an appropriate source address so the operation could go out of bounds (i.e. &source[9]). I don't think wrapping around the data is correct since the function doesn't know the length of the array. Going out of bounds is a potential issue even without negative strides.

I'm not strongly attached to the need for negative strides, but it removes error checking from library code, makes it an application/user issue, and potentially provides a capability to minimize extra data manipulation in some cases.

So I think we're looking to add ibputmem/ibgetmem.

content/shmem_iget.tex Outdated Show resolved Hide resolved
@jdinan
Copy link
Collaborator Author

jdinan commented Aug 7, 2020

Name suggestion from @BryantLam shmem_iblock_put/get.

@jdinan
Copy link
Collaborator Author

jdinan commented Aug 7, 2020

@swpoole To provide feedback on usage models in some apps.

@jeffhammond
Copy link

I hear dense matrix computations are an application domain of interest in HPC 😄
oug2014slides.pdf

@jdinan
Copy link
Collaborator Author

jdinan commented Sep 20, 2021

@yfguo Please review.

@jdinan
Copy link
Collaborator Author

jdinan commented Nov 29, 2021

@davidozog Pointed out that we don't have nonblocking versions of the proposed API and are also missing nonblocking versions of the existing interleaved APIs.

@jdinan
Copy link
Collaborator Author

jdinan commented Nov 29, 2021

This was discussed at the 11/29/21 RMA WG meeting and we decided to move forward with the block-interleaved proposal, while considering a future subarray API if need arises.

@jdinan
Copy link
Collaborator Author

jdinan commented Nov 14, 2022

Scheduling this proposal for a re-reading at the December 2, 2022 meeting as it's been two years since the last reading.

content/shmem_ibget.tex Outdated Show resolved Hide resolved
content/shmem_ibget.tex Outdated Show resolved Hide resolved
content/shmem_ibput.tex Outdated Show resolved Hide resolved
content/shmem_ibput.tex Outdated Show resolved Hide resolved
content/shmem_ibput.tex Outdated Show resolved Hide resolved
content/shmem_ibput.tex Outdated Show resolved Hide resolved
content/shmem_ibget.tex Outdated Show resolved Hide resolved
@jdinan
Copy link
Collaborator Author

jdinan commented Sep 8, 2023

content/backmatter.tex Outdated Show resolved Hide resolved
@jdinan jdinan merged commit a0b22f7 into openshmem-org:master Oct 4, 2023
1 check passed
@jdinan jdinan mentioned this pull request Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants