Add Block-Strided RMA Operations #448
Signed-off-by: James Dinan <[email protected]>
The proposed block-strided RMA interfaces ibput/ibget are close to being useful for moving arbitrary data structures. They're close enough that I would expect the interfaces to be abused to make that work, by padding structs and casting pointers to an appropriate data type. So why not support a single interface for arbitrary data structures and block sizes and drop the typed interfaces?

The proposal requires the dst and sst strides to be greater than or equal to 1, not just for the new block-strided RMA operations but also for the existing strided interfaces. It appears this requirement was placed on the existing strided interface iput but not on iget (the current specification seems to be inconsistent here). Negative strides let applications do things like invert data order. What is the reason for requiring the stride to be >= 1? Do we deprecate the current iget strided interface in this case, because this proposal reduces functionality? I believe the >= 1 stride requirement isn't necessary for any of the interfaces.
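To make the data-order inversion concrete, here is a minimal local sketch in plain C. It is not an OpenSHMEM routine: `strided_copy_int` is a hypothetical helper that only models an element-strided copy with signed strides on a single process, and it assumes the caller passes base pointers chosen so that every accessed index stays in bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local model of an element-strided copy (not an OpenSHMEM API).
 * Element i is read from source[i * sst] and written to dest[i * dst].
 * Strides are signed, so a negative stride walks backwards from the base
 * pointer; the caller is responsible for keeping all accesses in bounds. */
void strided_copy_int(int *dest, const int *source,
                      ptrdiff_t dst, ptrdiff_t sst, size_t nelems)
{
    assert(dest != NULL && source != NULL);
    for (size_t i = 0; i < nelems; i++) {
        dest[(ptrdiff_t)i * dst] = source[(ptrdiff_t)i * sst];
    }
}
```

With `sst = -1` and `source` pointing at the last element of a 4-element array, this copies the elements in reverse order. The in-bounds responsibility moves to the caller, which is the trade-off discussed in this thread.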
@jamesaross Appreciate your feedback on the proposal. Interested to hear your thoughts on the other options under discussion in #365. The working group felt block-strided would be an obvious extension of what we already have, so we decided to bring it forward to start a discussion. It looks like OpenSHMEM added the restriction that strides be greater than or equal to 1 in OpenSHMEM 1.1.
@jdinan The possible solution of shmem_iputmem/shmem_igetmem in #365 appears to generalize the datatype so that it would support arbitrary data structures and all of the typed routines in this proposal. There doesn't appear to be a requirement that strides be greater than or equal to 1 for shmem_iget. This is possibly an oversight, but adding it now would hobble prior capability.
FWIW, I do like the idea of adding a I'm not so sure about supporting negative strides - In SOS, it looks like we assume
In the case of sst=-2, you would need to send it an appropriate source address so the operation could go out of bounds (i.e. I'm not strongly attached to the need for negative strides, but it removes error checking from library code, makes it an application/user issue, and potentially provides a capability to minimize extra data manipulation in some cases. So I think we're looking to add
Name suggestion from @BryantLam
@swpoole To provide feedback on usage models in some apps.
I hear dense matrix computations are an application domain of interest in HPC 😄
@yfguo Please review.
@davidozog pointed out that we don't have nonblocking versions of the proposed API and are also missing nonblocking versions of the existing interleaved APIs.
This was discussed at the 11/29/21 RMA WG meeting and we decided to move forward with the block-interleaved proposal, while considering a future subarray API if the need arises.
Scheduling this proposal for a re-reading at the December 2, 2022 meeting as it's been two years since the last reading.
Extend the SHMEM strided APIs (e.g. shmem_iput) to include a block size, e.g.:
See #365 for details.
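As a rough illustration of what adding a block size to a strided copy means, the following plain-C sketch models the data movement locally. It is not the proposed API: `block_strided_copy`, its parameter order, and the `elem_size` argument are assumptions made for illustration, and the sketch assumes strides are counted in elements between the starts of consecutive blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local model of a block-strided copy (not an OpenSHMEM API).
 * Copies nblocks blocks of bsize contiguous elements each. Consecutive
 * blocks start dst elements apart in dest and sst elements apart in source.
 * Assumption: strides are in units of elements of size elem_size, and the
 * caller keeps all accesses in bounds. */
void block_strided_copy(void *dest, const void *source,
                        ptrdiff_t dst, ptrdiff_t sst,
                        size_t bsize, size_t nblocks, size_t elem_size)
{
    assert(dest != NULL && source != NULL);
    unsigned char *d = dest;
    const unsigned char *s = source;
    for (size_t i = 0; i < nblocks; i++) {
        memcpy(d + (ptrdiff_t)i * dst * (ptrdiff_t)elem_size,
               s + (ptrdiff_t)i * sst * (ptrdiff_t)elem_size,
               bsize * elem_size);
    }
}
```

For example, with a 3x4 row-major `int` matrix as the source, `sst = 4`, `dst = 2`, `bsize = 2`, and `nblocks = 3` pack the first two columns of every row into a contiguous 3x2 buffer, which is the kind of dense-matrix sub-block transfer mentioned in the discussion.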