Improve Noncontiguous APIs #365
Comments
From the 7/2/2020 meeting: the WG prefers the block interleaved API and would like to see strong drivers for the strided APIs.
Regarding the more general strided APIs, from https://github.com/jeffhammond/oshmpi/blob/master/docs/oug2014_resubmission-acm_4.pdf:
It is worth asking whether it is worthwhile to generalize the APUT operation for dimensions higher than two to support tensor operations (for some applications, see [7] and [15]). There are two arguments against this. First, operations on subarrays of dimension greater than two can be expressed in terms of a single APUT operation by combining the strides; for example, a three-dimensional subarray operation can be cast in terms of a two-dimension subarray computation if the stride over x and y are multiplied together (here we assume z is the contiguous dimension that is captured by blockelems). Regardless of the number of dimensions associated with the strides, the key efficiency gain with APUT is accomplished by operating on blocks of contiguous data rather than single elements, as is the case for IPUT. Second, the myriad of applications involving tensor operations include many cases where cartesian subarrays are not useful. For example, in the domain of quantum chemistry, most tensors have permutation (anti-)symmetry and thus cannot make use of operations designed for non-symmetric subarrays. Such is the complexity of tensor data in the NWChem [3] Tensor Contraction Engine [6] that block-sparse and permutation- (anti)symmetric tensors are mapped to one-dimensional global arrays with an application-defined hashing scheme.
If you are going to add 2D array support, you might want to think about collectives as well.
Discussion at RMA WG today: Interest in pursuing the datatypes, API. However, we would need a driver. Possible drivers for noncontig APIs:
@jeffhammond I don't understand the argument for dimensions higher than two using APUT. Are you calling it in a loop over the outer dimensions?
I'm saying 2D is sufficient for cartesian arrays. 3D can be collapsed to 2D by multiplying the first two strides. And so forth. Or one can loop over 2D ops if somehow that doesn't work. The loop overhead isn't going to matter because a 2D operation is going to be relatively expensive.
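The collapse-or-loop argument can be sketched in C roughly as follows. This is a local-memory model only: `model_get_2d` and `model_get_3d` are hypothetical names, `memcpy` stands in for the remote transfer, and the single-call path assumes the slice spans the full y extent, so that contiguous rows stay evenly spaced across plane boundaries. When that assumption does not hold, the sketch falls back to a loop over 2D operations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 2D primitive: copy nblocks blocks of blockelems contiguous
 * elements. Strides are in elements, between the starts of consecutive
 * blocks. A local memcpy stands in for the remote get. */
static void model_get_2d(double *dest, const double *source,
                         size_t dst_stride, size_t src_stride,
                         size_t blockelems, size_t nblocks)
{
    for (size_t b = 0; b < nblocks; b++) {
        memcpy(dest + b * dst_stride, source + b * src_stride,
               blockelems * sizeof(double));
    }
}

/* Pack an nx * ny * nz subarray starting at (x0, y0, z0) of an array with
 * extents (?, Y, Z), z contiguous, into a dense buffer. */
static void model_get_3d(double *packed, const double *array,
                         size_t Y, size_t Z,
                         size_t x0, size_t y0, size_t z0,
                         size_t nx, size_t ny, size_t nz)
{
    assert(y0 + ny <= Y && z0 + nz <= Z);
    const double *origin = array + (x0 * Y + y0) * Z + z0;

    if (ny == Y) {
        /* Rows are Z elements apart even across x planes, so the x and y
         * loops collapse into one 2D operation with nx * ny blocks. */
        model_get_2d(packed, origin, nz, Z, nz, nx * ny);
        return;
    }
    /* General case: one 2D operation per x plane. */
    for (size_t x = 0; x < nx; x++) {
        model_get_2d(packed + x * ny * nz, origin + x * Y * Z, nz, Z, nz, ny);
    }
}
```

In the general case the cost is nx calls, each moving ny blocks, which is the loop over 2D operations described in the comment.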
Closed by #448
Issue
The current interleaved communication routines in OpenSHMEM (shmem_iput/iget) transfer single-element chunks that are a fixed stride apart (the source and destination can have different strides). This API does not capture many noncontiguous data transfer patterns. For example, it is inefficient for applications that transfer sections of arrays with two or more dimensions.
Possible Solutions
Block Interleaved API
Extend the existing SHMEM interleaved APIs (e.g. shmem_iput) to include a block size. This would allow them to support 2D array slice transfers.
Strided APIs
Something similar to the ARMCI strided APIs could be used. This supports generic matrix slice transfers.
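As a rough illustration of what a strided interface has to express, here is a local-memory sketch in C. The argument layout (byte strides per level, `count[0]` as the contiguous byte count, `count[i]` as the repeat count at level i) is modeled on the commonly described ARMCI strided convention and should be treated as an assumption. `model_copy_strided` is a hypothetical name, not an ARMCI or OpenSHMEM routine, and `memcpy` stands in for the remote transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local model of a multi-level strided copy.
 *   strides[i]  : byte distance between consecutive items at level i+1
 *   count[0]    : contiguous bytes in the innermost block
 *   count[i]    : number of repetitions at level i (i >= 1)
 *   stride_levels: number of strided levels (0 means one contiguous block) */
static void model_copy_strided(char *dst, const int dst_strides[],
                               const char *src, const int src_strides[],
                               const int count[], int stride_levels)
{
    assert(stride_levels >= 0);
    if (stride_levels == 0) {
        memcpy(dst, src, (size_t)count[0]);
        return;
    }
    int top = stride_levels - 1;
    for (int i = 0; i < count[stride_levels]; i++) {
        /* Recurse into the next-lower level for each repetition. */
        model_copy_strided(dst + (ptrdiff_t)i * dst_strides[top], dst_strides,
                           src + (ptrdiff_t)i * src_strides[top], src_strides,
                           count, top);
    }
}
```

A 2D slice uses one stride level; each extra array dimension adds one more level, which is the linearization the caller has to work out by hand.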
Subarray APIs
Similar to the MPI subarray datatype. This supports generic matrix slice transfers.
The user specifies the full dimensions of the source and destination matrices and passes a pointer to the zeroth element of each. The upper-left and lower-right indices of the source and destination slices identify the regions to transfer.
This API has the advantage of being very easy for users to use (versus strided APIs, which require thinking about the linearization of the matrix). However, because data can be reshaped during the transfer, it also requires more work on the part of implementations.
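A minimal local-memory sketch of the subarray idea in C, under stated assumptions: `model_slice_t` and `model_put_subarray` are hypothetical names, matrices are row-major, corner indices are inclusive, and "reshaping" is interpreted as walking both slices in row-major order so that only the total element counts need to match. A real interface might define this differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slice descriptor for a row-major matrix. */
typedef struct {
    size_t rows, cols; /* full matrix dimensions */
    size_t r0, c0;     /* upper-left index of the slice, inclusive */
    size_t r1, c1;     /* lower-right index of the slice, inclusive */
} model_slice_t;

/* Linear offset of the k-th slice element, counting in row-major order. */
static size_t slice_offset(model_slice_t s, size_t k)
{
    size_t width = s.c1 - s.c0 + 1;
    return (s.r0 + k / width) * s.cols + s.c0 + k % width;
}

static size_t slice_elems(model_slice_t s)
{
    return (s.r1 - s.r0 + 1) * (s.c1 - s.c0 + 1);
}

/* Copy slice s of src into slice d of dst; returns -1 if the element
 * counts differ. Plain assignment stands in for the remote put. */
static int model_put_subarray(double *dst, model_slice_t d,
                              const double *src, model_slice_t s)
{
    assert(s.r0 <= s.r1 && s.c0 <= s.c1 && s.r1 < s.rows && s.c1 < s.cols);
    assert(d.r0 <= d.r1 && d.c0 <= d.c1 && d.r1 < d.rows && d.c1 < d.cols);
    if (slice_elems(s) != slice_elems(d)) {
        return -1;
    }
    for (size_t k = 0; k < slice_elems(s); k++) {
        dst[slice_offset(d, k)] = src[slice_offset(s, k)];
    }
    return 0;
}
```

The caller only names corners and matrix dimensions; the offset arithmetic, including the reshaped case where source and destination rows have different widths, falls to the implementation.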
Datatype API
Similar to the MPI datatypes API. Introduce an API for datatype creation, plus put/get APIs that take source and destination datatypes. An additional API could be used to inform the target about the datatype ahead of time: