Merge remote-tracking branch 'jdinan/collectives_section-v1.6-RC1' in…
…to tmp
jdinan committed Sep 6, 2024
2 parents db71f8e + 00bcc40 commit 34d394c
Showing 10 changed files with 242 additions and 99 deletions.
25 changes: 14 additions & 11 deletions content/collective_intro.tex
@@ -1,21 +1,24 @@
\emph{Collective routines} are defined as coordinated communication or synchronization
operations performed by a group of \acp{PE}.

\openshmem provides three types of collective routines:
\openshmem provides four types of collective routines:

\begin{enumerate}
\item Collective routines that operate on teams use a team handle parameter to determine
which \acp{PE} will participate in the routine, and use resources encapsulated by the team object
to perform operations. See Section~\ref{subsec:team} for details on team management.
\item Collective routines that operate on teams use a team handle parameter to determine
which \acp{PE} will participate in the routine, and use resources encapsulated by the team object
to perform operations. See Section~\ref{subsec:team} for details on team management.

\begin{DeprecateBlock}
\item Collective routines that operate on active sets use a set of parameters to determine
which \acp{PE} will participate and what resources are used to perform operations.
\end{DeprecateBlock}
\begin{DeprecateBlock}
\item Collective routines that operate on active sets use a set of parameters to determine
which \acp{PE} will participate and what resources are used to perform operations.

\item Collective routines that do not accept active set
parameters, which implicitly operate on the world team and, as
required, the default context.
\end{DeprecateBlock}

\item Collective routines that accept neither team nor active set
parameters, which implicitly operate on the world team and, as
required, the default context.
\item Collective routines that do not accept team
parameters, which implicitly operate on the world team and, as
required, the default context.
\end{enumerate}

Concurrent accesses to symmetric memory by an \openshmem collective
56 changes: 38 additions & 18 deletions content/shmem_alltoall.tex
@@ -35,17 +35,17 @@

\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive
the combined total of \VAR{nelems} elements from each \ac{PE} in the
active set.
set of participating \acp{PE}.
The type of \dest{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems}
elements of data for each \ac{PE} in the active set, ordered according to
elements of data for each participating \ac{PE}, ordered according to
destination \ac{PE}.
The type of \source{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{nelems}{
The number of elements to exchange for each \ac{PE}.
For \FUNC{shmem\_alltoallmem}, elements are bytes;
for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes,
respectively.
The number of elements to exchange for each \ac{PE}.
For \FUNC{shmem\_alltoallmem}, elements are bytes;
for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes,
respectively.
}

\begin{DeprecateBlock}
@@ -89,9 +89,7 @@
Given a \ac{PE} \VAR{i} that is the \kth \ac{PE}
participating in the operation and a \ac{PE}
\VAR{j} that is the \lth \ac{PE}
participating in the operation,

\ac{PE} \VAR{i} sends the \lth block of its \VAR{source} object to
participating in the operation, \ac{PE} \VAR{i} sends the \lth block of its \VAR{source} object to
the \kth block of
the \VAR{dest} object of \ac{PE} \VAR{j}.
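The block mapping described above can be illustrated with a small plain-C model. This is only a sketch and not OpenSHMEM code: the function name \FUNC{model\_alltoall} and the flattened layout (one array holding the object of every \ac{PE} back to back) are assumptions made for illustration, and the model ignores synchronization entirely.

```c
#include <stddef.h>
#include <string.h>

/* Plain-C model of the all-to-all block exchange (NOT an OpenSHMEM call).
 * Assumed layout: the source and dest objects of all npes PEs are stored
 * back to back, so the object of PE p starts at offset p * npes * nelems.
 * Each object holds npes blocks of nelems ints. */
void model_alltoall(size_t npes, size_t nelems, int *dest, const int *source)
{
    size_t obj = npes * nelems; /* elements in one PE's source or dest object */

    for (size_t k = 0; k < npes; k++) {     /* sender: the k-th participating PE */
        for (size_t l = 0; l < npes; l++) { /* receiver: the l-th participating PE */
            /* l-th block of the sender's source goes to the
             * k-th block of the receiver's dest */
            memcpy(dest + l * obj + k * nelems,
                   source + k * obj + l * nelems,
                   nelems * sizeof(int));
        }
    }
}
```

With two \acp{PE} and \VAR{nelems} equal to 2, source objects \{1,2,3,4\} and \{5,6,7,8\} yield dest objects \{1,2,5,6\} and \{3,4,7,8\}, which is a transpose of the block matrix.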

@@ -100,6 +98,25 @@
If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.

Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine, the following
conditions must be ensured, otherwise the behavior is undefined:
\begin{itemize}
\item The \dest{} array on all \acp{PE} in the team is ready to
accept the result of the operation.
\item The \source{} array at the local \ac{PE} is ready to be
read by any \ac{PE} in the team.
\end{itemize}
The application does not need to synchronize to ensure that the \source{}
array is ready across all \acp{PE} prior to calling this routine.

Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for
the local \ac{PE}:
\begin{itemize}
\item Its \VAR{dest} symmetric data object is completely updated and the
data has been copied out of the \source{} data object.
\end{itemize}

\begin{DeprecateBlock}
Active-set-based collective routines operate over all \acp{PE} in the active set
defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet.
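As a reminder of how the triplet selects \acp{PE}, the active set consists of \VAR{PE\_size} \acp{PE} starting at \VAR{PE\_start}, with consecutive members separated by a stride of 2 raised to the power \VAR{logPE\_stride}. A minimal sketch in plain C follows; the helper name \FUNC{active\_set\_members} is hypothetical and is not part of the OpenSHMEM API.

```c
/* Hypothetical helper (not an OpenSHMEM routine): writes the PE numbers of
 * the active set into members[0 .. PE_size-1]. The stride between
 * consecutive members is 2^logPE_stride. */
void active_set_members(int PE_start, int logPE_stride, int PE_size,
                        int *members)
{
    int stride = 1 << logPE_stride;

    for (int i = 0; i < PE_size; i++)
        members[i] = PE_start + i * stride;
}
```

For example, the triplet (2, 1, 4) selects \acp{PE} 2, 4, 6, and 8.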

@@ -116,23 +133,26 @@

Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine,
the following conditions must be ensured:

\begin{itemize}
\item The \VAR{dest} data object on all \acp{PE} in the active set is
ready to accept the \FUNC{shmem\_alltoall} data.
\item For active-set-based routines, the \VAR{pSync} array
on all \acp{PE} in the active set is not still in use from a prior call
to a \FUNC{shmem\_alltoall} routine.
\item The \VAR{dest} data object on all \acp{PE} in the active set is
ready to accept the \FUNC{shmem\_alltoall} data.
\item For active-set-based routines, the \VAR{pSync} array
on all \acp{PE} in the active set is not still in use from a prior call
to a \FUNC{shmem\_alltoall} routine.
\end{itemize}

Otherwise, the behavior is undefined.

Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for
the local \ac{PE}:
\begin{itemize}
\item Its \VAR{dest} symmetric data object is completely updated and
the data has been copied out of the \VAR{source} data object.
\item For active-set-based routines,
the values in the \VAR{pSync} array are restored to the original values.
\item Its \VAR{dest} symmetric data object is completely updated and the
data has been copied out of the \source{} data object.
\item For active-set-based routines,
the values in the \VAR{pSync} array are restored to the original values.
\end{itemize}
\end{DeprecateBlock}
}

\apireturnvalues{
4 changes: 2 additions & 2 deletions content/shmem_alltoalls.tex
@@ -35,10 +35,10 @@

\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive
the combined total of \VAR{nelems} elements from each \ac{PE} in the
active set.
set of participating \acp{PE}.
The type of \dest{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems}
elements of data for each \ac{PE} in the active set, ordered according to
elements of data for each participating \ac{PE}, ordered according to
destination \ac{PE}.
The type of \source{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{dst}{The stride between consecutive elements of the \dest{}
83 changes: 54 additions & 29 deletions content/shmem_broadcast.tex
@@ -45,7 +45,7 @@
respectively.
}
\apiargument{IN}{PE\_root}{Zero-based ordinal of the \ac{PE}, with respect to
the team or active set, from which the data is copied.}
the calling \acp{PE}, from which the data is copied.}

\begin{DeprecateBlock}

@@ -61,8 +61,7 @@
\end{apiarguments}

\apidescription{
\openshmem broadcast routines are collective routines over an active set or
valid \openshmem team.
\openshmem team-based broadcast routines are collective routines over a valid \openshmem team.
They copy the \source{} data object on the \ac{PE} specified by
\VAR{PE\_root} to the \dest{} data object on the \acp{PE}
participating in the collective operation.
@@ -75,6 +74,9 @@
\item The \dest{} object is updated on all \acp{PE}.
\item All \acp{PE} in the \VAR{team} argument must participate in
the operation.
\item Only \acp{PE} in the team may call the routine. If a
\ac{PE} not in the team calls a team-based
collective routine, the behavior is undefined.
\item If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.
\item \ac{PE} numbering is relative to the team. The specified
@@ -83,58 +85,81 @@
the team.
\end{itemize}

Before any \ac{PE} calls a broadcast routine, the following conditions
must be ensured, otherwise the behavior is undefined:
\begin{itemize}
\item The \dest{} array on all \acp{PE} in the team is ready to
accept the result of the operation.
\item The \source{} array at the local root \ac{PE} is ready to be
read by any \ac{PE} in the team.
\end{itemize}
The application does not need to synchronize to ensure that the \source{}
array is ready across all \acp{PE} prior to calling this routine.

Upon return from a team-based broadcast routine, the following are true for the local
\ac{PE}:
\begin{itemize}
\item The \dest{} data object is updated.
\item The \source{} data object may be safely reused.
\end{itemize}
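The difference between the two broadcast flavors (team-based broadcasts update \dest{} on all \acp{PE}, while active-set-based broadcasts leave \dest{} untouched on the root \ac{PE}) can be sketched with a plain-C model. This is not OpenSHMEM code: \FUNC{model\_broadcast}, its \VAR{update\_root} flag, and the flattened \dest{} layout are assumptions introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Plain-C model of a broadcast (NOT an OpenSHMEM call).
 * dest holds the dest objects of all npes PEs back to back (nelems ints
 * each); root_source is the source object on the root PE.
 * update_root == true  models the team-based behavior (all PEs updated);
 * update_root == false models the active-set-based behavior (root skipped). */
void model_broadcast(size_t npes, size_t nelems, size_t root,
                     bool update_root, int *dest, const int *root_source)
{
    for (size_t p = 0; p < npes; p++) {
        if (p == root && !update_root)
            continue; /* active-set model: root's dest is left unchanged */
        memcpy(dest + p * nelems, root_source, nelems * sizeof(int));
    }
}
```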

\begin{DeprecateBlock}
\openshmem active-set broadcast routines are collective routines over an active set.
They copy the \source{} data object on the \ac{PE} specified by
\VAR{PE\_root} to the \dest{} data object on the \acp{PE}
participating in the collective operation.
The same \dest{} and \source{} data objects and the same value of
\VAR{PE\_root} must be passed by all \acp{PE} participating in the
collective operation.

For active-set-based broadcasts:
\begin{itemize}
\item The \dest{} object is updated on all \acp{PE} other than the
root \ac{PE}.
\item All \acp{PE} in the active set defined by the
\VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet
must participate in the operation.
\item Only \acp{PE} in the active set may call the routine. If a
\ac{PE} not in the active set calls an active-set-based
\item The \dest{} object is updated on all \acp{PE} other than the root \ac{PE}.
\item All \acp{PE} in the active set defined by the
\VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet
must participate in the operation.
\item Only \acp{PE} in the active set may call the routine. If a
\ac{PE} not in the active set calls an active-set-based
collective routine, the behavior is undefined.
\item The values of arguments \VAR{PE\_root}, \VAR{PE\_start},
\item The values of arguments \VAR{PE\_root}, \VAR{PE\_start},
\VAR{logPE\_stride}, and \VAR{PE\_size} must be the same value
on all \acp{PE} in the active set.
\item The value of \VAR{PE\_root} must be between \CONST{0} and
\item The value of \VAR{PE\_root} must be between \CONST{0} and
\VAR{PE\_size $-$ 1}.
\item The same \VAR{pSync} work array must be passed by all \acp{PE}
\item The same \VAR{pSync} work array must be passed by all \acp{PE}
in the active set.
\end{itemize}

Before any \ac{PE} calls a broadcast routine, the following
Before any \ac{PE} calls an active-set-based broadcast routine, the following
conditions must be ensured:
\begin{itemize}
\item The \dest{} array on all \acp{PE} participating in the broadcast
is ready to accept the broadcast data.
\item For active-set-based broadcasts, the
\VAR{pSync} array on all \acp{PE} in the
active set is not still in use from a prior call to an \openshmem
collective routine.
\item The \dest{} array on all \acp{PE} participating in the broadcast
is ready to accept the broadcast data.
\item The \VAR{pSync} array on all \acp{PE} in the
active set is not still in use from a prior call to an \openshmem
collective routine.
\end{itemize}
Otherwise, the behavior is undefined.

Upon return from a broadcast routine, the following are true for the local
Upon return from an active-set-based broadcast routine, the following are true for the local
\ac{PE}:
\begin{itemize}
\item For team-based broadcasts, the \dest{} data object is
updated.
\item For active-set-based broadcasts:
\begin{itemize}
\item If the current \ac{PE} is not the root \ac{PE}, the
\dest{} data object is updated.
\item If the current \ac{PE} is not the root \ac{PE}, the \dest{} data object is updated.
\item The \source{} data object may be safely reused.
\item The values in the \VAR{pSync} array are restored to the
original values.
\end{itemize}
\item The \source{} data object may be safely reused.
\end{itemize}
\end{DeprecateBlock}
}


\apireturnvalues{
For team-based broadcasts, zero on successful local completion; otherwise, nonzero.

\begin{DeprecateBlock}
For active-set-based broadcasts, none.
\end{DeprecateBlock}

}

\apinotes{
48 changes: 42 additions & 6 deletions content/shmem_collect.tex
@@ -66,15 +66,13 @@
\openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective
operation to concatenate \VAR{nelems}
data items from the \source{} array into the
\dest{} array, over an \openshmem team or active set
in processor number order. The resultant \dest{} array contains the contribution from
\dest{} array, over an \openshmem team in processor number order.
The resultant \dest{} array contains the contribution from
\acp{PE} as follows:

\begin{itemize}
\item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the
contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on.
\item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the
contribution from \ac{PE} \CONST{1} in the team, and so on.
\item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the
contribution from \ac{PE} \CONST{1} in the team, and so on.
\end{itemize}
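The team-order concatenation can be sketched as a plain-C model. It is not OpenSHMEM code: \FUNC{model\_collect} and its array-of-pointers representation of the per-\ac{PE} \source{} arrays are assumptions for illustration only. The per-\ac{PE} element counts are allowed to differ, as for the \FUNC{collect} routines; passing the same count for every \ac{PE} models \FUNC{fcollect}.

```c
#include <stddef.h>
#include <string.h>

/* Plain-C model of collect (NOT an OpenSHMEM call).
 * source[p] points to the contribution of the p-th PE in the team and
 * nelems[p] is its length in ints. Contributions are concatenated into
 * dest in team order (PE 0 first, then PE 1, and so on).
 * Returns the total number of elements written to dest. */
size_t model_collect(size_t npes, const size_t nelems[],
                     const int *const source[], int *dest)
{
    size_t offset = 0;

    for (size_t p = 0; p < npes; p++) {
        memcpy(dest + offset, source[p], nelems[p] * sizeof(int));
        offset += nelems[p]; /* next PE's data starts right after this one */
    }
    return offset;
}
```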

The collected result is written to the \dest{} array for all \acp{PE}
@@ -90,6 +88,37 @@
If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.

Before any \ac{PE} calls a collect routine, the following conditions must
be ensured, otherwise the behavior is undefined:
\begin{itemize}
\item The \dest{} array on all \acp{PE} in the team is ready to
accept the result of the operation.
\item The \source{} array at the local \ac{PE} is ready to be read
by any \ac{PE} in the team.
\end{itemize}
The application does not need to synchronize to ensure that the \source{}
array is ready across all \acp{PE} prior to calling this routine.

\begin{DeprecateBlock}
\openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective
operation to concatenate \VAR{nelems}
data items from the \source{} array into the
\dest{} array, over an \openshmem active set
in processor number order. The resultant \dest{} array contains the contribution from
\acp{PE} as follows:
\begin{itemize}
\item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the
contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on.
\end{itemize}

The collected result is written to the \dest{} array for all \acp{PE}
that participate in the operation. The same \dest{} and \source{}
arrays must be passed by all \acp{PE} that participate in the operation.

The \FUNC{fcollect} routines require that \VAR{nelems} be the same value in all
participating \acp{PE}, while the \FUNC{collect} routines allow \VAR{nelems} to
vary from \ac{PE} to \ac{PE}.

Active-set-based collective routines operate over all \acp{PE} in the active set
defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet.
As with all active-set-based collective routines,
@@ -108,16 +137,23 @@
\item For active-set-based collective routines, the values in the \VAR{pSync} array are
restored to the original values.
\end{itemize}
\end{DeprecateBlock}
}

\apireturnvalues{
Zero on successful local completion. Nonzero otherwise.
}

\apinotes{
\begin{DeprecateBlock}
The collective routines operate on active \ac{PE} sets that have a
non-power-of-two \VAR{PE\_size} with some performance degradation. They operate
with no performance degradation when \VAR{nelems} is a non-power-of-two value.
\end{DeprecateBlock}
The collective routines that operate on teams containing a
non-power-of-two number of \acp{PE} do so with some performance degradation. They operate
with no performance degradation when \VAR{nelems} is a non-power-of-two value.

}

\begin{apiexamples}