Skip to content

Commit

Permalink
Merge pull request #443 from dsolt/issue_175_event_notification
Browse files Browse the repository at this point in the history
Minor edits to Events chapter
  • Loading branch information
abouteiller authored Feb 28, 2024
2 parents 0a7d0f8 + 52febfb commit 31ac6bf
Show file tree
Hide file tree
Showing 2 changed files with 28 additions and 15 deletions.
41 changes: 26 additions & 15 deletions Chap_API_Event.tex
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ \section{Notification and Management}
Both \ac{SMS} elements and applications can register for events of either type.

\adviceimplstart
Race conditions can cause the registration to come after events of possible interest (e.g., a memory \ac{ECC} event that occurs after start of execution but prior to registration, or an application process generating an event prior to another process registering to receive it). \ac{SMS} vendors are \textit{requested} to cache environment events for some time to mitigate this situation, but are not \textit{required} to do so. However, \ac{PMIx} implementers are \textit{required} to cache all events received by the \ac{PMIx} server library and to deliver them to registering clients in the same order in which they were received
Race conditions can cause the registration to come after events of possible interest (e.g., a memory \ac{ECC} event that occurs after start of execution but prior to registration, or an application process generating an event prior to another process registering to receive it). \ac{SMS} vendors are \textit{requested} to cache environment events for some time to mitigate this situation, but are not \textit{required} to do so. However, \ac{PMIx} implementers are \textit{required} to cache all events received by the \ac{PMIx} server library and to deliver them to registering clients in the same order in which they were received.
\adviceimplend

\adviceuserstart
Expand All @@ -32,7 +32,7 @@ \section{Notification and Management}

The generator of an event can specify the \textit{target range} for delivery of that event. Thus, the generator can choose to limit notification to processes on the local node, processes within the same job as the generator, processes within the same allocation, other threads within the same process, only the \ac{SMS} (i.e., not to any application processes), all application processes, or to a custom range based on specific process identifiers. Only processes within the given range that register for the provided event code will be notified. In addition, the generator can use attributes to direct that the event not be delivered to any default event handlers, or to any multi-code handler (as defined below).

Event notifications provide the process identifier of the source of the event plus the event code and any additional information provided by the generator. When an event notification is received by a process, the registered handlers are scanned for their event code(s), with matching handlers assembled into an \textit{event chain} for servicing. Note that users can also specify a \textit{source range} when registering an event (using the same range designators described above) to further limit when they are to be invoked. When assembled, PMIx event chains are ordered based on both the specificity of the event handler and user directives at time of handler registration. By default, handlers are grouped into three categories based on the number of event codes that can trigger the callback:
Event notifications provide the process identifier of the source of the event plus the event code and any additional information provided by the generator. When an event notification is received by a process, the registered handlers are scanned for their event code(s), with matching handlers assembled into an \textit{event chain} for servicing. Note that users can also specify a \textit{source range} when registering an event (using the same range designators described above) to further limit when they are to be invoked. When assembled, the PMIx event chain is ordered based on both the specificity of the event handler and user directives at time of handler registration. By default, handlers are grouped into three categories based on the number of event codes that can trigger the callback:
\begin{itemize}
%
\item \textit{single-code} handlers are serviced first as they are the most specific. These are handlers that are registered against one specific event code.
Expand All @@ -43,7 +43,10 @@ \section{Notification and Management}
%
\end{itemize}

Users can specify the callback order of a handler within its category at the time of registration. Ordering can be specified by providing the relevant event handler names, if the user specified an event handler name when registering the corresponding event. Thus, users can specify that a given handler be executed before or after another handler should both handlers appear in an event chain (the ordering is ignored if the other handler isn't included). Note that ordering does not imply immediate relationships. For example, multiple handlers registered to be serviced after event handler \textit{A} will all be executed after \textit{A}, but are not guaranteed to be executed in any particular order amongst themselves.
Users can specify the callback order of a handler within its category at the time of registration.
Users can specify that a given handler be executed before or after another target handler should both handlers appear in the event chain (the ordering is ignored if the other handler isn't included).
The ordering is dictated by providing the event handler name of the target. The name must have been assigned when the target handler was registered.
Note that ordering does not imply immediate relationships. For example, multiple handlers registered to be serviced after event handler \textit{A} will all be executed after \textit{A}, but are not guaranteed to be executed in any particular order amongst themselves.

In addition, one event handler can be declared as the \textit{first} handler to be executed in the chain. This handler will \textit{always} be called prior to any other handler, regardless of category, provided the incoming event matches both the specified range and event code. Only one handler can be so designated --- attempts to designate additional handlers as \textit{first} will return an error. Deregistration of the declared \textit{first} handler will re-open the position for subsequent assignment.

Expand Down Expand Up @@ -143,10 +146,10 @@ \subsection{\code{PMIx_Register_event_handler}}
%%%%
\descr

Register an event handler to report events. Note that the codes being registered do \textit{not} need to be \ac{PMIx} error constants --- any integer value can be registered. This allows for registration of non-PMIx events such as those defined by a particular \ac{SMS} vendor or by an application itself.
Register an event handler to report events. Note that the codes being registered do \textit{not} need to be \ac{PMIx} event constants --- any integer value can be registered. This allows for registration of non-PMIx events such as those defined by a particular \ac{SMS} vendor or by an application itself.

\adviceuserstart
In order to avoid potential conflicts, users are advised to only define codes that lie outside the range of the \ac{PMIx} standard's error codes. Thus, \ac{SMS} vendors and application developers should constrain their definitions to positive values or negative values beyond the \refconst{PMIX_EXTERNAL_ERR_BASE} boundary.
In order to avoid potential conflicts, users are advised to only use event code values that lie outside the range of the \ac{PMIx} standard's error and event codes (See \ref{api:struct:usererrors}). Thus, \ac{SMS} vendors and application developers should constrain their definitions to positive values or negative values beyond the \refconst{PMIX_EXTERNAL_ERR_BASE} boundary.
\adviceuserend


Expand Down Expand Up @@ -288,19 +291,19 @@ \subsubsection{Fault tolerance event attributes}

%
\declareAttribute{PMIX_EVENT_TERMINATE_SESSION}{"pmix.evterm.sess"}{bool}{
The \ac{RM} intends to terminate this session.
The \ac{RM} intends to terminate the affected session.
}
%
\declareAttribute{PMIX_EVENT_TERMINATE_JOB}{"pmix.evterm.job"}{bool}{
The \ac{RM} intends to terminate this job.
The \ac{RM} intends to terminate the affected job.
}
%
\declareAttribute{PMIX_EVENT_TERMINATE_NODE}{"pmix.evterm.node"}{bool}{
The \ac{RM} intends to terminate all processes on this node.
The \ac{RM} intends to terminate all processes on the affected node.
}
%
\declareAttribute{PMIX_EVENT_TERMINATE_PROC}{"pmix.evterm.proc"}{bool}{
The \ac{RM} intends to terminate just this process.
The \ac{RM} intends to terminate just the affected process.
}
%
\declareAttribute{PMIX_EVENT_ACTION_TIMEOUT}{"pmix.evtimeout"}{int}{
Expand Down Expand Up @@ -375,7 +378,7 @@ \subsection{Notification Function}
\adviceuserend

\advicermstart
On the server side, the notification function is used to inform the \ac{PMIx} server library's host of a detected event in the \ac{PMIx} server library. Events generated by \ac{PMIx} clients are communicated to the \ac{PMIx} server library, but will be relayed to the host via the \refapi{pmix_server_notify_event_fn_t} function pointer, if provided.
On the server side, the notification function, \refapi{pmix_server_notify_event_fn_t} (See \ref{chap:server:func_pointers}), is used to inform the \ac{PMIx} server library's host of a detected event in the \ac{PMIx} server library. Events generated by \ac{PMIx} clients are communicated to the \ac{PMIx} server library, but will be relayed to the host via its \refapi{pmix_server_notify_event_fn_t} function pointer, if provided.
\advicermend


Expand Down Expand Up @@ -473,7 +476,7 @@ \subsection{\code{PMIx_Notify_event}}
\returnend

\reqattrstart
The following attributes are required to be supported by all \ac{PMIx} libraries:
The following attributes are required to be supported by all \ac{PMIx} libraries and will be passed, when relevant, to all invoked handlers:

\pasteAttributeItem{PMIX_EVENT_NON_DEFAULT}
\pasteAttributeItem{PMIX_EVENT_CUSTOM_RANGE}
Expand Down Expand Up @@ -504,15 +507,17 @@ \subsection{\code{PMIx_Notify_event}}
%%%%
\descr

Report an event for notification via any registered event handler. This function can be called by any \ac{PMIx} process, including application processes, \ac{PMIx} servers, and \ac{SMS} elements. The \ac{PMIx} server calls this \ac{API} to report events it detected itself so that the host \ac{SMS} daemon distribute and handle them, and to pass events given to it by its host down to any attached client processes for processing. Examples might include notification of the failure of another process, detection of an impending node failure due to rising temperatures, or an intent to preempt the application. Events may be locally generated or come from anywhere in the system.
Report an event for notification via any registered event handler. This function can be called by any \ac{PMIx} process, including application processes, \ac{PMIx} servers, and \ac{SMS} elements.

The \ac{PMIx} server calls this \ac{API} to report events it detected itself so that the host \ac{SMS} daemon can distribute and handle them, and to pass events given to it by its host down to any attached client processes for processing. Examples might include notification of the failure of another process, detection of an impending node failure due to rising temperatures, or an intent to preempt the application. Events may be locally generated or come from anywhere in the system.

Host \ac{SMS} daemons call the \ac{API} to pass events down to its embedded \ac{PMIx} server both for transmittal to local client processes and for the host's own internal processing where the host has registered its own event handlers. The \ac{PMIx} server library is not allowed to echo any event given to it by its host via this \ac{API} back to the host through the \refapi{pmix_server_notify_event_fn_t} server module function. The host is required to deliver the event to all \ac{PMIx} servers where the targeted processes either are currently running, or (if they haven't started yet) might be running at some point in the future as the events are required to be cached by the \ac{PMIx} server library.

Client application processes can call this function to notify the \ac{SMS} and/or other application processes of an event it encountered. Note that processes are not constrained to report status values defined in the official \ac{PMIx} standard --- any integer value can be used. Thus, applications are free to define their own internal events and use the notification system for their own internal purposes.

\adviceuserstart
The callback function will be called upon completion of the
\code{notify_event} function's actions. At that time, any messages required for executing the operation (e.g., to send the notification to the local \ac{PMIx} server) will
\refapi{PMIx_Notify_event} function's actions. At that time, any messages required for executing the operation (e.g., to send the notification to the local \ac{PMIx} server) will
have been queued, but may not yet have been transmitted. The caller is required to maintain the input
data until the callback function has been executed --- the sole purpose of the callback function is to indicate when the input data is no longer required.
\adviceuserend
Expand Down Expand Up @@ -565,10 +570,16 @@ \subsubsection{Completion Callback Function Status Codes}
%
\declareconstitemvalue{PMIX_EVENT_ACTION_DEFERRED}{-333}
Event handler: Action deferred.
\end{constantdesc}
%
\declareconstitemvalue{PMIX_EVENT_ACTION_COMPLETE}{-334}
Event handler: Action complete.

The following status code may be returned by a handler to stop the handler chain. It will not appear in the \refarg{results} for any handler invocation since the handler that produces it is the last handler executed in the chain.

\begin{constantdesc}
%
\declareconstitemvalue{PMIX_EVENT_ACTION_COMPLETE}{-334}
Event handler: Action complete.
\end{constantdesc}
%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2 changes: 2 additions & 0 deletions Chap_API_Server.tex
Original file line number Diff line number Diff line change
Expand Up @@ -1940,6 +1940,8 @@ \subsection{\code{PMIx_server_delete_process_set}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Server Function Pointers}

\label{chap:server:func_pointers}

\ac{PMIx} utilizes a "function-shipping" approach to support for implementing the server-side of the protocol. This method allows \acp{RM} to implement the server without being burdened with \ac{PMIx} internal details. When a request is received from the client, the corresponding server function will be called with the information.

Any functions not supported by the \ac{RM} can be indicated by a \code{NULL} for the function pointer. \ac{PMIx} implementations are required to return a \refconst{PMIX_ERR_NOT_SUPPORTED} status to all calls to functions that require host environment support and are not backed by a corresponding server module entry. Host environments may, if they choose, include a function pointer for operations they have not yet implemented and simply return \refconst{PMIX_ERR_NOT_SUPPORTED}.
Expand Down

0 comments on commit 31ac6bf

Please sign in to comment.