-
Notifications
You must be signed in to change notification settings - Fork 22
/
App_Use_Cases.tex
592 lines (410 loc) · 35.5 KB
/
App_Use_Cases.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Appendix: Use Cases
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Use-Cases}
\label{app:use-cases}
The \ac{PMIx} standard provides many generic interfaces that can be composed into higher-level use cases in a variety of ways. While the specific interfaces and attributes are standardized, the use cases themselves are not (and should not) be standardized. Common use cases are included here as examples of how PMIx's generic interfaces \textit{might} be composed together for a higher-level purpose. The use cases are intended for both \ac{PMIx} interface users and library implementors. Whereby a better understanding of the general usage model within the community can help users picking up PMIx for the first and help implementors optimize their implementation for the common cases.
Each use case is structured to provide background information about the high-level use case as well as specific details about how the PMIx interfaces are used within the use case. Some use cases even provide code snippets. These code snippets are apart of larger code examples located within the standard's source code repository, and each complete code example is fully compilable and runnable. The related interfaces and attributes collected at the bottom of each use case are mainly for convenience and link to the full standardized definitions.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Business Card Exchange for Process-to-Process Wire-up}
\label{app:uc-business-card-exchange}
\subsection{Use Case Summary}
Multi-process communication libraries, such as MPI, need to establish communication channels between a set of those processes. In this scenario, each process needs to share connectivity information (a.k.a. Business Cards) with all other processes before communication channels can be established. This connectivity information may take the form of one or more unique strings that allow a different process to establish a communication channel with the originator. The runtime environment must provide a mechanism for the efficient exchange of this connectivity information. Additional information about the current state of the job (e.g., number of processes globally and locally) and of how the process was started (e.g., process binding) is also helpful.
Note: The Instant-On wire-up mechanism is a separate, related use case.
\subsection{Use Case Details}
Each process provides their business card to PMIx via one or more \refapi{PMIx_Put} operations to store the tuple of \code{\{UID, key, value\}}. The \code{UID} is the unique name for this process in the \ac{PMIx} universe (i.e., \code{namespace} and \code{rank}). The \code{key} is a unique key that other processes can reference generically (note that since the \code{UID} is also associated with the \code{key} there is no need to make the \code{key} uniquely named per process). The \code{value} is the string representation of the connectivity information.
Some business card information is meant for remote processes (e.g., TCP or InfiniBand addresses) while others are meant only for local processes (e.g., shared memory information). As such a \code{scope} should be associated with the \refapi{PMIx_Put} operation to differentiate this intention.
The \refapi{PMIx_Put} operations may be cached local to the process. Once all \refapi{PMIx_Put} operations have been called each process should call \refapi{PMIx_Commit} to push those values to the local PMIx server. Note that in a multi-library configuration each library may \refapi{PMIx_Put} then \refapi{PMIx_Commit} values - so there may be multiple \refapi{PMIx_Commit} calls before a Business Card Exchange is activated.
After calling \refapi{PMIx_Commit} a process can activate the Business Card Exchange collective operation by calling \refapi{PMIx_Fence}. The \refapi{PMIx_Fence} operation is collective over the set of processes specified in the argument set. That allows for the collective to span a subset of a namespace or multiple namespaces. After the completion of the \refapi{PMIx_Fence} operation, the data stored by other processes via \refapi{PMIx_Put} is available to the local process through a call to \refapi{PMIx_Get} which returns the key/value pairs necessary to establish the connection(s) with the other processes.
The \refapi{PMIx_Fence} operation has a "Synchronize Only" mode that works as a barrier operation. This is helpful if the communication library requires a synchronization before leaving initialization or starting finalization, for example.
The \refapi{PMIx_Fence} operation has a "Sparse" mode in addition to a "Full" mode for the data exchange. The "Full" mode will fully exchange all Business Card information with all other processes. This is helpful for tightly communicating applications. The "Sparse" mode will dynamically pull the connectivity information on-demand from inside of \refapi{PMIx_Get} (if it is not already available locally). This is helpful for sparsely communicating applications. Since which mode is best for an application cannot be inferred by the PMIx library the caller must specify which mode works best for their application. The \refapi{PMIx_Fence} operation has an option for the end user to specify which mode they desire for this operation.
Additional information about the current state of the job (e.g., number of processes globally and locally) and of how the process was started (e.g., process binding) is also helpful. This "job level" information is available immediately after \refapi{PMIx_Init} without the need for any explicit synchronization.
The number of processes globally in the namespace and this process's rank within that namespace is important to know before establishing the Business Card information to best allocate resources.
The number of processes local to the node and this process's local rank is important to know before establishing the Business Card information to help the caller determine the scope of the put operation. For example, to designate a leader to set up a shared memory segment of the proper size before putting that information into the locally scoped Business Card information.
The number of processes local to a remote node is also helpful to know before establishing the Business Card information. This information is useful to pre-establish local resources before that remote node starts to initiate a connection or to determine the number of connections that need to be advertised in the Business Card when it is sent out.
Note that some of the job level information may change over the course of the job in a dynamic application.
\littleheader{Related Interfaces}
{\large \refapi{PMIx_Put}}
\pasteSignature{PMIx_Put}
{\large \refapi{PMIx_Get}}
\pasteSignature{PMIx_Get}
{\large \refapi{PMIx_Commit}}
\pasteSignature{PMIx_Commit}
{\large \refapi{PMIx_Fence}}
\pasteSignature{PMIx_Fence}
{\large \refapi{PMIx_Init}}
\pasteSignature{PMIx_Init}
\littleheader{Related Attributes}
The following job level information is useful to have before establishing Business Card information:
\pasteAttributeItem{PMIX_NODE_LIST}
\pasteAttributeItem{PMIX_NUM_NODES}
\pasteAttributeItem{PMIX_NODEID}
\pasteAttributeItem{PMIX_JOB_SIZE}
\pasteAttributeItem{PMIX_PROC_MAP}
\pasteAttributeItem{PMIX_LOCAL_PEERS}
\pasteAttributeItem{PMIX_LOCAL_SIZE}
For each process this information is also useful (note that any one process may want to access this list of information about any other process in the system):
\pasteAttributeItem{PMIX_RANK}
\pasteAttributeItem{PMIX_LOCAL_RANK}
\pasteAttributeItem{PMIX_GLOBAL_RANK}
\pasteAttributeItem{PMIX_LOCALITY_STRING}
\pasteAttributeItem{PMIX_HOSTNAME}
There are other keys that are helpful to have before a synchronization point. This is not meant to be a comprehensive list.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Debugging}
\label{app:uc-debugging}
\subsection{Terminology}
\subsubsection{Tools vs Debuggers}
A \textit{tool} is a process designed to monitor, record, analyze, or control the execution of another process. Typically used for the purposes of profiling and debugging. A \textit{first-party tool} runs within the address space of the application process while a \textit{third-party tool} run within its own process. A \textit{debugger} is a third-party tool that inspects and controls an application process's execution using system-level debug APIs (e.g., \code{ptrace}).
\subsubsection{Parallel Launching Methods}
A \textit{starter} program is a program responsible for launching a parallel runtime, such as \ac{MPI}. \ac{PMIx} supports two primary methods for launching parallel applications under tools and debuggers: indirect and direct. In the indirect launching method (Section~\ref{chap:api_tools:indirect}, the tool is attached to the starter. In the direct launching method (Section~\ref{chap:api_tools:direct}, the tool takes the place of the starter.
\ac{PMIx} also supports attaching to already running programs via the \textit{Process Acquisition} interfaces (Section~\ref{subsubsec:process-acq}).
\subsubsection{Process Synchronization}
Process Synchronization is a technique tools use to start the processes of a parallel application such that the tools can still attach to the process early in its lifetime. Said another away, the tool must be able to start the application processes without them ``running away'' from the tool. In the case of \ac{MPI} (Version 3.1~\cite{mpi-3.1} or the MPI World Process in future versions), this means stopping the applications processes before they return from \code{MPI_Init} or \code{MPI_Init_thread}.
\subsubsection{Process Acquisition}\label{subsubsec:process-acq}
Process Acquisition is a technique tools use to locate all of the processes, local and remote, of a given parallel application. This typically boils down to collecting the following information for every process in the parallel application: the hostname or IP of the machine running the process, the executable name, and the process ID.
\subsection{Use Case Details}
\subsubsection{Direct-Launch Debugger Tool}
PMIx can support the tool itself using the PMIx spawn options to control the app’s startup, including directing the RM/application as to when to block and wait for tool attachment, or stipulating that an interceptor library be preloaded. However, this means that the user is restricted to whatever command line options the tool vendor has provided for operations such as process placement and binding, which places a significant burden on the tool vendor. An example might look like the following: \code{dbgr -n 3 ./myapp}.
Assuming it is supported, co-launch of debugger daemons in this use-case is supported by adding a \code{pmix_app_t} to the \refapi{PMIx_Spawn} command, indicating that the resulting processes are debugger daemons by setting the \refattr{PMIX_DEBUGGER_DAEMONS} attribute.
\begingroup
\begin{figure*}
\begin{center}
\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{figs/direct-launch}
\end{center}
\caption{Interaction diagram showing an example of the Direct Launch mechanism}
\label{fig:direct_launch}
\end{figure*}
\endgroup
\littleheader{Related Interfaces}
{\large \refapi{PMIx_tool_init}}
\pasteSignature{PMIx_tool_init}
{\large \refapi{PMIx_Register_event_handler}}
\pasteSignature{PMIx_Register_event_handler}
{\large \refapi{PMIx_Query_info}}
\pasteSignature{PMIx_Query_info}
{\large \refapi{PMIx_Spawn}}
\pasteSignature{PMIx_Spawn}
{\large \refapi{PMIx_Get}}
\pasteSignature{PMIx_Get}
{\large \refapi{PMIx_Notify_event}}
\pasteSignature{PMIx_Notify_event}
\littleheader{Related Attributes}
\pasteAttributeItem{PMIX_QUERY_SPAWN_SUPPORT}
\pasteAttributeItem{PMIX_QUERY_DEBUG_SUPPORT}
\pasteAttributeItem{PMIX_DEBUG_STOP_IN_INIT}
\pasteAttributeItem{PMIX_DEBUG_STOP_ON_EXEC}
\pasteAttributeItem{PMIX_DEBUG_DAEMONS_PER_PROC}
\pasteAttributeItem{PMIX_DEBUG_DAEMONS_PER_NODE}
\pasteAttributeItem{PMIX_COSPAWN_APP}
\pasteAttributeItem{PMIX_MAPBY}
\pasteAttributeItem{PMIX_FWD_STDOUT}
\pasteAttributeItem{PMIX_FWD_STDERR}
\pasteAttributeItem{PMIX_NOTIFY_COMPLETION}
\pasteAttributeItem{PMIX_SETUP_APP_ENVARS}
\pasteAttributeItem{PMIX_EVENT_AFFECTED_PROC}
\pasteAttributeItem{PMIX_DEBUGGER_DAEMONS}
\pasteAttributeItem{PMIX_DEBUG_TARGET}
\pasteAttributeItem{PMIX_DEBUG_STOP_IN_APP}
\pasteAttributeItem{PMIX_QUERY_LOCAL_PROC_TABLE}
\littleheader{Related Constants}
\refconst{PMIX_DEBUG_WAITING_FOR_NOTIFY} \\
\refconst{PMIX_DEBUGGER_RELEASE}
\subsubsection{Indirect-Launch Debugger Tool}
Executing a program under a tool using an intermediate launcher such as \code{mpiexec} can also be made possible. This requires some degree of coordination between the tool and the launcher. Ultimately, it is the launcher that is going to launch the application, and the tool must somehow inform the launcher (and the application) that this is being done in a debug session so that the application knows to ``block'' until the tool attaches to it.
In this operational mode, the user invokes a tool (typically on a non-compute, or ``head'', node) that in turn uses \code{mpiexec} to launch their application – a typical command line might look like the following: \code{dbgr -dbgoption mpiexec -n 32 ./myapp}.
\begingroup
\begin{figure*}
\begin{center}
\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{figs/indirect-launch}
\end{center}
\caption{Interaction diagram showing an example of the Indirect Launch mechanism}
\label{fig:indirect_launch}
\end{figure*}
\endgroup
\littleheader{Related Interfaces}
{\large \refapi{PMIx_tool_init}}
\pasteSignature{PMIx_tool_init}
{\large \refapi{PMIx_Register_event_handler}}
\pasteSignature{PMIx_Register_event_handler}
{\large \refapi{PMIx_Spawn}}
\pasteSignature{PMIx_Spawn}
{\large \refapi{PMIx_Notify_event}}
\pasteSignature{PMIx_Notify_event}
{\large \refapi{PMIx_tool_attach_to_server}}
\pasteSignature{PMIx_tool_attach_to_server}
{\large \refapi{PMIx_Query_info}}
\pasteSignature{PMIx_Query_info}
{\large \refapi{PMIx_Get}}
\pasteSignature{PMIx_Get}
\littleheader{Related Attributes}
\pasteAttributeItem{PMIX_LAUNCH_DIRECTIVES}
\pasteAttributeItem{PMIX_SPAWN_TOOL}
\pasteAttributeItem{PMIX_COSPAWN_APP}
\pasteAttributeItem{PMIX_FWD_STDOUT}
\pasteAttributeItem{PMIX_FWD_STDERR}
\pasteAttributeItem{PMIX_SETUP_APP_ENVARS}
\pasteAttributeItem{PMIX_DEBUG_STOP_IN_INIT}
\pasteAttributeItem{PMIX_DEBUG_STOP_ON_EXEC}
\pasteAttributeItem{PMIX_DEBUG_DAEMONS_PER_PROC}
\pasteAttributeItem{PMIX_DEBUG_DAEMONS_PER_NODE}
\pasteAttributeItem{PMIX_MAPBY}
\pasteAttributeItem{PMIX_QUERY_PROC_TABLE}
\pasteAttributeItem{PMIX_QUERY_LOCAL_PROC_TABLE}
\pasteAttributeItem{PMIX_DEBUGGER_DAEMONS}
\pasteAttributeItem{PMIX_NOTIFY_COMPLETION}
\pasteAttributeItem{PMIX_DEBUG_TARGET}
\pasteAttributeItem{PMIX_WAIT_FOR_CONNECTION}
\littleheader{Related Constants}
\refconst{PMIX_LAUNCHER_READY} \\
\refconst{PMIX_LAUNCH_COMPLETE} \\
\refconst{PMIX_DEBUG_WAITING_FOR_NOTIFY} \\
\refconst{PMIX_DEBUGGER_RELEASE} \\
\refenvar{PMIX_LAUNCHER_RNDZ_URI}
\subsubsection{Attaching to a Running Job}
PMIx supports attaching to an already running parallel job in two ways. In the first way, the main process of a tool calls \refapi{PMIx_Query_info} with the \refattr{PMIX_QUERY_PROC_TABLE} attribute. This returns an array of structs containing the information required for \hyperref[subsubsec:process-acq]{process acquisition}. This includes remote hostnames, executable names, and process IDs. In the second way, every tool daemon calls \refapi{PMIx_Query_info} with the \refattr{PMIX_QUERY_LOCAL_PROC_TABLE} attribute. This returns a similar array of structs but only for processes on the same node.
An example of this use-case may look like the following: \code{mpiexec -n~32~./myApp \&\& dbgr attach \$!}.
\begingroup
\begin{figure*}
\begin{center}
\includegraphics[width=\textwidth,height=\textheight,keepaspectratio]{figs/process-acquisition}
\end{center}
\caption{Interaction diagram showing an example of the attaching to a running job}
\label{fig:proc_acq}
\end{figure*}
\endgroup
{\large \refapi{PMIx_tool_init}}
\pasteSignature{PMIx_tool_init}
{\large \refapi{PMIx_Register_event_handler}}
\pasteSignature{PMIx_Register_event_handler}
{\large \refapi{PMIx_Query_info}}
\pasteSignature{PMIx_Query_info}
{\large \refapi{PMIx_Spawn}}
\pasteSignature{PMIx_Spawn}
\littleheader{Related Attributes}
\pasteAttributeItem{PMIX_QUERY_PROC_TABLE}
\pasteAttributeItem{PMIX_DEBUGGER_DAEMONS}
\pasteAttributeItem{PMIX_DEBUG_TARGET}
\pasteAttributeItem{PMIX_DEBUG_DAEMONS_PER_PROC}
\pasteAttributeItem{PMIX_DEBUG_DAEMONS_PER_NODE}
\pasteAttributeItem{PMIX_MAPBY}
\pasteAttributeItem{PMIX_FWD_STDOUT}
\pasteAttributeItem{PMIX_FWD_STDERR}
\pasteAttributeItem{PMIX_NOTIFY_COMPLETION}
\pasteAttributeItem{PMIX_REQUESTOR_IS_TOOL}
\pasteAttributeItem{PMIX_QUERY_NAMESPACES}
\subsubsection{Tool Interaction with RM}
Tools can benefit from a mechanism by which they may interact with a local PMIx server that has opted to accept such connections along with support for tool connections to system-level PMIx servers, and a logging feature. To add support for tool connections to a specified system-level, PMIx server environments could choose to launch a set of PMIx servers to support a given allocation - these servers will (if so instructed) provide a tool rendezvous point that is tagged with their pid and typically placed in an allocation-specific temporary directory to allow for possible multi-tenancy scenarios. Supporting such operations requires that a system-level PMIx connection be provided which is not associated with a specific user or allocation. A new key has been added to direct the PMIx server to expose a rendezvous point specifically for this purpose.
{\large \refapi{PMIx_Query_info_nb}}
\pasteSignature{PMIx_Query_info_nb}
{\large \refapi{PMIx_Register_event_handler}}
\pasteSignature{PMIx_Register_event_handler}
{\large \refapi{PMIx_Deregister_event_handler}}
\pasteSignature{PMIx_Deregister_event_handler}
{\large \refapi{PMIx_Notify_event}}
\pasteSignature{PMIx_Notify_event}
{\large \refapi{PMIx_server_init}}
\pasteSignature{PMIx_server_init}
\subsubsection{Environmental Parameter Directives for Applications and Launchers}
It is sometimes desirable or required that standard environmental variables (e.g., \code{PATH}, \code{LD_LIBRARY_PATH}, \code{LD_PRELOAD}) be modified prior to executing an application binary or a starter such as \code{mpiexec} - this is particularly true when tools/debuggers are used to start the application.
\littleheader{Related Interfaces}
{\large \refapi{PMIx_Spawn}}
\pasteSignature{PMIx_Spawn}
\littleheader{Related Structs}
\refstruct{pmix_envar_t}
\littleheader{Related Attributes}
\pasteAttributeItem{PMIX_SET_ENVAR}
\pasteAttributeItem{PMIX_ADD_ENVAR}
\pasteAttributeItem{PMIX_UNSET_ENVAR}
\pasteAttributeItem{PMIX_PREPEND_ENVAR}
\pasteAttributeItem{PMIX_APPEND_ENVAR}
Resource managers and launchers must scan for relevant directives, modifying environmental parameters as directed. Directives are to be processed in the order in which they were given, starting with job-level directives (applied to each app) followed by app-level directives.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Hybrid Applications}
\label{app:uc-hybrid-applications}
\subsection{Use Case Summary}
Hybrid applications (i.e., applications that utilize more than one programming model or runtime system, such as an application using MPI that also uses OpenMP or UPS) are growing in popularity, especially as processors with increasingly large numbers of cores and/or hardware threads proliferate. Unfortunately, the various corresponding runtime systems currently operate under the assumption that they alone control execution. This leads to conflicts in hybrid applications. Deadlock of parallel applications can occur when one runtime system prevents the other from making progress due to lack of coordination between them~\cite{2016:Hamidouche}. Sub-optimal performance can also occur due to uncoordinated division of hardware resources between the runtime systems implementing the different programming models or systems~\cite{ompix-moc,2018:Vallee}. This use-case offers potential solutions to this
problem by providing a pathway for parallel runtime systems to coordinate their actions.
\subsection{Use Case Details}
\subsubsection{Identifying Active Parallel Runtime Systems}
The current state-of-the-practice for concurrently used runtime systems in a single application to detect one another is via set environment variables. For example, some OpenMP implementations look for environment variables to indicate that an MPI library is active. Unfortunately, this technique is not completely reliable as environment variables change over time and with new software versions, and this detection is implementation specific. Also, the fact that an environment variable is present doesn't guarantee that a particular runtime system is in active use since Resource Managers routinely set environment variables "just in case" the application needs them. PMIx provides a reliable mechanism by which each library can determine that another runtime library is in operation.
When initializing PMIx, runtime libraries implementing a parallel programming model can register themselves, including their name, the library version, the version of the API they implement, and the threading model. This information is then cached locally and can then be read asynchronously by other runtime systems using PMIx's Event Notification system.
This initialization mechanism also allows runtime libraries to share knowledge of each other's resources and intended resource utilization. For example, if an OpenMP implementation knows which hardware threads an MPI library is using it could potentially avoid core and cache contention.
\littleheader{Code Example}
\pmixCodeImportC[]{sources/_autogen_/hybrid-prog-model.c_declare_model}
\littleheader{Related Interfaces}
{\large \refapi{PMIx_Init}}
\pasteSignature{PMIx_Init}
\littleheader{Related Attributes}
\pasteAttributeItem{PMIX_PROGRAMMING_MODEL}
\pasteAttributeItem{PMIX_MODEL_LIBRARY_NAME}
\pasteAttributeItem{PMIX_MODEL_LIBRARY_VERSION}
\pasteAttributeItem{PMIX_THREADING_MODEL}
\pasteAttributeItem{PMIX_MODEL_NUM_THREADS}
\pasteAttributeItem{PMIX_MODEL_NUM_CPUS}
\pasteAttributeItem{PMIX_MODEL_CPU_TYPE}
\pasteAttributeItem{PMIX_MODEL_PHASE_NAME}
\pasteAttributeItem{PMIX_MODEL_PHASE_TYPE}
\pasteAttributeItem{PMIX_MODEL_AFFINITY_POLICY}
\subsubsection{Coordinating at Runtime}
The PMIx Event Notification system provides a mechanism by which the resource manager can communicate system events to applications, thus providing applications with an opportunity to generate an appropriate response. Hybrid applications can leverage these events for cross-library coordination.
Runtime libraries can access the information provided by other runtime libraries during their initialization using the event notification system. In this case, runtime libraries should register a callback for the \refconst{PMIX_MODEL_DECLARED} event.
Applications, runtime libraries, and resource managers can also use the PMIx event notification system to communicate dynamic information, such as entering a new application phase (\refattr{PMIX_MODEL_PHASE_NAME}) or a change in resources used (\refconst{PMIX_MODEL_RESOURCES}). This dynamic information can be broadcast using the \refapi{PMIx_Notify_event} function. Runtime libraries can register callback functions to run when these events occur using \refapi{PMIx_Register_event_handler}.
\littleheader{Code Example}
Registering a callback to run when another runtime library initializes:
\pmixCodeImportC[]{sources/_autogen_/hybrid-prog-model.c_declare_model_cb}
Notifying an event:
\pmixCodeImportC[]{sources/_autogen_/hybrid-prog-model.c_notify_event}
\littleheader{Related Interfaces}
{\large \refapi{PMIx_Notify_event}}
\pasteSignature{PMIx_Notify_event}
{\large \refapi{PMIx_Register_event_handler}}
\pasteSignature{PMIx_Register_event_handler}
{\large \refapi{pmix_event_notification_cbfunc_fn_t}}
\pasteSignature{pmix_event_notification_cbfunc_fn_t}
\littleheader{Related Constants}
\refconst{PMIX_MODEL_DECLARED} \\
\refconst{PMIX_MODEL_RESOURCES} \\
\refconst{PMIX_OPENMP_PARALLEL_ENTERED} \\
\refconst{PMIX_OPENMP_PARALLEL_EXITED} \\
\refconst{PMIX_EVENT_ACTION_COMPLETE}
\subsubsection{Coordinating at Runtime with Multiple Event Handlers}
Coordinating with a threading library such as an OpenMP runtime library creates the need for separate event handlers for threads of the same process. For example in an MPI+OpenMP hybrid application, the MPI main thread and the OpenMP primary thread may both want to be notified anytime an OpenMP thread starts executing in a parallel region. This requires support for multiple threads to potentially register different event handlers against the same status code.
Multiple event handlers registered against the same event are processed in a chain-like manner based on the order in which they were registered, as modified by any directives. Registrations against specific event codes are processed first, followed by registrations against multiple event codes and then any default registrations. At each point in the chain, an event handler is called by the PMIx progress thread and given a function to call when that handler has completed its operation. The handler callback notifies PMIx that the handler is done, returning a status code to indicate the result of its work. The results are appended to the array of prior results, with the returned values combined into an array within a single \refstruct{pmix_info_t} as follows:
\begin{itemize}
\item \texttt{array[0]}: the event handler name provided at registration (may be an empty field if a string name was not given) will be in the key, with the \refstruct{pmix_status_t} value returned by the handler
\item \texttt{array[*]}: the array of results returned by the handler, if any.
\end{itemize}
The current PMIx standard does not actually specify a default ordering for event handlers as they are being registered. However, it does include an inherent ordering for invocation. Specifically, PMIx stipulates that handlers be called in the following categorical order:
\begin{itemize}
\item single status event handlers - handlers that were registered against a single specific status.
\item multi status event handlers - those registered against more than one specific status.
\item default event handlers - those registered against no specific status.
\end{itemize}
\littleheader{Code Example}
From the OpenMP primary thread:
\pmixCodeImportC[]{sources/_autogen_/hybrid-prog-model.c_omp_thread}
From the MPI process:
\pmixCodeImportC[]{sources/_autogen_/hybrid-prog-model.c_mpi_process}
\littleheader{Related Interfaces}
{\large \refapi{PMIx_Register_event_handler}}
\pasteSignature{PMIx_Register_event_handler}
{\large \refapi{pmix_event_notification_cbfunc_fn_t}}
\pasteSignature{pmix_event_notification_cbfunc_fn_t}
\littleheader{Related Attributes}
\pasteAttributeItem{PMIX_EVENT_HDLR_NAME}
\pasteAttributeItem{PMIX_EVENT_HDLR_FIRST}
\pasteAttributeItem{PMIX_EVENT_HDLR_LAST}
\pasteAttributeItem{PMIX_EVENT_HDLR_FIRST_IN_CATEGORY}
\pasteAttributeItem{PMIX_EVENT_HDLR_LAST_IN_CATEGORY}
\pasteAttributeItem{PMIX_EVENT_HDLR_BEFORE}
\pasteAttributeItem{PMIX_EVENT_HDLR_AFTER}
\pasteAttributeItem{PMIX_EVENT_HDLR_APPEND}
\littleheader{Related Constants}
\refconst{PMIX_EVENT_NO_ACTION_TAKEN} \\
\refconst{PMIX_EVENT_PARTIAL_ACTION_TAKEN} \\
\refconst{PMIX_EVENT_ACTION_DEFERRED} \\
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{MPI Sessions}
\label{app:uc-MPI-sessions}
\subsection{Use Case Summary}
MPI Sessions addresses a number of the limitations of the current MPI programming model. Among the immediate problems MPI Sessions is intended to address are the following:
\begin{itemize}
\item MPI cannot be initialized within an MPI process from different application components without a priori knowledge or coordination,
\item MPI cannot be initialized more than once, and MPI cannot be reinitialized after MPI finalize has been called.
\item With MPI Sessions, an application no longer needs to explicitly call \code{MPI_Init} to make use of MPI, but rather can use a Session to only initialize MPI resources for specific communication needs.
\item Unless the MPI process explicitly calls MPI_Init, there is also no explicit \code{MPI_COMM_WORLD} communicator. Sessions can be created and destroyed multiple times in an MPI process.
\end{itemize}
\subsection{Use Case Details}
\begingroup
\begin{figure*}
\begin{center}
\includegraphics[width=.5\textwidth,keepaspectratio]{figs/mpi-sessions1}
\end{center}
\caption{MPI Communicator from MPI Session Handle using PMIx}
\label{fig:mpi_s1}
\end{figure*}
\endgroup
A PMIx Process Set (PSET) is a user-provided or host environment assigned
label associated with a given set of application processes. Processes can
belong to multiple process sets at a time. Definition of a PMIx
process set typically occurs at time of application execution - e.g., on a
command line: \code{prun -n 4 --pset ocean myoceanapp : -n 3 --pset ice myiceapp}
PMIx PSETs are used for query functions (\code{MPI_SESSION_GET_NUM_PSETS}, \code{MPI_SESSION_GET_NTH_PSET)} and to create \code{MPI_GROUP} from a process set name.
In OpenMPI's MPI Sessions prototype, PMIx groups are used during creation of \code{MPI_COMM} from an \code{MPI_GROUP}. The PMIx group constructor returns a 64-bit PMIx Group Context Identifier (PGCID) that is guaranteed to be unique for the duration of an allocation (in the case of a batch managed environment). This PGCID could be used as a direct replacement for the existing unique identifiers for communicators in MPI (E.g. Communicator Identifiers (CIDs) in Open MPI), but may have performance implications.
There is an important distinction between process sets and process groups. The process set identifiers are set by the host environment and currently there are no PMIx APIs provided by which an application can change a process set membership. In contrast, PMIx process groups can only be defined dynamically by the application.
\littleheader{Related Interfaces}
{\large \refapi{PMIx_Get}}
\pasteSignature{PMIx_Get}
{\large \refapi{PMIx_Group_construct}}
\pasteSignature{PMIx_Group_construct}
\littleheader{Related Attributes}
\pasteAttributeItem{PMIX_PSET_NAMES}
\pasteAttributeItem{PMIX_QUERY_NUM_GROUPS}
\pasteAttributeItem{PMIX_QUERY_GROUP_NAMES}
\pasteAttributeItem{PMIX_QUERY_GROUP_MEMBERSHIP}
\littleheader{Related Constants}
\refconst{PMIX_SUCCESS}\\
\refconst{PMIX_ERR_NOT_SUPPORTED}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Cross-Version Compatibility}
\label{app:uc-cross-version}
\subsection{Use Case Summary}
The \ac{PMIx} interface serves as a conduit between clients (e.g., \ac{MPI} libraries), tools (e.g., debuggers), and servers (e.g., \acp{RM}).
As such, it is probable that a process operating in one of these roles (e.g., as a client or tool) is running a different version of the same \ac{PMIx} implementation than the process with which it is communicating that is operating in a different role (e.g., as a server).
For processes running in containers cross-version compatibility is especially important because the container image and the system software levels will naturally evolve and drift apart.
As such, there is a need for \ac{PMIx} implementations to provide cross-version compatibility.
The responsibility for providing cross-version compatibility is a feature of a specific \ac{PMIx} implementation and not necessarily of the \ac{PMIx} standard.
The \ac{PMIx} standard must strive to enable, and never limit, both the cross-version compatibility in any given \ac{PMIx} implementation, and the ability for a \ac{PMIx} consumer to adapt to cross-version differences in capabilities.
This use case is focused on cross-version compatibility between different versions of the same \ac{PMIx} implementation and not between different \ac{PMIx} implementations.
Cross-version compatibility responsibilities are not restricted to \ac{PMIx}, but a general issue for any library that coordinates across multiple processes. This includes, but not limited to, client/server libraries, and libraries with a user-space and kernel-space component (e.g., high-performance interconnect libraries).
\subsection{Use Case Details}
There are three scenarios that a \ac{PMIx} implementation and a \ac{PMIx} consumer must consider.
These scenarios use a PMIx Server and a PMIx Client for clarity, though the scenarios also apply to PMIx Tools.
\begin{enumerate}
\item \textbf{PMIx Server version matches PMIx Client version}:
No cross-version considerations are necessary since they are running the same version.
\item \textbf{PMIx Server version is older than PMIx Client version}:
The implementation must negotiate capabilities during the initial handshake.\\
This scenario is common if the (possibly containerized) PMIx client application is being run on an established system that does not update as frequently as the application requires.
Thus the PMIx Server in the \ac{RM} is locked to an older version of that \ac{PMIx} implementation.
\item \textbf{PMIx Server version is newer than PMIx Client version}:
The implementation must negotiate capabilities during the initial handshake.\\
This scenario is common if the (possibly containerized) PMIx client application is being run after a system software upgrade on the system.
Thus the PMIx Server in the \ac{RM} has been upgraded to a newer version of that \ac{PMIx} implementation and the client is still linked against the older version.
\end{enumerate}
When the two \ac{PMIx}-enabled processes first connect to each other they need to first check the version of the library that they are each running.
This handshake often occurs during initialization (though it could occur on a per-operation basis depending on the specific \ac{PMIx} implementation), for example during the following operations:
\begin{itemize}
\item PMIx Clients: \refapi{PMIx_Init}
\item PMIx Tools: \refapi{PMIx_tool_init}, \refapi{PMIx_tool_attach_to_server}
\item PMIx Servers: \refapi{PMIx_server_init}, \refapi{pmix_server_client_connected2_fn_t}, \refapi{pmix_server_tool_connection_fn_t}
\end{itemize}
Commonly this cross-version handshake occurs completely transparently to the consumers of the \ac{PMIx} interface since it happens inside a specific \ac{PMIx} implementation of these interfaces.
However, during the negotiation, some features available in one version might not be available in the other.
The consumer of the \ac{PMIx} interface should always be prepared to receive the \refconst{PMIX_ERR_NOT_SUPPORTED} error code from a \ac{PMIx} interface call that the other side either does not support or is not available in the version of the library with which they are linked.
After connecting to another \ac{PMIx} entity, the consumer of the \ac{PMIx} interface can use the \refapi{PMIx_Query_info} API to determine supported functionality and adapt accordingly.
\littleheader{Related Interfaces}
{\large \refapi{PMIx_Init}}
\pasteSignature{PMIx_Init}
{\large \refapi{PMIx_tool_init}}
\pasteSignature{PMIx_tool_init}
{\large \refapi{PMIx_tool_attach_to_server}}
\pasteSignature{PMIx_tool_attach_to_server}
{\large \refapi{PMIx_server_init}}
\pasteSignature{PMIx_server_init}
{\large \refapi{pmix_server_client_connected2_fn_t}}
\pasteSignature{pmix_server_client_connected2_fn_t}
{\large \refapi{pmix_server_tool_connection_fn_t}}
\pasteSignature{pmix_server_tool_connection_fn_t}
{\large \refapi{PMIx_Query_info}}
\pasteSignature{PMIx_Query_info}
\littleheader{Related Constants}
\refconst{PMIX_SUCCESS}\\
\refconst{PMIX_ERR_NOT_SUPPORTED}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%