forked from Sandia-OpenSHMEM/SOS
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
289 lines (233 loc) · 13.7 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
Sandia OpenSHMEM
----------------
* About
Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over
Portals 4.0, the Open Fabrics Interface (OFI), and XPMEM.
Please refer to the "tests-sos" repository (https://github.com/openshmem-org/tests-sos)
to download only the unit tests and the performance test suite that are
included with Sandia OpenSHMEM.
* Building
The Sandia OpenSHMEM implementation utilizes the GNU Autoconf/Automake/Libtool
tools to generate a configure script. If the `configure` file is not present
(e.g. after downloading the repository for the first time), generate it
by running:
$ ./autogen.sh
Once the configure file exists, run:
$ ./configure <options>
$ make
$ make check
$ make install
The "make check" step is not strictly necessary, but is a good idea. Make
check utilizes the TEST_RUNNER and NPROCS make variables, which can be used to
override defaults, e.g. "make check NPROCS=4" or "make check
TEST_RUNNER='mpiexec -n 2 -ppn 1 -hosts compute1,compute2'".
Sandia OpenSHMEM must be configured to use either the Portals 4 or OFI network
transport, but not both. It can optionally be configured to use XPMEM or CMA
to optimize communication between PEs within the same shared memory domain.
Options to configure include:
--prefix=<DIR> Install implementation in <DIR>, default: /usr/local
--with-portals4=<DIR> Find the Portals 4 library in <DIR>
--with-ofi=<DIR> Find the libfabric library in <DIR>
--with-xpmem=<DIR> Find the XPMEM library in <DIR>
--with-cma Use cross-memory attach for on-node communication
--with-pmi=DIR Location of PMI installation. Configure will
automatically look for the PMI runtime provided by
the Portals 4 reference implementation
--enable-pmi-simple Include support for interfacing with a PMI 1.0
launcher. The launcher must be provided by a
separate package, such as MPICH, Hydra, or SLURM.
--enable-error-checking Enable error checking in SHMEM calls. This will
increase the overhead of communication operations.
--enable-hard-polling When using only the network transport, the
implementation will use counting events to
block the implementation when waiting for
local memory changes. On some implementations,
enabling hard polling may increase target side
message rate.
--enable-remote-virtual-addressing
Enable optimizations assuming the symmetric heap is
always symmetric with regards to virtual address.
This may cause applications to abort during
shmem_init() if such a symmetric heap can not be
created, but will reduce the instruction count for
some operations. This optimization also requires
that the Portals 4 implementation support
BIND_INACCESSIBLE on LEs. This optimization will
reduce the overhead of communication calls.
--disable-fortran Disable the Fortran bindings. This may be useful
if the machine has a Fortran compiler which does
not support ISO_C_BINDING.
--enable-nonblocking-fence
By default, shmem_fence() is equivalent to
shmem_quiet(), which can be a lengthy
operation. Enabling this feature results in
the ordering point being moved from the
shmem_fence() to the next put-like call,
which can help improve overlap in some
cases.
--enable-total-data-ordering=<yes|no|check>
If a network supports total data ordering
(that is, ordering guarantees to two
different addresses on the same target
node), this option can remove the
shmem_quiet() from shmem_fence() calls when
sending short messages. The option does,
however, force ordering requirements on the
network, so experimentation may be necessary
to determine the best configuration. Yes
means always assume total data ordering is
available and abort a job if that's not the
case. No means never use total data
ordering optimizations. Check will result
in slightly higher overhead than "yes", but
will provide a fallback if the network
doesn't provide total data ordering.
There are many other options to configure to influence performance and
behavior. See 'configure --help' for documentation on available
options.
* SHMEM Runtime Support
Environment variables:
SHMEM_VERSION: if defined, print SHMEM version during start_pes().
SHMEM_INFO: if defined, print (stdout) SHMEM environment variables.
SHMEM_SYMMETRIC_SIZE (default: 64 MiB)
The allocated size of the symmetric heap which shmalloc() and shfree()
operates on. The size value can be scaled with a suffix of
'K' for kilobytes (B * 1024),
'M' for Megabytes (KiB * 1024)
'G' for Gigabytes (MiB * 1024)
SHMEM_BOUNCE_SIZE (default: 2 KiB)
The maximum size of a bounce buffer for put messages.
Messages greater than the immediate send value for the
underlying network but greater than this threshold will be
copied into a bounce buffer and then sent.
SHMEM_MAX_BOUNCE_BUFFERS (default: 128)
The maximum number of bounce buffers that can be created per context.
SHMEM_COLL_CROSSOVER (default: 4)
For num_pes < SHMEM_COLL_CROSSOVER, collective algorithms are
serial instead of tree based.
SHMEM_COLL_SIZE_CROSSOVER (default: 16kiB)
For size < SHMEM_COLL_SIZE_CROSSOVER, collective algorithms are
optimized for latency, rather than bandwidth.
SHMEM_COLL_RADIX (default: 4)
Controls the width of the n-ary tree for collectives, such that each
node will fanout-send to a max of approximately SHMEM_COLL_RADIX
SHMEM_SYMMETRIC_HEAP_USE_MALLOC (default: 0)
If set to a non-zero integer, will use malloc() instead of
mmap() to allocate the symmetric heap. This option may result in
incorrect behavior when remote virtual addressing is enabled.
SHMEM_BARRIER_ALGORITHM (default: auto)
Algorithm to use for barriers. Default is to auto-select (which
may result in different algorithms being used for different
PE sets). Options are: auto, linear, tree, dissem.
SHMEM_BCAST_ALGORITHM (default: auto)
Algorithm to use for broadcasts. Default is to auto-select (which
may result in different algorithms being used for different
PE sets). Options are: auto, linear, tree.
SHMEM_REDUCE_ALGORITHM (default: auto)
Algorithm to use for reductions. Default is to auto-select (which
may result in different algorithms being used for different
PE sets). Options are: auto, linear, tree, recdbl, ring.
SHMEM_COLLECT_ALGORITHM (default: auto)
Algorithm to use for allgathers. Default is to auto-select (which
may result in different algorithms being used for different
PE sets). Options are: auto, linear.
SHMEM_FCOLLECT_ALGORITHM (default: auto)
Algorithm to use for allgathers with fixed contribution amounts.
Default is to auto-select (which may result in different
algorithms being used for different PE sets).
Options are: auto, linear, ring, recdbl. Note that recursive
doubling (recdbl) will fall back to ring if the PE set is not a
power of two in size.
SHMEM_BARRIERS_FLUSH (default: off)
If defined, standard output (stdout) and error (stderr) streams
will be flushed at the beginning of each barrier operation.
SHMEM_CMA_PUT_MAX (default: 8192)
'--with-cma', shmem put lengths <= CMA_PUT_MAX use process_vm_writev();
otherwise use Portals4 transport put.
SHMEM_CMA_GET_MAX (default: 16384)
'--with-cma', shmem get lengths <= CMA_GET_MAX use process_vm_readv();
otherwise use Portals4 transport get.
SHMEM_SYMMETRIC_HEAP_USE_HUGE_PAGES (default: off)
If defined, large pages will be used to back the symmetric heap. This
feature is only available on Linux.
SHMEM_SYMMETRIC_HEAP_PAGE_SIZE (default: 2MB)
Used to specify a large page size when using large pages to back the
symmetric heap. Ignored if SHMEM_SYMMETRIC_HEAP_USE_HUGE_PAGES is not
set. Refer to SHMEM_SYMMETRIC_SIZE for input syntax.
SHMEM_DISABLE_ASLR_CHECK (default: on)
Disable runtime checks for address space layout randomization (ASLR).
OFI Transport Environment variables:
SHMEM_OFI_PROVIDER (default: auto)
The name of the provider that should be used by the OFI transport.
Shell-style wildcards, including * and ?, are allowed. The fi_info
utility included with libfabric can be used for assistance with
identifying the desired provider.
SHMEM_OFI_FABRIC (default: auto)
The name of the fabric that should be used by the OFI transport.
Shell-style wildcards, including * and ?, are allowed. The fi_info
utility included with libfabric can be used for assistance with
identifying the desired fabric.
SHMEM_OFI_DOMAIN (default: auto)
The name of the fabric domain that should be used by the OFI transport.
Shell-style wildcards, including * and ?, are allowed. The fi_info
utility included with libfabric can be used for assistance with
identifying the desired fabric domain.
SHMEM_OFI_ATOMIC_CHECKS_WARN (default: off)
If defined, OFI will not abort if fabric provider doesn't support every
data type x op combination, instead it will print a warning.
SHMEM_OFI_TX_POLL_LIMIT (default: 0)
Sets the maximum number of iterations for the transmit polling loop
(for put/quiet operations). Setting this to -1 enables continuous
completion polling (i.e. there is no polling limit). The default
behavior is to call fi_cntr_wait without polling.
SHMEM_OFI_RX_POLL_LIMIT (default: 0)
Sets the maximum number of iterations for the receive polling loop (for
get/wait operations). Setting this to -1 enables continuous completion
polling (i.e. there is no polling limit). The default behavior is to
call fi_cntr_wait without polling.
SHMEM_OFI_STX_MAX (default: 1)
Sets the maximum number of sharable transmit contexts (STXs) per PE.
STXs are the underlying transmit resources that are allocated to
OpenSHMEM contexts and they are allocated using the algorithm specified
by the SHMEM_OFI_STX_ALLOCATOR parameter.
SHMEM_OFI_STX_ALLOCATOR (default: round-robin)
Algorithm for allocating STX resources to OpenSHMEM contexts. In
particular, the algorithm determines how resources are shared by
contexts once all STXs have been allocated. Options are: round-robin,
random.
SHMEM_OFI_STX_THRESHOLD (default: 1)
Number of contexts that must be allocated to all shared STXs before
another shared STX can be allocated. This threshold can be increased
to reduce the number of shared STXs and increase the number of STXs
available for private use (i.e., with contexts that enable the
SHMEM_CTX_PRIVATE option).
SHMEM_OFI_STX_DISABLE_PRIVATE (default: off)
Disable STX privatization. Enabling this may improve load balance
across transmit resources, especially in scenarios where the number of
contexts exceeds the number of STXs.
SHMEM_OFI_STX_AUTO (default: off)
Automatically determine an appropriate value for the number of STXs per
compute node, and evenly partition them across PEs on the same node. A
compute node is determined by its unique hostname, and the number of
STXs available on a compute node is provided by the libfabric library.
SHMEM_OFI_DISABLE_MULTIRAIL (default: off)
Disable multirail functionality. Enabling this will restrict all
communications to occur over a single NIC per system.
Team Environment variables:
SHMEM_TEAMS_MAX (default: 10)
Sets the maximum number of available teams per PE, including the
predefined teams. The maximum supported value is 64. The value must
be the same across all PEs in SHMEM_TEAM_WORLD.
SHMEM_TEAM_SHARED_ONLY_SELF (default: off)
If defined, the predefined team, SHMEM_TEAM_SHARED, will only include
the self PE.
Debugging Environment variables:
SHMEM_DEBUG (default: off)
If defined enables debugging messages from OpenSHMEM runtime.
SHMEM_TRAP_ON_ABORT (default: off)
If defined, generate a trap when aborting an OpenSHMEM program. This
can be used to interface with a debugger or generate core files.
SHMEM_BACKTRACE (default: <empty>)
Can be used to choose the backtracing mechanism. Default value is NULL
for which no backtrace information is provided upon failure. User can set
this with any one of these available options: execinfo, gdb, auto.