\section{Software Development}
% \fixme{(authors: Brett, Peter)}
\subsection{Description}
\label{sec:swdevpartitions}
The tools supporting the full software development life-cycle can be
partitioned into the following orthogonal categories.
\begin{description}
\item[Code Repositories] store a historical record of revisions to a
  code base, including information on when each change was made, the
  identity of the developer and a note explaining the change.
  Repositories may be organized to hold a single logical unit of
  source code (i.e., a \textit{source package}) or may include multiple
  such units relying on some internal demarcation. They allow
  diverging lines of development, merging these lines and placing
  labels (e.g., release tags) to identify special points in the history.
\item[Package Build System] contains tools applied to the files of a
  \textit{source package} in order to transform them into some number
  of resulting files (executable programs, libraries). Typically the
  system executes a number of commands (compilers, linkers) while
  applying a number of build parameters (debug/optimized
  compilation, locating dependencies, activating code features). This
  system may install the results of the build directly to some area
  and, in addition, it may collect the build results into one or more
  \textit{binary packages}.
\item[Release Configuration] contains tools or specifications for the
  collection of information needed to build a cohesive suite of
  packages. It includes the list of packages making up the suite,
  their versions, any build parameters, file system layout policy,
  source locations, any local patch files, and the collection of
  commands needed to exercise the \textit{package build system}.
\item[Release Build System] contains tools or processes (instructions)
  that can apply a \textit{release configuration} to each
  \textit{package build system} in the software suite. This process
  typically iterates over the collection of packages in an order that
  honors their inter-dependencies (a minimal sketch of such a
  dependency-ordered build is given after this list). As each package
  is built, the \textit{release build system} assures it is built in a
  proper context containing the build products of its dependencies
  and, ideally, controlling for any files provided by the general
  operating system or user environment. This system may install the
  results of the build directly to some area and it may collect the
  build results into one or more \textit{binary packages}.
\item[Package Installation System] contains tools that, when
  \textit{binary packages} are produced, can download and unpack them
  into an installation area. This system is typically tightly coupled
  to the binary package format. It may rely on metadata internal or
  external to the binary package file in order to properly resolve
  dependencies and conflicts, or to perform pre- and post-installation
  procedures. The system may require privileged access and a single
  rooted file system tree, or it may run as an unprivileged user and
  allow for multiple and even interwoven file system trees.
\item[User Environment Management] contains tools that aggregate a
  subset of installed software in such a way that the end user may
  properly execute the programs it provides. This aggregation is
  typically done through the setting of environment variables
  interpreted by the shell, such as \texttt{PATH}. In other cases the
  bulk of the aggregation is done via the file system, by copying or
  linking files from some installation store into a more localized
  area and then defining some minimal set of environment variables.
  In the case where software is installed as system packages,
  environment management may not be required.
\item[Development Environment Management] contains tools to assist the
  developer in modifying existing software or writing novel packages.
  Such tools are not strictly required, as a developer may use tools
  from the above categories to produce a personal release. In
  practice, however, this makes the development cycle (the
  modify-build-test loop) unacceptably long. To reduce this time and
  effort, existing release builds can be leveraged, installation steps
  can be minimized or removed, and environment management can be
  arranged to use the build products in place. Care is needed in
  designing such tools to mitigate interference between individual
  developers while allowing them to synchronize their development as
  needed.
\item[Continuous Integration] contains tools and methodologies for
developing and exercising the code in order to validate changes,
find and fix problems quickly, and vet releases.
\item[Issues Tracker] contains tools to manage the reporting,
  understanding and resolution of problems with the software, to track
  requests for new features, and to organize and document releases.
\end{description}
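To make the \textit{release configuration} and \textit{release build
system} categories concrete, the following is a minimal sketch in
Python of a dependency-ordered release build. The package names,
versions and commands are hypothetical placeholders, not taken from
any particular tool; a real system would also control the build
context as described above.
\begin{verbatim}
"""Minimal sketch of a release build system: a release configuration
held as plain data is applied to each package in an order honoring
inter-dependencies.  All names, versions and commands are
hypothetical placeholders."""

# Release configuration: name -> (version, dependencies, commands).
RELEASE = {
    "libfoo": ("1.2", [],         ["./configure", "make", "make install"]),
    "libbar": ("2.0", ["libfoo"], ["./configure", "make", "make install"]),
    "exptsw": ("0.9", ["libbar"], ["make", "make install"]),
}

def build_order(config):
    """Topologically sort packages so dependencies come first."""
    order, seen = [], set()
    def visit(name):
        if name not in seen:
            seen.add(name)
            for dep in config[name][1]:
                visit(dep)
            order.append(name)
    for name in sorted(config):
        visit(name)
    return order

def build_release(config):
    """Dry-run the release build, printing each build command."""
    for name in build_order(config):
        version, deps, commands = config[name]
        # A real system would execute each command in a controlled
        # context exposing only the build products of 'deps'.
        for cmd in commands:
            print("[%s/%s] %s" % (name, version, cmd))

if __name__ == "__main__":
    build_release(RELEASE)
\end{verbatim}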
The following sections comment on which aspects have been successful
in providing general, cross-experiment benefit and which failings have
been identified, with explicit examples and areas where improvements may be made.
\subsection{Follow Free Software}
The Free Software (FS) and Open Source (OS) communities have a
large overlap with HEP in terms of how they develop and use software.
FS/OS has been very successful in achieving beneficial sharing of
software, largely because such sharing is a primary goal of the
community. It is natural, then, for the HEP software community to try
to emulate FS/OS.
Of course, HEP already benefits greatly from adopting many
libraries and tools from FS/OS. The community is relatively open with
its software development (in comparison to, for example, industry).
There are, however, some ways in which the HEP community currently
differs from FS/OS. Some differences are necessary and some mark
areas where improvements can be made.
\begin{itemize}
\item Physics is the primary focus, not software. Of course this is
  proper. But software is often not considered as important as other
  secondary aspects, such as detector hardware design, despite the
  fact that detector data is essentially useless today without quality
  software. Even in areas where software is the focus, the ``hard
  core'' software issues are often down-played or considered
  unimportant.
\item The use and development of HEP software are often tightly
  intertwined. End users of software are often its developers.
  Making formal releases is often seen as a hindrance, or is not
  performed due to lack of familiarity with, or access to, easily
  usable release tools.
\item HEP software typically must be installed with no special
permissions (non-``root''), in non-system locations, and with
multiple versions of the software available on the same system.
User/developers will often need to maintain locally modified copies
of the software that override but otherwise rely on some centrally
shared installation.
\item Versions matter a lot until they don't. A single code commit
  may radically change results, so upgrades must be done with care
  and changes constantly validated. Old versions must be kept
  accessible until new ones are vetted. They then become unimportant
  but must remain forever reproducible in case some issue found in the
  future requires rerunning the old version.
\item HEP software suites tend to be relatively large, often with the
  majority consisting of software authored by HEP physicists. Their
  design often requires all-or-nothing adoption. A lack of carefully
  modularized components with well-defined interfaces leads to design
  complexity and practical problems such as long compilation times.
  Dependencies must be carefully handled and tested when lower-layer
  libraries are modified.
\end{itemize}
\subsection{Category Integration}
The categories described in section \ref{sec:swdevpartitions} present
some ideal partitioning. Real world tools often cover multiple
categories. When this integration is done well it can be beneficial.
Where it is not done well it can lead to lock-in, lack of portability,
increased maintenance costs and other pathologies.
The functions of the Configuration Management Tool (CMT) span most of these categories.
Its design is such that it provides beneficial integration along with
the ability to select the categories in which to apply it. For
example, it provides a package build system, but one which is flexible
enough to either directly execute build commands or to delegate to
another package build system. This allows external packages to be
built and used symmetrically with packages developed by the
experiment. The configuration system is flexible enough to tightly
control versions for releases or to relax dependency conditions as
suitable for a development context. The same information used to
build packages is used to generate shell commands that configure the
end-user environment.
CMT was initially used by LHC experiments but has successfully been
adopted by others outside of CERN (Daya Bay, Minerva, Fermi/GLAST, and
others). It is used across the three major computer platforms (Linux,
Mac OS X and Windows).
In contrast is the UPS/CET system from Fermilab, currently used to
build the art framework and its applications. UPS itself bears a
passing resemblance to CMT, although its implementation is such that
even its proponents do not typically use it fully as designed. Its
ability to build packages is largely avoided, and its other primary
purpose, managing the user environment, is often augmented with
custom shell scripts.
The CET portion adds a package build system based on CMake but with
hard-wired entanglements with UPS. It tightly locks into the source
code which versions of dependencies must be built against and the
mechanism used to locate them. Developers commonly have their own
effort derailed if they attempt to incorporate work from others, as
any intervening release breaks their development base and requires
reinitializing it. Attempting to port the software (art and
LArSoft) entangled with this build system from the only supported
Linux distribution (Scientific Linux) to another (Debian) was found to
be effectively impossible in any reasonable time frame. This has led
to an effort by the LBNE collaboration to fully remove the UPS/CET
package build system from this software and replace it with one still
based on CMake but which follows standard forms and best practices.
It took far less effort to reimplement a build system than to adopt
the initial one. Effort is ongoing to incorporate these changes back
into the original software.
The astrophysics experiment LSST has developed a system, EUPS~\cite{lssteups},
based on UPS, for code builds. It allows local builds on a
collaborator's laptop or server and probes the user's local machine
for already-installed standard packages (such as Python). This system
may be worth a look for smaller-scale experiments~\cite{lsstwiki}.
\subsection{Distributed Software Tools}
Network technology has lead to paradigm shifts in binary code
distribution (eg CVMFS) and in distributing data (eg XRootD). HEP
software development has always been very distributed and it is
important to continue to embrace this.
One successful embrace has been the move to \texttt{git} for managing source
code revisions. In comparison, any code development that is still
kept in Concurrent Versions System (CVS) or Subversion (SVN) is at a
relative disadvantage in terms of the ability to distribute
development effort and share its results.
Aggregating \texttt{git} repositories along with associated issue
trackers and web content (wikis) to provide a definitive, if only by
convention, center of development is also important. Some institutions
provide these aggregation services (Fermilab's Redmine) but the full
benefit comes when the software is exposed in a more global way, such
as through online repository aggregators like GitHub or BitBucket.
Building software is an area that would benefit from a more
distributed approach. The traditional model is that the software
needed by the experiment is built from source by a site administrator
or an individual. In some cases an institution will take on the job
of building software for multiple experiments, as is done for some
experiments centered at CERN and Fermilab. While this service is
helpful for users of the platforms supported by the institution, it
tends to lock out users who do not use the officially supported
computer platforms. These unsupported platforms are otherwise
suitable for use and are often more advanced than the officially
supported ones. Small incompatibilities build up in the code base
because they go undetected in the relative monoculture created by
limiting support to a small number of platforms.
Distributed software build and installation systems are becoming
established and should be evaluated for adoption. Examples include
the package management systems found in the Nix and Guix operating
systems. These allow one individual to build a package in such a way
that it may be used by anyone else. They also provide many innovative
methods for end-user package aggregation which leverage the file
system instead of polluting the user's environment variables.
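As an illustration of this file-system-based aggregation, the
following Python sketch composes a per-user ``profile'' of symbolic
links into a shared, per-package installation store so that a single
\texttt{PATH} entry suffices. The store layout used here is a
hypothetical stand-in for what Nix and Guix actually implement.
\begin{verbatim}
import os

def aggregate(store, packages, profile):
    """Link every file of each selected package from the store
    (store/<name>-<version>/...) into one profile tree."""
    for pkg in packages:
        pkgroot = os.path.join(store, pkg)
        for dirpath, _, filenames in os.walk(pkgroot):
            rel = os.path.relpath(dirpath, pkgroot)
            outdir = os.path.normpath(os.path.join(profile, rel))
            if not os.path.isdir(outdir):
                os.makedirs(outdir)
            for fname in filenames:
                dst = os.path.join(outdir, fname)
                if not os.path.lexists(dst):
                    os.symlink(os.path.join(dirpath, fname), dst)

# Hypothetical usage: one PATH entry then covers all packages.
#   aggregate("/opt/store", ["libfoo-1.2", "libbar-2.0"],
#             os.path.expanduser("~/profile"))
# followed by:  export PATH=$HOME/profile/bin:$PATH
\end{verbatim}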
Another example is Conda, which provides a method to bundle up the
build configuration and a one-package unit of a release build system.
It also provides an end-user tool to install the packaged results. A
coupled project is Binstar, which can be thought of as a mix between
GitHub and the Python Package Index (PyPI). It allows upload and
distribution of packages built by Conda for later end-user download
and installation.
HEP community software projects and individual experiments can make
use of either the Nix/Guix or Conda/Binstar approaches to provide
ready-to-use code binaries to any networked computer in a trusted
manner.
%
Sharing and coordinating the production of these packages would take
additional effort, but this would be repaid by eliminating the
redundant effort that now goes into building the same package by each
individual, experiment or project.
\subsection{Automate}
Most HEP software suites are composed of four layers. The experiment
software on top is supported by general-use HEP software. Below that
are FS/OS packages which may be included in some operating system
distributions but which are built from source in order to control
versions and to provide a uniform base for those OS distributions
that do not provide them. Finally, there is some lowest layer
provided by the OS. Each experiment draws each of these lines
differently and some may choose to blur them.
Producing proper and consistent release configurations and tracking
them through time is challenging. Once they are created, a system can
in principle apply these configurations in a way that automates the
production of the release build. Without this automation the amount
of effort balloons; this is only partially mitigated by distributing
the build results (addressed above).
Some experiments have developed their own build automation based on
scripts. These help their collaborators but are not generally
useful.
CERN developed LCGCMT which, in part, provides automated building of
``externals'' via the \verb|LCG_Builders| component. This system is
specifically tied to CMT and is particularly helpful if CMT is adopted
as a release build system. The mechanism has been adopted by groups
outside of CERN, specifically other experiments that also adopted the
Gaudi event processing framework.
Growing out of the experience with custom experiment-specific
automation and LCGCMT, the Worch~\cite{worch} project was developed to provide
build ``orchestration''. This includes a release configuration method
and an automated release build tool. It is extensible to provide
support for the other software development tool categories. For
example, it can produce the configuration files needed to use
Environment Modules as a method for end-user environment management.
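As an example of such generated configuration, the sketch below emits
an Environment Modules modulefile from the kind of per-package
metadata a release configuration already holds. The package record is
a hypothetical example; the \texttt{module load},
\texttt{prepend-path} and \texttt{setenv} commands are standard
Environment Modules (Tcl) syntax. This is not Worch's actual
implementation, only an illustration of the idea.
\begin{verbatim}
"""Sketch: generate an Environment Modules modulefile from package
metadata.  The package record below is a hypothetical example."""

PKG = {
    "name": "libfoo",
    "version": "1.2",
    "prefix": "/opt/hep/libfoo/1.2",
    "deps": ["libbar/2.0"],
}

TEMPLATE = """#%Module1.0
## generated - do not edit
{loads}
prepend-path PATH            {prefix}/bin
prepend-path LD_LIBRARY_PATH {prefix}/lib
setenv       {var}_ROOT      {prefix}
"""

def modulefile(pkg):
    loads = "\n".join("module load " + d for d in pkg["deps"])
    return TEMPLATE.format(loads=loads, prefix=pkg["prefix"],
                           var=pkg["name"].upper())

if __name__ == "__main__":
    # Would be written to <modulepath>/libfoo/1.2
    print(modulefile(PKG))
\end{verbatim}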
\subsection{Opportunities for Improvement}
Some best practices in the area of software development tools are:
\begin{description}
\item[Leverage Free Software] Rely on Free Software / Open Source and
do not become shackled to proprietary software or systems.
\item[Portability] Do not tie development to specific platforms or
  institutional details.
\item[Automate] Produce builds and tests of software stacks in an
  automated manner that is useful to both end-users/installers and
  developers.
\end{description}
\noindent Some concrete work that would generally benefit software development efforts in HEP includes:
\begin{itemize}
\item Form a cross-experiment group to determine requirements for
tools for build automation, software release management, binary
packaging (including their format), end-user and developer
environment management.
\item Form teams to design, identify or implement tools meeting these
requirements.
\item Assist experiments in the adoption of these tools.
\end{itemize}