\section{Conditions Databases}
% \fixme{(authors: Simon, Brian, Peter)}
\subsection{Description}
Every HEP experiment has some form of ``conditions database''. The purpose of such a database is to capture and store any information that is needed in order to interpret or simulate events taken by the DAQ system. The underlying principle behind such a database is that the ``conditions'' at the time an event is acquired vary significantly more slowly than the quantities read out by the DAQ in the event itself. The periods over which these conditions quantities can change range from seconds to the lifetime of the experiment.
In implementing a conditions database, an experiment provides a mechanism by which to associate an event with a set of conditions without having to save a complete copy of those conditions with every event. A secondary feature is that the event-to-conditions association can normally be configured to select a particular version of the conditions, since knowledge of the conditions can change over time as they become better understood.
\subsection{Basic Concepts}
It turns out that the basic concepts of a conditions database do not vary between experiments; they all have the same issues to solve. Questions of scale, distribution and so forth can depend on the size and complexity of the data model used for quantities within the database, but these aspects are secondary and are addressed by the implementation. The resulting software for different experiments differs more in the choice of technologies used than in any conceptual foundation.
\subsubsection{Data Model}
The data model of a conditions database defines how information is grouped into atomic elements in that database and how those atomic elements are structured so that clients can recover the necessary quantity. This is the most experiment-specific concept, as it is directly related to the object model used in the analysis and simulation code. However, the division of conditions quantities into atomic elements is normally based on two criteria (a schematic example follows the list below).
\begin{itemize}
\item The period over which a quantity varies; for example, geometry may be updated once a year, while a detector's calibration may be measured once a week.
\item The logical cohesiveness of the quantities; for example, the calibrations for one detector will be kept separate from those of another detector even if they are updated at the same frequency.
\end{itemize}
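As a concrete illustration of the first point, the following sketch (in Python, with entirely hypothetical table and column names) groups quantities into one table per logically cohesive, similarly varying set, following the two criteria above.
\begin{verbatim}
# Hypothetical schema sketch: one table per cohesive group of
# conditions quantities that vary at a similar rate.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Updated roughly once a year.
    CREATE TABLE detector_geometry (
        volume_name TEXT,
        x REAL, y REAL, z REAL
    );
    -- Measured roughly once a week, so kept in its own table.
    CREATE TABLE pmt_calibration (
        channel_id INTEGER,
        gain REAL,
        pedestal REAL
    );
""")
\end{verbatim}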
\subsubsection{Interval of Validity}
The standard way of matching a conditions element to an event is by using a timestamp related to the event's acquisition. Given this time, the conditions database is searched for the instance of the element that was valid at that time. (What to do when multiple instances are valid for a given time is dealt with by versioning; see section~\ref{conditions-versioning}.) Each entry in the conditions database must therefore have an interval of validity stating the beginning and end times, with respect to the event's timestamp, for which it should be considered as the value of its quantity.
As analyses often proceed sequentially with respect to events, most implementations of conditions databases improve their efficiency by caching the `current' instance of a quantity once it has been read from the database, until a request is made for a time outside its interval of validity. At that point the instance appropriate to the new time is read in, along with its interval of validity.
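This caching strategy can be sketched as follows; the Python below is a minimal illustration, assuming a hypothetical back end that exposes a \texttt{find(quantity, time)} method returning a value together with its interval of validity. It is not any experiment's actual API.
\begin{verbatim}
# Minimal sketch of interval-of-validity lookup with caching
# (hypothetical interface, not any experiment's actual API).
class CachedConditions:
    def __init__(self, db):
        self.db = db       # back end with find(quantity, time)
        self.cache = {}    # quantity -> (begin, end, value)

    def get(self, quantity, event_time):
        hit = self.cache.get(quantity)
        if hit is not None:
            begin, end, value = hit
            if begin <= event_time < end:
                return value   # still valid: no database access
        # Cache miss or expired: fetch the instance valid at this
        # time, along with its interval of validity.
        begin, end, value = self.db.find(quantity, event_time)
        self.cache[quantity] = (begin, end, value)
        return value
\end{verbatim}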
\subsubsection{Versioning}
\label{conditions-versioning}
During the lifetime of an experiment, a database will accumulate more than one instance of a conditions element that is valid for a given time. The two most obvious causes of this are the following.
\begin{itemize}
\item A conditions element is valid from a given time to the end-of-time, in order to make sure there is always a valid instance of that element. Later in the experiment a new value for the element is measured and entered into the database, with its interval of validity starting later than the original instance's but, as it is now the most appropriate value from that point on, running until the end-of-time as well.
\item A conditions element may consist of a value derived from various measurements. In principle this can be considered a `cached' result of the derivation; however, it is treated as a first-class element in the conditions database. At some later point, a better way of deriving the value is developed, and the new value is placed in the database with the same interval of validity as the original one.
\end{itemize}
In both cases there needs to be a mechanism for arbitrating which instances are used. This arbitration is managed by assigning versions to each instance. The choice of which version to use depends on the purpose of the job being executed. If the purpose of the job is to use the `best' values, then the `latest' version is used; but if the purpose is to recreate the results of a previous job or to provide a known execution environment, then it must be possible to specify the exact version a job should use.
In order for the above versioning to work there must be some monotonic ordering of the versions. Typically this is provided by the `insertion' date, which is the logical date when the version was added to the database. It should be noted that this date does not always reflect the actual date the version was inserted, as that may not create the correct ordering of versions.
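A sketch of this arbitration, assuming a hypothetical table carrying validity and insertion-date columns, is the following query: among all instances valid at the event time, pick the one with the newest insertion date, optionally capped at a job-level snapshot date so that earlier results can be reproduced.
\begin{verbatim}
# Hypothetical version-arbitration query (illustrative schema).
QUERY = """
    SELECT value FROM pmt_calibration
    WHERE begin_time <= :t AND :t < end_time
      AND insertion_date <= :snapshot
    ORDER BY insertion_date DESC
    LIMIT 1
"""

def best_value(conn, event_time, snapshot="9999-12-31"):
    # snapshot defaults to 'end of time', i.e. use the latest
    # version; a reproduction job passes the insertion date used
    # by the original run instead.
    row = conn.execute(QUERY,
                       {"t": event_time,
                        "snapshot": snapshot}).fetchone()
    return row[0] if row else None
\end{verbatim}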
\subsection{Examples}
The following is a subset of the conditions database implementations already developed by HEP experiments.
\subsubsection{DBI, Minos}
This is a C++ binding that is decoupled from its surrounding framework, a feature that allowed it to be adopted by the Daya Bay experiment for use in its Gaudi customization. It has a very thin data model, with a conditions item being a single row in an appropriate SQL table.
\subsubsection{IOVSvc, Atlas}
The Atlas Interval of Validity Service, IOVSvc, is tightly bound to its Athena framework (the Atlas customization of the Gaudi framework). It works by registering a proxy that is filled by the service, rather than by direct calls to the service. It also has a feature whereby a callback can be registered that is invoked by the service whenever the current value of a conditions item is no longer valid.
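The proxy-plus-callback pattern can be illustrated conceptually as follows; this Python sketch is not the actual Athena/C++ interface, and all names are hypothetical.
\begin{verbatim}
# Conceptual sketch of the proxy-plus-callback pattern
# (not the actual IOVSvc interface; all names hypothetical).
class ConditionsProxy:
    def __init__(self, service, key):
        self.service = service   # resolves key -> (value, refreshed)
        self.key = key
        self.callbacks = []

    def register_callback(self, fn):
        # fn is invoked whenever the cached value is replaced.
        self.callbacks.append(fn)

    def value(self, event_time):
        value, refreshed = self.service.resolve(self.key, event_time)
        if refreshed:
            for fn in self.callbacks:
                fn(value)        # e.g. recompute derived constants
        return value
\end{verbatim}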
\subsubsection{CDB, BaBar}
The BaBar conditions database is notable in that, during its lifetime, it went through a migration from an ObjectivityDB back end to one using RootDB. This means it can serve as an example of migrating between conditions database implementations, as suggested in the next section.
\subsection{Opportunity for improvement}
Given that the challenge addressed by a conditions database, matching data to an event's timestamp, is universal and not specific to any style of experiment, and given that numerous solutions to this challenge already exist, there is little point in creating new ones. The obvious opportunities for improvement are those that make the existing solutions available for use by other experiments, in order to avoid yet more duplication. To this end the following approaches are recommended.
\begin{itemize}
\item A detailed survey of the interfaces of existing solutions should be made. The result of this survey should be the definition of a ``standard'' API for conditions databases that is decoupled from any particular framework. This standard would be a superset of the features found in existing implementations, so that all features of a given implementation can be accessed via this API (a schematic sketch of such an API follows this list). Suitable language bindings, such as C++ and Python, should be provided as part of the standard.
\item Given the standard API and language bindings provided by the previous item, maintainers of conditions database implementations should first be encouraged, where possible, to develop the necessary code to provide the API as part of their implementation. They should then be encouraged to extend their own implementation to cover as much of the API as their technology can support.
\item On the `consumer' side of the API, existing framework maintainers should be encouraged to adapt their frameworks so that they can use the standard API to resolve conditions database access. For frameworks and conditions databases that are tightly coupled, such as the Athena framework and its IOVSvc, this item, in concert with the previous one, will enable the conditions database to be decoupled from the analysis code.
\item In the longer term, given the standard conditions database API, the development of a Software-as-a-Service offering for conditions databases, for example using a RESTful interface, should be encouraged. This would allow the provisioning of conditions databases to be moved completely out of the physicists' realm and into that of computing support, which is better suited to maintaining that type of software.
\end{itemize}
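As an indication of what such a framework-neutral standard might look like, the following is a purely hypothetical sketch of a minimal API in Python; the names, signatures and semantics are illustrative only, not a proposed standard.
\begin{verbatim}
# Hypothetical sketch of a framework-neutral conditions API.
from abc import ABC, abstractmethod

class ConditionsDB(ABC):
    @abstractmethod
    def get(self, quantity, time, version="latest"):
        """Return the instance of `quantity` valid at `time`,
        arbitrated by `version` (e.g. 'latest' or a snapshot
        date for reproducing earlier results)."""

    @abstractmethod
    def put(self, quantity, begin, end, value, insertion_date=None):
        """Add a new instance with the given interval of validity;
        insertion_date defaults to a logical 'now'."""
\end{verbatim}
A RESTful service of the kind suggested in the final item could, for example, expose these same operations as HTTP endpoints, so that frameworks would need only an HTTP client rather than a database driver.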