Production System
- Authors: L.Arrabito, J. Bregeon, AT
Following the discussion at the 2017 DUW, we propose the development of a new system, called 'Production System' (PS). The goal is to develop a system with objectives similar to those of the LHCb production system, while keeping it general and simple enough to be adopted by several communities. The idea is to build it on top of the Transformation System (TS) to further automate the management of large 'productions', where a production is a set of transformations 'linked' together by input/output data. In order to define a Production, we also propose some enhancements to the transformation definition.
The current transformation definition should be enhanced with new attributes:
** InMQ optional (Input MetaQuery). NEW attribute of type: MetaQuery. It would replace the current 'fileMask'. It is optional since a transformation may have no input data, as for MC simulation.
** OutMQ optional (Output MetaQuery). NEW attribute of type: MetaQuery. The jobs created by the transformation will produce data with associated metadata. The OutMQ attribute represents a set of metadata with their expected ranges of values.
** OutputMetaData optional. NEW attribute of type: dict. The jobs created by the transformation will produce data with associated metadata. The OutputMetaData attribute represents the set of metadata key-value pairs, restricted to those that are known in advance.
*** Note that these are regular metadata (as in the DFC); the only difference is that they can be defined at the transformation level, and they will be applied to all output files.
*** Note that there is some redundancy with OutMQ, i.e. data that correspond to the OutMQ will have the OutputMetaData associated.
- In order to keep backward compatibility we could start by adding the new attributes as optional.
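As an illustration of the backward-compatibility point, here is a minimal sketch of a transformation definition in which the three new attributes default to `None`. The class name `TransformationSpec`, its field names, and the dictionary format used for the MetaQueries are assumptions made for this example; they are not the TransformationClient API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical stand-in for a MetaQuery: metadata key -> accepted value(s).
MetaQueryDict = Dict[str, Any]


@dataclass
class TransformationSpec:
    """Illustrative container for the proposed NEW attributes (all optional)."""
    name: str
    in_mq: Optional[MetaQueryDict] = None             # InMQ: selects the input data
    out_mq: Optional[MetaQueryDict] = None            # OutMQ: describes the expected outputs
    output_metadata: Optional[Dict[str, Any]] = None  # OutputMetaData: values known in advance

    def has_input_data(self) -> bool:
        # A transformation without InMQ (e.g. MC simulation) has no input data.
        return self.in_mq is not None


# MC simulation: no input query, only a description of the outputs.
mc_simulation = TransformationSpec(
    name="MCSimulation",
    out_mq={"particle": ["gamma"], "outputType": ["Data", "Log"]},
    output_metadata={"particle": "gamma"},
)

# A definition that sets none of the new attributes remains valid.
legacy = TransformationSpec(name="Legacy")
```

Because every new field has a default, definitions that do not set them keep working unchanged.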
- Production definition
** A set of 'linked' transformations: {T0, T1, T2, ...}
*** Two transformations Ti, Tj are 'linked' if some of the output data of Ti is an input of Tj. This will be handled by the Production System via the input and output MetaQueries of each transformation (there is an intersection between OutMQ,j and InMQ,i or between OutMQ,i and InMQ,j)
** prodID
** Status
- Production Execution
** create/start/monitor/stop/validate
- Production Validator
** Called when defining a production to verify the Production validity
** Checks intersections between MetaQueries, e.g.: OutMQ,i and InMQ,j
** Allows establishing whether 2 transformations are 'linked'
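The intersection check performed by the validator can be sketched as follows. This is a simplified model, not the actual MetaQuery logic: a query is assumed to map each metadata key either to a set of accepted values or to an inclusive `(low, high)` range, and a key missing from a query is treated as unconstrained. The function names and the metadata keys `particle` and `zenith` are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Assumed condition formats: a set of accepted values or an inclusive numeric range.
Condition = Union[Set[str], Tuple[float, float]]
Query = Dict[str, Condition]


def conditions_overlap(a: Condition, b: Condition) -> bool:
    """Return True if at least one value satisfies both conditions."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        # Two inclusive ranges overlap if the larger lower bound
        # does not exceed the smaller upper bound.
        return max(a[0], b[0]) <= min(a[1], b[1])
    if isinstance(a, set) and isinstance(b, set):
        return bool(a & b)
    # Mixed kinds are treated as non-overlapping in this sketch.
    return False


def metaqueries_intersect(q1: Query, q2: Query) -> bool:
    """Two queries intersect if they overlap on every key that both constrain."""
    return all(conditions_overlap(q1[key], q2[key]) for key in q1.keys() & q2.keys())


out_mq_i = {"particle": {"gamma"}, "zenith": (0.0, 40.0)}
in_mq_j = {"particle": {"gamma", "proton"}, "zenith": (20.0, 60.0)}
in_mq_k = {"particle": {"gamma"}, "zenith": (50.0, 60.0)}

linked_ij = metaqueries_intersect(out_mq_i, in_mq_j)  # overlap on both keys
linked_ik = metaqueries_intersect(out_mq_i, in_mq_k)  # zenith ranges are disjoint
```

Under these assumptions, Ti and Tj are candidates for being 'linked', while Ti and Tk are not.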
In order to give a more precise idea of the desired PS functionalities, we have done a preliminary implementation of the PS and of the TS enhancement, based on v6r19:
https://github.com/arrabito/DIRAC/tree/ProdSys_v6r19
We give an example of how to define a simple production made of 2 transformations (MC Simulation -> DataProcessing), using the TS and the PS clients. Note that in the transformation definitions below, for simplicity we have only shown the 'NEW' attributes.
The integration test below illustrates how the user would use the ProductionClient (and the TransformationClient) to create a new production. In this test, we also show the role of the Production Validator when the user defines a 'wrong' production.
** A utility checks whether these 2 conditions are satisfied:
*** InputQuery2 intersects with OutputQuery1. This means that files produced by t1 could potentially be input of t2.
*** Metadata1 satisfies InputQuery2. This means that at least part of the files potentially produced by t1 are input of t2. However, some of the files produced by t1 may not be input of t2 (for example those with 'outputType'='Log'). It could also happen that, for some reason, t1 produces only 'Log' files, so that t2 has no inputs coming from t1. In this latter case the production [t1,t2] is still valid, because the definition is correct; it simply happens that only Log files have been produced.
** If the 2 conditions are satisfied, the transformations are 'linked' and the production ([t1,t2]) definition is valid
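The two conditions can be sketched in a few lines of Python. This is not the utility from the preliminary implementation: queries are modelled here as plain dictionaries mapping a metadata key to a list of accepted values, and all function names are hypothetical. One assumption deserves emphasis: a key constrained by InputQuery2 but absent from Metadata1 (such as 'outputType', which is not known in advance) is taken not to contradict the query.

```python
from typing import Any, Dict, List

# Assumed simplified query format: metadata key -> list of accepted values.
Query = Dict[str, List[Any]]


def queries_intersect(q1: Query, q2: Query) -> bool:
    """Condition 1: on every key both queries constrain, some value is accepted by both."""
    return all(set(q1[key]) & set(q2[key]) for key in q1.keys() & q2.keys())


def metadata_satisfies(metadata: Dict[str, Any], query: Query) -> bool:
    """Condition 2 (assumed semantics): every metadata value known in advance
    is accepted by the query; keys missing from the metadata are ignored."""
    return all(metadata[key] in query[key] for key in metadata.keys() & query.keys())


def are_linked(output_query_1: Query, metadata_1: Dict[str, Any], input_query_2: Query) -> bool:
    """t1 and t2 are 'linked' when both conditions hold."""
    return (queries_intersect(input_query_2, output_query_1)
            and metadata_satisfies(metadata_1, input_query_2))


output_query_1 = {"particle": ["gamma"], "outputType": ["Data", "Log"]}
metadata_1 = {"particle": "gamma"}

# Valid production: t2 reads the 'Data' files that t1 may produce.
input_query_2 = {"particle": ["gamma"], "outputType": ["Data"]}
valid = are_linked(output_query_1, metadata_1, input_query_2)

# 'Wrong' production: t2 asks for a particle type that t1 never produces.
wrong_input_query_2 = {"particle": ["proton"], "outputType": ["Data"]}
invalid = are_linked(output_query_1, metadata_1, wrong_input_query_2)
```

In the valid case the 'Log' files of t1 simply fall outside InputQuery2, which does not affect the validity of the definition.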
- The current MetaQuery module should be moved from DMS to Core and its logic should be improved
DataManagementSystem/Client/MetaQuery.py -> Core/Utilities/MetaQuery.py