Production System
- Authors: L.Arrabito, J. Bregeon, AT
Following the discussion at the 2017 DUW, we propose the development of a new system, called 'Production System' (PS). The goal is to develop a system with objectives similar to those of the LHCb production system, while keeping it as general and as simple as possible so that it can be adopted by any community. The idea is to build a system on top of the Transformation System (TS), to further automate the management of large 'productions', where a production is a set of transformations 'linked' together by input/output data. This new system would also require a few changes in the TS, enhancing the transformation definition.
The current transformation definition should be enhanced with a few new attributes:
- InputMetaquery (InMQ): new attribute of type: MetaQuery. It would replace the current 'fileMask'. It's optional since a transformation could have no input data, as for MC simulation
- OutputMetaquery (OutMQ): new attribute of type: MetaQuery. The jobs created by the transformation will produce some data with some associated metadata. This attribute represents these associated metadata with their expected ranges of values
- OutputMetaData: new attribute of type: dict. The jobs created by the transformation will produce some data with some associated metadata. This attribute represents the associated metadata key-value pairs (the ones that are known in advance)
Note that the metadata mentioned above are regular metadata (as in the DFC); the only difference is that one has the opportunity to define them at the transformation level. Also, there is some redundancy between OutputMetaquery and OutputMetaData. It should be discussed whether it is necessary to keep these 2 distinct attributes or only the OutMQ. To keep backward compatibility, we could start by adding these new attributes as optional.
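As an illustration of the proposed attributes, here is a minimal sketch in plain Python. It is not the DIRAC Transformation class: `TransformationSketch`, the field names, the metadata keys (`particle`, `zenith`) and the representation of a MetaQuery as a dictionary are all assumptions made for this example. All three attributes default to "not set", reflecting the idea of introducing them as optional.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical stand-in for a MetaQuery: metadata key -> condition.
# A bare value means equality; {"in": [...]} means "one of these values".
MetaQuery = Dict[str, Any]


@dataclass
class TransformationSketch:
    """Illustrative container only; not the DIRAC Transformation class."""

    name: str
    # InMQ: optional, since e.g. an MC simulation has no input data.
    input_meta_query: Optional[MetaQuery] = None
    # OutMQ: metadata of the produced data, with expected ranges of values.
    output_meta_query: Optional[MetaQuery] = None
    # OutputMetaData: key-value pairs that are known in advance.
    output_meta_data: Dict[str, Any] = field(default_factory=dict)


# An MC simulation: no input query, only a description of its outputs.
mc_simulation = TransformationSketch(
    name="MCSimulation",
    output_meta_query={"particle": "gamma", "zenith": {"in": [20, 40]}},
    output_meta_data={"particle": "gamma"},
)

# A processing step that consumes the simulated data.
processing = TransformationSketch(
    name="DataProcessing",
    input_meta_query={"particle": "gamma", "zenith": {"in": [20, 40]}},
)
```

In this sketch the redundancy mentioned above is visible: `output_meta_data` repeats the part of `output_meta_query` that is already fixed to a single value.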
- Production definition
- A set of 'linked' transformations: {T0, T1, T2, ...}
- 2 transformations Ti, Tj are 'linked' if some of the output data of Ti are input of Tj or vice-versa. This is handled by the PS via the input and output meta queries of each transformation (there must be a logical intersection between the OutMQ of Tj and the InMQ of Ti, or between the OutMQ of Ti and the InMQ of Tj)
- Main production attributes : ProductionID, ProductionName, Status
- Production Execution : start/stop/monitor
- Production Validator
- Invoked during the Production definition or modification (i.e. when adding a new transformation to the production)
- Checks logical intersections between the MetaQueries of 2 supposedly ‘linked’ transformations, e.g. the OutMQ of Ti and the InMQ of Tj
- Establishes whether 2 transformations are 'linked'
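The 'linked' test described above can be sketched as follows. This is only one possible reading of "logical intersection", not the validator of the prototype: the function names, the dictionary layout of a transformation (`"InMQ"` / `"OutMQ"` keys) and the rule that a key missing from one query leaves that key unconstrained are assumptions made for this example.

```python
from typing import Any, Dict, Optional, Set

MetaQuery = Dict[str, Any]  # hypothetical: key -> value, or key -> {"in": [...]}


def allowed_values(condition: Any) -> Set[Any]:
    """Normalise a condition to the set of values it accepts."""
    if isinstance(condition, dict):
        if set(condition) != {"in"}:
            raise ValueError(f"unsupported operator(s): {sorted(condition)}")
        return set(condition["in"])
    return {condition}  # a bare value means equality


def queries_intersect(out_mq: MetaQuery, in_mq: MetaQuery) -> bool:
    """True if, for every key both queries constrain, some value satisfies both.

    Assumption: a key present in only one query is unconstrained in the other.
    """
    for key in out_mq.keys() & in_mq.keys():
        if not (allowed_values(out_mq[key]) & allowed_values(in_mq[key])):
            return False
    return True


def are_linked(t_i: Dict[str, Optional[MetaQuery]],
               t_j: Dict[str, Optional[MetaQuery]]) -> bool:
    """True if the outputs of one transformation can feed the other."""

    def feeds(producer, consumer) -> bool:
        out_mq, in_mq = producer.get("OutMQ"), consumer.get("InMQ")
        # A missing or empty query can never establish a link.
        return bool(out_mq) and bool(in_mq) and queries_intersect(out_mq, in_mq)

    return feeds(t_i, t_j) or feeds(t_j, t_i)


t1 = {"OutMQ": {"particle": "gamma", "zenith": {"in": [20, 40]}}}
t2 = {"InMQ": {"particle": "gamma", "zenith": 20}}
t3 = {"InMQ": {"particle": "proton"}}

linked_12 = are_linked(t1, t2)  # overlapping values on both shared keys
linked_13 = are_linked(t1, t3)  # "particle" values do not overlap
```

The check is symmetric by construction, which matches the "or vice-versa" in the definition of 'linked'.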
To give a more precise idea of the desired PS functionality and of the proposed architecture, we have made a very preliminary implementation of the PS and of the TS enhancement, based on v6r19, available here:
https://github.com/arrabito/DIRAC/tree/ProdSys_v6r19
Below is a short description of the components implemented so far.
ProductionDB
- 1 table containing production definitions (Productions table)
- 1 table containing the associations between transformations and productions (ProductionTransformations table)
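To make the two-table layout concrete, here is a rough sketch using an in-memory SQLite database as a stand-in for the real database backend. Only the table names and the ProductionID, ProductionName and Status attributes come from this proposal; the column types, the `TransformationID` column, the keys and the default status `'New'` are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: column types, keys and the default status are assumed.
SCHEMA = """
CREATE TABLE Productions (
    ProductionID   INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductionName TEXT NOT NULL UNIQUE,
    Status         TEXT NOT NULL DEFAULT 'New'
);
CREATE TABLE ProductionTransformations (
    ProductionID     INTEGER NOT NULL REFERENCES Productions(ProductionID),
    TransformationID INTEGER NOT NULL,
    PRIMARY KEY (ProductionID, TransformationID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Create a production, then associate two (made-up) transformation IDs with it.
cur = conn.execute("INSERT INTO Productions (ProductionName) VALUES (?)", ("demo",))
prod_id = cur.lastrowid
conn.executemany(
    "INSERT INTO ProductionTransformations VALUES (?, ?)",
    [(prod_id, 101), (prod_id, 102)],
)

# List the transformations belonging to the production.
rows = conn.execute(
    "SELECT TransformationID FROM ProductionTransformations "
    "WHERE ProductionID = ? ORDER BY TransformationID",
    (prod_id,),
).fetchall()
status = conn.execute(
    "SELECT Status FROM Productions WHERE ProductionID = ?", (prod_id,)
).fetchone()[0]
```

Keeping the associations in a separate table means the transformation records themselves are not duplicated in the ProductionDB; only their IDs are referenced.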
ProductionManager Service
- Exposes methods to create/delete/monitor productions (manipulating the Productions table)
- Exposes methods to associate transformations to productions (manipulating the ProductionTransformations table)
Production Client
- Exposes all the methods to manage productions
- A transformation can be added to a production with or without a ‘parent’ transformation
- A transformation added to a production with a parent transformation is a transformation whose InMQ intersects with the OutMQ of the parent transformation
- For the transformations belonging to a production, we call 2 transformations ‘linked’, if one is the parent of the other or vice-versa
Utilities
- ProdTransManager : uses the TS and the PS clients. It manages the transformations associated to the productions
- ProdValidator : uses the TS client and the FileCatalog. It validates the production definition
Some limitations of the current implementation
- The production validation done by the ProdValidator Utility is extremely simplistic.
- It just checks whether the InMQ is a subset of the OutMQ of 2 supposedly linked transformations, or vice-versa. This logic should be improved
- Only the '=' and 'in' operators are accepted for the moment in the In/OutMQ of the transformations belonging to a production
- For the production validation, we do not yet use the OutputMetaData transformation attribute, but only the OutMQ (and the InMQ)
- The progress of a production is neither defined nor monitored yet
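The simplistic subset check and the operator restriction listed above could look roughly like the sketch below. It is not the code of the ProdValidator utility: the function names, the `{"=": value}` / `{"in": [...]}` encoding of conditions and the exact meaning given to "subset" (every key of one query also appears in the other, with accepted values contained in the other's) are assumptions made for this example.

```python
from typing import Any, Dict, Set

SUPPORTED_OPERATORS = {"=", "in"}  # the only operators accepted for now
MetaQuery = Dict[str, Any]         # hypothetical encoding, see lead-in


def normalise(condition: Any) -> Set[Any]:
    """Return the set of accepted values; reject unsupported operators."""
    if not isinstance(condition, dict):
        return {condition}  # a bare value is shorthand for '='
    if len(condition) != 1:
        raise ValueError("exactly one operator per key is expected")
    [(operator, operand)] = condition.items()
    if operator not in SUPPORTED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return {operand} if operator == "=" else set(operand)


def is_subset(mq_a: MetaQuery, mq_b: MetaQuery) -> bool:
    """True if each key of mq_a is in mq_b and its values are contained there."""
    return all(
        key in mq_b and normalise(cond) <= normalise(mq_b[key])
        for key, cond in mq_a.items()
    )


def simplistic_check(in_mq: MetaQuery, out_mq: MetaQuery) -> bool:
    """InMQ is a subset of OutMQ, or vice-versa."""
    return is_subset(in_mq, out_mq) or is_subset(out_mq, in_mq)


out_mq = {"particle": {"=": "gamma"}, "zenith": {"in": [20, 40]}}
in_mq = {"particle": "gamma", "zenith": {"=": 20}}

valid = simplistic_check(in_mq, out_mq)
invalid = simplistic_check({"particle": "proton"}, out_mq)
```

A query using any other operator, for example `{"zenith": {">": 10}}`, raises `ValueError` in this sketch; supporting range operators would require replacing the set-based comparison with interval logic.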
The corresponding integration test illustrates how the user would use the ProductionClient (together with the TransformationClient) to create a new production. In this test, we also show the role of the Production Validator, which checks whether the production definition is valid. Note that a transformation can have several parent transformations. In this case the ProdValidator checks that all the ‘links’ are valid. Conversely, several transformations can have the same parent transformation.
The following example shows how the user would interact with the PS to define a simple production made of 2 transformations (t1 : MC Simulation -> t2 : DataProcessing), using the TS and the PS clients. In this example, the user should :
- Create 2 transformations using the TS client, also defining the corresponding In/OutQueries.
- Create an ‘empty’ production with a given name, using the Production Client.
- Add the first transformation (t1) to the production, without specifying any parent transformation.
- Add the second transformation (t2) to the production, specifying that it has a parent transformation (t1). At this point the ProdValidator checks that the 2 transformations are ‘linked’. If not, the production is not valid and an error is returned.
Later on, the user can start/stop/delete a production and inspect the transformations it is composed of (with their parent relations). The monitoring of the progress of a production should also be included.
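The four steps above can be mimicked with a small in-memory model. This is a toy sketch, not the ProductionClient or TransformationClient API: `ProductionSketch`, `add_transformation`, the dictionary layout of a transformation and the placeholder link rule (every InMQ key-value pair must appear in the parent's OutMQ) are hypothetical, and where the real system would return an error, the sketch raises `ValueError`.

```python
from typing import Any, Dict, List, Sequence


def links(parent_out_mq, child_in_mq) -> bool:
    """Placeholder validation: every InMQ item must appear in the parent OutMQ."""
    if not parent_out_mq or not child_in_mq:
        return False
    return all(parent_out_mq.get(k) == v for k, v in child_in_mq.items())


class ProductionSketch:
    """Toy in-memory production; hypothetical, for illustration only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.transformations: Dict[str, Dict[str, Any]] = {}
        self.parents: Dict[str, List[str]] = {}

    def add_transformation(self, trans: Dict[str, Any],
                           parents: Sequence[str] = ()) -> None:
        # Validate every requested parent link before registering anything.
        for parent in parents:
            if parent not in self.transformations:
                raise ValueError(f"unknown parent transformation: {parent}")
            parent_out = self.transformations[parent].get("OutMQ")
            if not links(parent_out, trans.get("InMQ")):
                raise ValueError(f"{trans['Name']} is not linked to {parent}")
        self.transformations[trans["Name"]] = trans
        self.parents[trans["Name"]] = list(parents)


# Step 1: define the 2 transformations with their In/Out queries.
t1 = {"Name": "t1", "InMQ": None, "OutMQ": {"particle": "gamma", "type": "sim"}}
t2 = {"Name": "t2", "InMQ": {"particle": "gamma", "type": "sim"},
      "OutMQ": {"particle": "gamma", "type": "processed"}}

# Step 2: create an 'empty' production with a given name.
prod = ProductionSketch("demo_production")

# Step 3: add t1 without a parent. Step 4: add t2 with t1 as its parent.
prod.add_transformation(t1)
prod.add_transformation(t2, parents=["t1"])

# A transformation whose InMQ does not match the parent's OutMQ is rejected.
t3 = {"Name": "t3", "InMQ": {"particle": "proton"}, "OutMQ": None}
try:
    prod.add_transformation(t3, parents=["t1"])
    rejected = False
except ValueError:
    rejected = True
```

Because validation runs before the transformation is registered, a rejected addition leaves the production unchanged, and `prod.parents` can be inspected afterwards to see the parent relations.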