ReqMgr2 MicroService Pileup
This wiki provides a description of the MSPileup service, the final pileup data structure adopted, and a plain-English description of the logic to be implemented.
The architecture of the MSPileup service follows the MVC (Model, View, Controller) pattern, with MongoDB as the back-end database for storage. The service is based on the WMCore REST framework and located in the WMCore/MicroService/MSPileup area. It consists of two distinct services:
- MSPileup HTTP service which provides access to MSPileup via HTTP methods
- MSPileup Tasks daemon which internally handles the data placement logic via polling cycles.
The codebase has the following modules:
- MSPileupObj.py provides details of the MSPileup data object
- MSPileupData.py provides the data layer business logic, i.e. it creates, deletes and modifies MSPileup objects in the MongoDB database
- MSPileup.py API layer which defines all MSPileup APIs
- MSPileupTaskManager.py API layer which defines all MSPileupTaskManager APIs
- MSPileupTasks.py defines all MSPileup tasks, such as:
- monitoring task
- pileup size task
- inactive workflow task
- active workflow task
- clean-up task
- CMS MONIT task
Within each task, all workflows (or other units of work) are processed concurrently via the asyncio module.
- Service/Data.py module defines all HTTP APIs (MSPileup and others used by MS micro-services)
- RestApiHub.py defines MSPileup HTTP end-point within WMCore REST server.
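As a rough illustration of the asyncio-based concurrency used within each task, here is a minimal sketch; the function names below are hypothetical, not the actual WMCore code:

```python
import asyncio

async def process_workflow(name):
    """Pretend to handle the data placement logic for one workflow.
    A stand-in for real I/O such as Rucio or MongoDB calls."""
    await asyncio.sleep(0)
    return f"{name}: done"

async def run_task(workflows):
    """Process all workflows of a given task concurrently."""
    return await asyncio.gather(*(process_workflow(w) for w in workflows))

results = asyncio.run(run_task(["wf1", "wf2", "wf3"]))
```

`asyncio.gather` preserves the order of the awaitables, so results line up with the input workflows even though they run concurrently.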
Below you can find a detailed description of the individual layers.
MSPileup provides a rich set of RESTful APIs which allow clients to fetch, upload, modify and delete pileup documents.
A few GET examples of how to fetch pileup object(s) from MSPileup (if testing on localhost, replace cmsweb.cern.ch with localhost:8249):
curl -v "https://cmsweb.cern.ch/ms-pileup/data/pileup?pileupName=pileup"
curl -v "https://cmsweb.cern.ch/ms-pileup/data/pileup?campaign=Campaign"
# and we can use filters to extract certain fields from the doc, e.g.
curl -g -v 'https://cmsweb.cern.ch/ms-pileup/data/pileup?pileupName=pileup&filters=["campaigns","pileupType"]'
The POST API call can be used either to create a new pileup object in MSPileup, or to fetch documents with a more customized filter. Examples:
# first create document
curl -v -X POST -H "Content-type: application/json" -d '{"bla":1}' \
https://cmsweb.cern.ch/ms-pileup/data/pileup
# second, query the data via provided JSON query (spec)
curl -v -X POST -H "Content-type: application/json" \
-d '{"query":{"pileupName":"bla"}}' \
https://cmsweb.cern.ch/ms-pileup/data/pileup
# third, use query and filters
curl -v -X POST -H "Content-type: application/json" \
-d '{"query":{"pileupName": "bla"}, "filters":["campaigns"]}' \
https://cmsweb.cern.ch/ms-pileup/data/pileup
The PUT method is used to update a pileup object in MSPileup. A few examples are:
# update MSPileup document and set its active status for pileupName /abc/xyz/PREMIX workflow
curl -v -X PUT -H "Content-type: application/json" -d '{"pileupName":"/abc/xyz/PREMIX", "active":false}' \
https://cmsweb.cern.ch/ms-pileup/data/pileup
Finally, the DELETE method can be used to permanently delete a pileup object in MSPileup. Note that this is only allowed at administrator level, otherwise Rucio rules might be left uncleaned.
curl -v -X DELETE -H "Content-type: application/json" -d '{"pileupName":"pileup"}' \
https://cmsweb.cern.ch/ms-pileup/data/pileup
The MSPileup data layer defines the following MSPileup data-structure:
{"pileupName": string,
"pileupType": string,
"insertTime": integer,
"lastUpdateTime": integer,
"expectedRSEs": list of strings,
"currentRSEs": list of strings,
"fullReplicas": integer,
"campaigns": list of strings,
"containerFraction": float,
"replicationGrouping": string,
"activatedOn": integer,
"deactivatedOn": integer,
"pileupSize": integer,
"ruleIds": list of strings,
"customName": string,
"transition": list of dicts,
"active": boolean
}
where the mandatory parameters are defined as:
- `pileupName`: string with the pileup dataset name.
- `pileupType`: either "premix" or "classic" string value, according to the pileup type.
- `expectedRSEs`: a non-empty list of strings with the RSE names, e.g. `["T2_US_Nebraska", "T2_CH_CERN"]`. This is validated against the known Disk RSEs in Rucio.
- `active`: boolean flag enabling or disabling the pileup data placement. If `False`, then the service ensures that there are no wmcore_transferor rules for this pileup.
The optional parameters are defined as:
- `campaigns`: a list of strings with the campaign names using this pileup. Default: `[]`.
- `containerFraction`: a real number that corresponds to the fraction - over the number of blocks - to be replicated to a given RSE. To be used whenever the micro-service is mature enough and automatic data placement decisions can be performed. Default: `1.0`.
- `replicationGrouping`: a string matching the grouping granularity field provided by Rucio. Allowed values are DATASET (data placement by block) or ALL (the whole container is placed under the same RSE). Default: `ALL`.
- `fullReplicas`: an integer saying how many replicas of the pileup dataset are supposed to be created. It will eventually supersede `expectedRSEs`. To be used whenever the micro-service is mature enough and automatic data placement decisions can be performed. Default: `1`.
- `customName`: name of the custom DID.
- `transition`: a list of transition records like `[{'updateTime': 123, 'containerFraction': 0.5, 'customDID': 'blah', 'DN': 'blah2'}, ...]`
And the service-based parameters are defined as follows (these are set by the service itself, regardless of whether they are provided by the user):
- `insertTime`: integer with the timestamp (seconds since epoch in GMT) of when this pileup object was first defined in the microservice.
- `lastUpdateTime`: integer with the timestamp (seconds since epoch in GMT) of the last modification made to this pileup object.
- `currentRSEs`: a list of strings with the RSE names, e.g. `["T2_US_Nebraska", "T2_CH_CERN"]`; an empty list must be supported! This is validated against the known Disk RSEs in Rucio. Rules that have been satisfied will have their relevant RSE name listed here.
- `activatedOn`: integer with the timestamp (seconds since epoch in GMT) of when this pileup object last became active. If `active=True`, then Default: integer timestamp, otherwise Default: `None`.
- `deactivatedOn`: integer with the timestamp (seconds since epoch in GMT) of when this pileup object last became inactive. If `active=False`, then Default: integer timestamp, otherwise Default: `None`.
- `pileupSize`: integer with the size of the full pileup dataset (in bytes), to be filled asynchronously. Default: `0`.
- `ruleIds`: list of strings with the rule IDs that are locking this pileup.
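The parameter descriptions above can be summarized in a small sketch that builds a pileup document with its documented defaults. This is illustrative only; the real object definition and validation live in MSPileupObj.py, and the defaults for `customName` and `transition` are assumptions here:

```python
import time

def make_pileup_doc(pileup_name, pileup_type, expected_rses, active=True):
    """Build a pileup document with the defaults described above.
    Illustrative sketch; real validation is done in MSPileupObj.py."""
    now = int(time.time())  # seconds since epoch
    return {
        "pileupName": pileup_name,
        "pileupType": pileup_type,        # "premix" or "classic"
        "insertTime": now,
        "lastUpdateTime": now,
        "expectedRSEs": expected_rses,
        "currentRSEs": [],
        "fullReplicas": 1,                # documented default
        "campaigns": [],                  # documented default
        "containerFraction": 1.0,         # documented default
        "replicationGrouping": "ALL",     # documented default
        "activatedOn": now if active else None,
        "deactivatedOn": None if active else now,
        "pileupSize": 0,                  # default, filled asynchronously
        "ruleIds": [],
        "customName": "",                 # assumed default
        "transition": [],                 # assumed default
        "active": active,
    }

doc = make_pileup_doc("/abc/xyz/PREMIX", "premix", ["T2_CH_CERN"])
```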
Each pileup object is persisted in its own document in the backend database (MongoDB).
The data placement logic is defined in the MSPileupTasks.py module.
This section describes some architecture design choices, assumptions for the service and important information that needs to be recorded in the logs.
Some assumptions to be considered for this microservice are:
- rules will be created against a single RSE (for a single DID)
- service is supposed to be a singleton (depending on the cost-benefit, supporting multiple instances would be a bonus)
- polling cycle will very likely not be smaller than every hour
- no need to adopt database transactions (likely worst case would be to perform the same action from multiple instances, overwriting documents in MongoDB)
There are 3 main tasks to be executed by this service, all part of the same MSPileup microservice and running in the same service instance. These tasks should be executed sequentially, but their internal logic can apply concurrent processing:
- Monitoring task (listing status of rule ids and persisting it in the database)
- Inactive pileup task (clean up of rule ids that belong to inactive pileup objects)
- Active pileup task (rule creation or deletion for active pileup objects)
Log records should be created whenever appropriate, including a short summary by the end of each polling cycle. Nonetheless, here is a non-exhaustive list of critical logs to have:
- start and end of each of the major activities (monitoring, active, inactive); and time spent.
- rule creation (containing the DID, RSE and rule id)
- rule deletion (containing the DID and rule id)
- rule completion (either fully satisfied or partial)
- if a rule is not completed (not OK), then it would be useful to print its state as well
This task is supposed to iterate over all the MongoDB documents, fetch the current state of each rule ID and persist this information back in MongoDB. A short algorithm for it can be described as follows:
- Read pileup documents from MongoDB with filter `active=true`
- For each rule id in `ruleIds` (e.g. `afd122143kjmdskj`):
  - query Rucio for that rule id and fetch its state
  - if state=OK, log that the rule has been satisfied and add that RSE to `currentRSEs` (keeping uniqueness)
  - otherwise, calculate the rule completion based on the 3 `locks_*` fields and remove that RSE from `currentRSEs` (if existent)
- now that all the known rules have been inspected, persist the up-to-date pileup doc in MongoDB
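The monitoring loop above can be sketched as follows. `get_rule_state` is a hypothetical callable returning `(state, rse)` for a Rucio rule id; the real service uses the WMCore Rucio wrapper instead:

```python
def monitor_pileup(doc, get_rule_state):
    """Sketch of the monitoring loop described above.
    `get_rule_state` is a hypothetical stand-in for a Rucio client call."""
    for rule_id in doc["ruleIds"]:
        state, rse = get_rule_state(rule_id)
        if state == "OK":
            # rule satisfied: record the RSE, keeping uniqueness
            if rse not in doc["currentRSEs"]:
                doc["currentRSEs"].append(rse)
        elif rse in doc["currentRSEs"]:
            # incomplete rule: drop the RSE if present
            doc["currentRSEs"].remove(rse)
    return doc  # the caller persists the updated doc in MongoDB

doc = {"ruleIds": ["r1", "r2"], "currentRSEs": ["T2_CH_CERN"]}
states = {"r1": ("OK", "T2_US_Nebraska"), "r2": ("REPLICATING", "T2_CH_CERN")}
doc = monitor_pileup(doc, lambda rid: states[rid])
```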
This task is supposed to look at pileup documents that have been set to inactive. The main goal here is to ensure that there are no Rucio rules left in the system (of course, for the relevant DID and the Rucio account adopted by our microservice). Pileup documents that are updated as a result of this logic should have their data persisted back in MongoDB. A short algorithm for it can be described as follows:
- Read pileup documents from MongoDB with filter `active=false`
- for each DID and Rucio account, get a list of all the existent rules
  - make a Rucio call to delete each rule id, then:
    - remove the rule id from `ruleIds` (if any) and remove the RSE name from `currentRSEs` (if any)
- make a log record if the DID + Rucio account tuple does not have any existent rules
  - in this case, it would be expected to have an empty `currentRSEs` list, given that RSEs have been removed in the step above
- once all the relevant rules have been removed, persist an up-to-date version of the pileup data structure in MongoDB
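A minimal sketch of the inactive-pileup clean-up above; `list_rules` and `delete_rule` are hypothetical stand-ins for the Rucio client calls used by the real service:

```python
def clean_inactive_pileup(doc, list_rules, delete_rule, logger=print):
    """Sketch of the inactive-pileup clean-up described above.
    `list_rules` returns (rule_id, rse) pairs for the pileup DID and
    the service's Rucio account; both callables are hypothetical."""
    rules = list_rules(doc["pileupName"])
    if not rules:
        logger(f"no rules left for {doc['pileupName']}")
    for rule_id, rse in rules:
        delete_rule(rule_id)
        logger(f"deleted rule {rule_id} on {rse}")
        if rule_id in doc["ruleIds"]:
            doc["ruleIds"].remove(rule_id)
        if rse in doc["currentRSEs"]:
            doc["currentRSEs"].remove(rse)
    return doc  # the caller persists the updated doc in MongoDB

doc = {"pileupName": "/a/b/PREMIX", "ruleIds": ["r1"],
       "currentRSEs": ["T2_CH_CERN"]}
doc = clean_inactive_pileup(doc,
                            list_rules=lambda did: [("r1", "T2_CH_CERN")],
                            delete_rule=lambda rid: None,
                            logger=lambda msg: None)
```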
This task is supposed to look at pileup documents active in the system. Its main goal is to ensure that the pileup DID has all the requested rules (and nothing beyond them), according to the pileup object configuration. Pileup documents that are updated as a result of this logic should have their data persisted back in MongoDB. A short algorithm for this active task is described below:
- Read pileup documents from MongoDB with filter `active=true`
- if `expectedRSEs` is different than `currentRSEs`, then further data placement is required (it's possible that data removal is required!)
- make a local copy of the `currentRSEs` value to track rules that are incomplete but ongoing
- for each rule matching the DID + Rucio account, perform:
  - if the rule RSE is not in `expectedRSEs`, then this rule needs to be deleted
    - upon successful rule deletion, also remove the RSE name from `currentRSEs` and the rule id from `ruleIds` (if any); make a log record
  - else, save this rule RSE in the local copy of `currentRSEs`. This rule is likely still being processed by Rucio.
- now that we evaluated expected versus current, for each `expectedRSEs` entry not in our local copy of `currentRSEs`:
  - first, check whether the RSE has enough available space (we can assume that any storage with less than 1 TB available cannot be considered for pileup data placement)
  - in case there is no space available, make a log record
  - in case there is enough space, make a Rucio rule for that DID + Rucio account + RSE and append the rule id to the `ruleIds` list
- once all the relevant rules have been created - or if there were any changes to the pileup object - persist an up-to-date version of the pileup data structure in MongoDB
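The reconciliation above can be sketched as below. The four callables (`list_rules`, `delete_rule`, `create_rule`, `free_space`) are hypothetical stand-ins for Rucio client and storage-info calls, and the 1 TB threshold follows the assumption stated above:

```python
MIN_FREE_SPACE = 1 * 10**12  # assumed 1 TB minimum free space, per the text

def reconcile_active_pileup(doc, list_rules, delete_rule, create_rule,
                            free_space):
    """Sketch of the active-pileup reconciliation described above.
    All four callables are hypothetical stand-ins, not WMCore APIs."""
    if set(doc["expectedRSEs"]) == set(doc["currentRSEs"]):
        return doc  # placement already satisfied, nothing to do
    ongoing = list(doc["currentRSEs"])  # local copy tracking ongoing rules
    for rule_id, rse in list_rules(doc["pileupName"]):
        if rse not in doc["expectedRSEs"]:
            # unexpected rule: delete it and clean up bookkeeping
            delete_rule(rule_id)
            if rse in doc["currentRSEs"]:
                doc["currentRSEs"].remove(rse)
            if rule_id in doc["ruleIds"]:
                doc["ruleIds"].remove(rule_id)
        elif rse not in ongoing:
            ongoing.append(rse)  # rule exists, likely still in progress
    for rse in doc["expectedRSEs"]:
        if rse in ongoing:
            continue  # already placed or in progress
        if free_space(rse) < MIN_FREE_SPACE:
            continue  # the real service makes a "not enough space" log record
        doc["ruleIds"].append(create_rule(doc["pileupName"], rse))
    return doc  # the caller persists the updated doc in MongoDB

doc = {"pileupName": "/a/b/PREMIX", "expectedRSEs": ["T2_CH_CERN"],
       "currentRSEs": ["T2_US_Nebraska"], "ruleIds": ["r1"]}
doc = reconcile_active_pileup(
    doc,
    list_rules=lambda did: [("r1", "T2_US_Nebraska")],
    delete_rule=lambda rid: None,
    create_rule=lambda did, rse: "r2",
    free_space=lambda rse: 2 * 10**12)
```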
This task looks up documents based on the following condition set:
- the document is `active=False`; and
- the document has an empty `ruleIds` list; and
- the document has an empty `currentRSEs` list; and
- the document has been deactivated for a while (deactivatedOn=XXX), with the threshold to be configured in the service
Then it deletes the documents that meet all of these requirements.
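The condition set above maps naturally onto a MongoDB filter. The sketch below is an assumption about how such a filter could look (the real query lives in the service code), with a hypothetical `grace_seconds` standing in for the configurable deactivation period:

```python
import time

def cleanup_filter(grace_seconds):
    """Hypothetical MongoDB filter matching the clean-up conditions above;
    `grace_seconds` stands in for the service-configured grace period."""
    cutoff = int(time.time()) - grace_seconds
    return {
        "active": False,
        "ruleIds": [],                      # empty ruleIds list
        "currentRSEs": [],                  # empty currentRSEs list
        "deactivatedOn": {"$lt": cutoff},   # deactivated long enough ago
    }

query = cleanup_filter(30 * 24 * 3600)  # e.g. a 30-day grace period
```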
The MSPileupTaskManager executes the CMS MONIT task, which fetches new documents from the MSPileup backend database, flattens them out (see below) and sends them to the CMS MONIT ElasticSearch instance, which is used for the MSPileup dashboard and aggregations.
Each document is flattened according to a simple rule: every attribute with a list data type, except `ruleIds`, is converted to a singular attribute holding a single value; e.g. the `campaigns` attribute, which holds a list of campaigns, is converted to `campaign` with a single value. Here is the data structure we use to send docs to ES:
{"pileupName": string,
"pileupType": string,
"insertTime": integer,
"lastUpdateTime": integer,
"expectedRSE": string,
"currentRSE": string,
"fullReplicas": integer,
"campaign": string,
"containerFraction": float,
"replicationGrouping": string,
"activatedOn": integer,
"deactivatedOn": integer,
"pileupSize": integer,
"customName": string,
"transition": list of dicts,
"ruleIds": list of strings,
"active": boolean
}
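The flattening rule can be sketched in Python as below. The exact semantics are an assumption here - in particular that one flat document is emitted per combination of list values, and that `transition` is left untouched like `ruleIds`; `LIST_KEYS` and `flatten_doc` are hypothetical names, not the actual WMCore implementation:

```python
from itertools import product

# Assumed mapping from plural (list) keys to their singular ES counterparts
LIST_KEYS = {"campaigns": "campaign", "expectedRSEs": "expectedRSE",
             "currentRSEs": "currentRSE"}

def flatten_doc(doc):
    """Sketch of the flattening rule above: list-valued attributes (except
    ruleIds, and here also transition) are replaced by singular keys,
    producing one flat document per combination of values."""
    base = {k: v for k, v in doc.items() if k not in LIST_KEYS}
    keys = [k for k in LIST_KEYS if k in doc]
    flat = []
    # empty lists contribute a single None value so the doc is not dropped
    for combo in product(*(doc[k] or [None] for k in keys)):
        rec = dict(base)
        rec.update({LIST_KEYS[k]: v for k, v in zip(keys, combo)})
        flat.append(rec)
    return flat

docs = flatten_doc({"pileupName": "p", "campaigns": ["c1", "c2"],
                    "expectedRSEs": ["T2_CH_CERN"], "currentRSEs": [],
                    "ruleIds": ["r1"]})
```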
Whenever a new development kubernetes cluster is created, we need to populate the MSPileup database with the pileup configurations that we use in the DMWM and Integration workflow test templates.
To do that, you need to be on a node with the WMCore environment (e.g., an agent node). Next, download this file: https://github.com/dmwm/WMCore/blob/master/bin/adhoc-scripts/createPileupObjects.py, which parses a JSON file and injects pileup configuration documents into MSPileup. A pileup JSON file can be found under: https://github.com/dmwm/WMCore/blob/master/test/data/WMCore/MicroService/DataStructs/pileups_dev.json
With all these pieces in place, you can execute it like:
python3 createPileupObjects.py --url=https://cmsweb-testXXX.cern.ch --fin=pileups_dev.json --inject
NOTE: this sub-section can be removed once we know these answers.
Q.1: what is the definition of currentRSEs
?
a) RSEs that have a Rucio rule (unsatisfied rucio rule)
b) RSEs that have a complete copy of the data (a satisfied rucio rule) <----- Alan's preference!
Q.2.: what happens if a given rule id is deleted now and we try to delete it again an hour from now?
a) does the rule deletion fail the second time?
b) or does it set a new expiration time for 2h from now?
NOTE: this sub-section can be removed once we know these answers.
Rucio wrapper client: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/Rucio/Rucio.py#L645 Rucio pycurl based wrapper: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/Tools/PycurlRucio.py
When the active task was initially discussed, the following candidate logics were also considered.
CANDIDATE 1:
- Read pileup documents from MongoDB with filter `active=true`
- if `expectedRSEs` is different than `currentRSEs`, then further data placement is required (it's possible that data removal is required!)
- for each `expectedRSEs` entry not in `currentRSEs`:
  - check if there is an ongoing Rucio rule for that DID + Rucio account + RSE (we might want to remove RSE from this call?)
  - if there is a Rucio rule, then ensure that it's listed under `ruleIds`, otherwise add it
  - if there is none, then a new rule needs to be created. First, check whether the RSE has enough space available for that
    - if it does, create the rule and append the rule id to `ruleIds`
    - otherwise, make a log record saying that there is not enough space
- once all the relevant rules have been created, persist an up-to-date version of the pileup data structure in MongoDB
CANDIDATE 2:
- Read pileup documents from MongoDB with filter `active=true`
- for each DID and Rucio account, get a list of all the existent rules
  - if the rule RSE name is in `expectedRSEs`, then add the rule id to `ruleIds` (keeping uniqueness)
  - elif the rule RSE name is in `currentRSEs`, then (this rule should no longer exist):
    - make a Rucio call to delete this rule, remove the RSE from `currentRSEs` and remove the rule id from `ruleIds` (if any)
  - else - thus the RSE is neither in `expectedRSEs` nor in `currentRSEs` - it means that the rule has been created by someone else, delete it!
    - make a Rucio call to delete this rule and delete the rule id from `ruleIds` (if any)
- for each RSE in `expectedRSEs` that does not have an ongoing rule and/or that is not listed under `currentRSEs`:
  - create a new rule for the DID + Rucio account + RSE and add the rule id to the `ruleIds` list
- once rules have been removed or created, persist an up-to-date version of the pileup data structure in MongoDB
However, the algorithm described in the "Active pileup task" sub-section is the one that has been chosen and implemented.