JobRequirements RFC
RFC #7
Authors: S.Poss, A.Tsaregorodtsev
Last Modified: 9.03.2013
User jobs can specify various requirements that resources must satisfy to be eligible for job execution. In most cases these requirements are specified as reserved keywords in the job description (JDL). The keywords are:
- Site
- BannedSite
- Platform
- CPUTime
If these parameters are specified in the job description, they are added to the definition of the corresponding Task Queue (TQ). The pilots provide a resource description, which the Matcher service matches against the TQs.
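
The standard matching can be pictured with a minimal sketch like the one below. It is only an illustration: the dictionary layout, the function names and the exact semantics of each keyword (in particular, that the resource must offer at least the requested CPUTime) are assumptions, not the Matcher service code or the TQ database schema.

```python
def resource_matches_queue(queue, resource):
    """Return True if a pilot's resource description satisfies a task queue.

    Both arguments are plain dicts; this layout is an assumption made for
    illustration, not the actual TQ database schema.
    """
    site = resource.get("Site")
    allowed_sites = queue.get("Site")
    if allowed_sites and site not in allowed_sites:
        return False
    if site in queue.get("BannedSite", []):
        return False
    platforms = queue.get("Platform")
    if platforms and resource.get("Platform") not in platforms:
        return False
    # Assumed semantics: the resource must offer at least the requested CPU time.
    if queue.get("CPUTime", 0) > resource.get("CPUTime", 0):
        return False
    return True


def matching_queues(queues, resource):
    """Return the IDs of all task queues that the resource can serve."""
    return [tq_id for tq_id, queue in queues.items()
            if resource_matches_queue(queue, resource)]
```

A keyword that is absent from a queue definition is treated here as "no constraint", which is why a queue with no Site entry accepts any site that is not banned.
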
The standard matching mechanism described above is very efficient but rather limited. Not all requirements can be expressed in terms of predefined job parameters. New activities can require new resource specifications that user jobs can request. Examples are preinstalled software tags, specific services available on site (e.g. databases), memory available for jobs, CPU models, etc. It is therefore necessary to add more non-predefined characteristics to the resources (sites, CEs, queues) that can be used in job requirements without changing the code or the schema of the TQ database. In this RFC, we present a proposal for such a mechanism.

# JobRequirements applied during TaskQueue creation

The idea is to apply job-site requirements (such as software tags) during TaskQueue creation rather than during matching. This would use the new Resources description (in v6r8). As far as I can tell, this implies following the same kind of structure as the InputData treatment:

## Change in JobDB

* Addition of a table to hold the requirements: JobID, ReqName, ReqType, ReqValue, ReqOperator. ReqOperator should hold one of >, <, <=, >=, =, in. ReqType may not be needed if the type is checked in the Python code. The Python code should know the possible requirements (from the CS) and throw an error when trying to add a requirement that does not exist at any site.
* Addition of setters and getters in JobDB.py (probably as dicts, a bit like metadata in the FC).
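
As a rough sketch of what one row of such a table and the accompanying check could look like in Python: the class, the function names and the `known_names` argument (standing in for the requirement names read from the CS) are all hypothetical, and ReqType is omitted on the assumption that the type is checked in code.

```python
from dataclasses import dataclass

# Operators listed in this RFC for the ReqOperator column.
ALLOWED_OPERATORS = {">", "<", "<=", ">=", "=", "in"}


@dataclass(frozen=True)
class JobRequirement:
    """One row of the proposed requirements table (hypothetical layout)."""
    job_id: int
    name: str
    value: object
    operator: str = "="


def validate_requirement(req, known_names):
    """Reject unknown operators and requirements that no site defines.

    known_names stands in for the requirement names read from the CS.
    """
    if req.operator not in ALLOWED_OPERATORS:
        raise ValueError(f"Unsupported operator: {req.operator!r}")
    if req.name not in known_names:
        raise ValueError(f"Requirement {req.name!r} is not defined at any site")
    return req


def requirements_as_dict(reqs):
    """Group rows by job ID, in the spirit of a dict-style getter."""
    result = {}
    for req in reqs:
        result.setdefault(req.job_id, {})[req.name] = (req.operator, req.value)
    return result
```

For example, `requirements_as_dict` turns a list of rows into `{JobID: {ReqName: (ReqOperator, ReqValue)}}`, which is one possible shape for the getters.
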

## Change in JobManager

* The proper calls should be added in the submitJob method of the JobManager.

## Change needed for executor

* JobStates should have a method to access the requirements as needed by the executors.

## New executor: JobRequirements

* It would find the sites matching the requirements among the available ones. Should it take all unbanned sites, or only those that a previously run executor has selected?
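
The site selection step could be sketched as below, assuming each candidate site is described by a dict of characteristics and each requirement is a `(name, operator, value)` tuple. These data shapes, the function names and the reading of `in` as "the site value is one of the accepted values" are assumptions for illustration; the question of which candidate sites to start from is left to the caller.

```python
import operator

# Map the operator strings from this RFC onto Python comparisons.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
    # Assumed meaning of "in": the site value is one of the accepted values.
    "in": lambda site_value, accepted: site_value in accepted,
}


def site_matches(site_description, requirements):
    """Check one site against every (name, operator, value) requirement."""
    for name, op, wanted in requirements:
        if name not in site_description:
            # A site that does not advertise the characteristic cannot match.
            return False
        if not OPERATORS[op](site_description[name], wanted):
            return False
    return True


def select_sites(candidate_sites, requirements):
    """Return the sorted names of candidate sites satisfying all requirements."""
    return sorted(
        site for site, description in candidate_sites.items()
        if site_matches(description, requirements)
    )
```

Passing in only the sites kept by earlier executors, or all unbanned sites, changes nothing in the function itself, so either answer to the open question above fits this shape.
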

## Interface

* Addition of the relevant code in the DIRAC API to set the requirements in the JDL. This needs a clever way of encoding the operators (as req>12 is not req=12).
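
One possible encoding, shown purely as a sketch, is a single string of the form `name operator value` that is parsed back with a regular expression. The string format and the two helper functions are hypothetical and are not part of the DIRAC API or of the JDL syntax; values are kept as strings, leaving type conversion to whatever code knows the requirement type.

```python
import re

ALLOWED_OPERATORS = (">=", "<=", ">", "<", "=", "in")

# Two-character operators come first so ">=" is not read as ">" followed by "=".
_REQ_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<|=|\bin\b)\s*(.+?)\s*$")


def encode_requirement(name, op, value):
    """Build the textual form of one requirement, e.g. 'Memory >= 2048'."""
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"Unsupported operator: {op!r}")
    return f"{name} {op} {value}"


def decode_requirement(text):
    """Split a textual requirement into (name, operator, value) strings."""
    match = _REQ_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Cannot parse requirement: {text!r}")
    return match.groups()
```

With this scheme `req>12` and `req=12` decode to different operators, which is the distinction the interface has to preserve.
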

## Clean up

* The requirements table must be cleaned when jobs are removed (what about reset?).
* What to do with requirements that are dropped from the CS? Should there be a watch agent that cleans the tables?