Transitioning from (DIRAC Workflow, JDL) to (CWL, pydantic JobModel) #175
Draft of a new model based on CWL (v2)
This job submission model is designed to run CWL (Common Workflow Language) tasks in diverse computing environments, with a focus on remote execution. It lets a submission specify the job requirements together with the source of the executables and inputs, so that it can integrate with data storage solutions such as local filesystems, sandbox stores, and storage elements (SE).
Model specifications:
Key components of the model include:
Here is an example:
{
    # CWL workflow
    "cwl_task": {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "label": "My Job",
        "doc": "A simple task.",
        # Requirements
        "requirements": {
            "ResourceRequirement": {
                "ramMin": 500,  # Mebibytes
                "coresMax": 1,
            },
            "EnvVarRequirement": {
                "envDef": [
                    {"envName": "HELLO", "envValue": "hello"}
                ]
            }
        },
        # Command
        "baseCommand": "process.py",
        # Inputs
        "inputs": {
            "configuration": {
                "type": "File",
                "inputBinding": {
                    "position": 1
                }
            },
            "run-number": {
                "type": "int",
                "inputBinding": {
                    "position": 2
                }
            },
            "max-events": {
                "type": "int",
                "default": 1000,
                "inputBinding": {
                    "position": 3
                }
            }
        },
        "outputs": [
            {
                "id": "output",
                "type": "File",
                "outputBinding": {
                    "glob": "output.sim"
                }
            }
        ]
    },
    # List of parameters to pass to the CWL
    # Each item represents a job
    "parameters": [
        # 1st job
        {
            "sandbox": [
                "s3://mybucket/path/to/sandbox1.tar.bz",
            ],
            "cwl": {
                "configuration": {
                    "class": "File",
                    "path": "config1.json",
                },
                "run-number": 123,
                "max-events": 1000,
            }
        },
        # 2nd job
        {
            # We could potentially have another sandbox here
            "sandbox": {
                "url": "s3://mybucket/path/to/sandbox2.tar.bz",
            },
            "cwl": {
                "configuration": {
                    "class": "File",
                    "path": "lfn:/path/to/config2.json",
                },
                "run-number": 124
            }
        }
    ],
    # This is a common part to all the jobs that will be generated
    "job_description": {
        "sites": ["Site1"],
        "cpu_work": 1500,  # Should this be included?
        "platform": "x86_64-alma9",
        "priority": 1,
        "outputs": {
            "remote_data": ["output.sim"],
            "remote_se": "SE-USER"
        }
    }
}
Advantages:
Considerations:
This description and proposed model are open for discussion and further enhancements.
Changelog:
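To make the "each item represents a job" rule of the example concrete, here is a minimal sketch of how a submission could be expanded into per-job inputs. It is not the proposed implementation: it uses stdlib dataclasses as a stand-in for the pydantic JobModel, and the names `JobDescription` and `expand_jobs` are hypothetical. The field names are the ones used in the example; the rule that CWL defaults fill in missing parameters follows standard CWL behaviour.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class JobDescription:
    """DIRAC-specific settings shared by all jobs (field names taken from the example)."""

    sites: list[str]
    platform: str
    priority: int = 1
    cpu_work: Optional[int] = None  # the example leaves open whether this is kept
    outputs: dict[str, Any] = field(default_factory=dict)


def expand_jobs(submission: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one resolved job per entry of submission["parameters"] (hypothetical helper)."""
    declared = submission["cwl_task"]["inputs"]
    description = JobDescription(**submission["job_description"])
    # Defaults declared in the CWL inputs, e.g. {"max-events": 1000}
    defaults = {name: spec["default"] for name, spec in declared.items() if "default" in spec}
    jobs = []
    for index, params in enumerate(submission["parameters"]):
        # Per-job values override the CWL defaults
        inputs = {**defaults, **params.get("cwl", {})}
        missing = sorted(set(declared) - set(inputs))
        if missing:
            raise ValueError(f"job {index}: missing CWL inputs {missing}")
        jobs.append(
            {"inputs": inputs, "sandbox": params.get("sandbox"), "description": description}
        )
    return jobs


# Reduced version of the example: two jobs, the second relies on the default
submission = {
    "cwl_task": {
        "inputs": {
            "run-number": {"type": "int"},
            "max-events": {"type": "int", "default": 1000},
        }
    },
    "parameters": [
        {"cwl": {"run-number": 123, "max-events": 500}},
        {"cwl": {"run-number": 124}},
    ],
    "job_description": {"sites": ["Site1"], "platform": "x86_64-alma9", "priority": 1},
}
jobs = expand_jobs(submission)
```

Sharing a single `JobDescription` instance across jobs mirrors the "common part to all the jobs" comment in the example; a real model would probably also validate the sandbox field, which the example currently writes in two different shapes.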
Example of a user job submission workflow (v1):
job.cwl:
{
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "label": "My Job",
    "doc": "A simple task.",
    "requirements": {
        "ResourceRequirement": {
            "ramMin": 500,
            "coresMax": 1,
        },
        "EnvVarRequirement": {
            "envDef": [
                {"envName": "HELLO", "envValue": "hello"}
            ]
        }
    },
    "baseCommand": "process.py",
    "inputs": {
        "configuration": {
            "type": "File",
            "inputBinding": {
                "position": 1
            }
        },
        "run-number": {
            "type": "int",
            "inputBinding": {
                "position": 2
            }
        },
        "max-events": {
            "type": "int",
            "default": 1000,
            "inputBinding": {
                "position": 3
            }
        }
    },
    "outputs": [
        {
            "id": "output",
            "type": "File",
            "outputBinding": {
                "glob": "output.sim"
            }
        }
    ]
}
local1.cwl:
{
    "configuration": {
        "class": "File",
        "path": "config1.json",
    },
    "run-number": 123,
    "max-events": 1000
}
local2.cwl:
{
    "configuration": {
        "class": "File",
        "path": "config2.json",
    },
    "run-number": 124
}
$ cwltool job.cwl local1.cwl
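As a rough illustration of what the `cwltool` call does with these files, the sketch below assembles the command line from `baseCommand` and the `inputBinding` positions. It is a simplification of the CWL binding rules (no prefixes, arrays, or expressions), `build_command` is a hypothetical helper, and a real runner would replace the file path with the staged location of the file.

```python
def build_command(tool: dict, job_inputs: dict) -> list[str]:
    """Assemble argv from baseCommand plus inputs ordered by inputBinding.position."""
    bound = []
    for name, spec in tool["inputs"].items():
        binding = spec.get("inputBinding")
        if binding is None:
            continue  # inputs without a binding do not appear on the command line
        value = job_inputs.get(name, spec.get("default"))
        if value is None:
            raise ValueError(f"missing value for input '{name}'")
        if isinstance(value, dict) and value.get("class") == "File":
            value = value["path"]  # simplified: real runners use the staged path
        bound.append((binding.get("position", 0), name, str(value)))
    base = tool["baseCommand"]
    command = [base] if isinstance(base, str) else list(base)
    # Sort by position, using the input name as a tie-breaker
    command += [text for _, _, text in sorted(bound)]
    return command


tool = {
    "baseCommand": "process.py",
    "inputs": {
        "configuration": {"type": "File", "inputBinding": {"position": 1}},
        "run-number": {"type": "int", "inputBinding": {"position": 2}},
        "max-events": {"type": "int", "default": 1000, "inputBinding": {"position": 3}},
    },
}
# Parameters without max-events, so the default declared in the tool applies
params = {
    "configuration": {"class": "File", "path": "config2.json"},
    "run-number": 124,
}
argv = build_command(tool, params)
print(argv)  # ['process.py', 'config2.json', '124', '1000']
```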
remote1.cwl:
{
    "configuration": {
        "class": "File",
        "path": "/cvmfs/repo/path/to/config3.json",
    },
    "run-number": 124
}
remote2.cwl:
{
    "configuration": {
        "class": "File",
        "path": "lfn:/path/to/config4.json",
    },
    "run-number": 124
}
$ dirac jobs submit job.cwl \
    --params local1.cwl \
    --params local2.cwl \
    --params remote1.cwl \
    --params remote2.cwl \
    --site site1
# returns 4 job ids: [8543, 8544, 8545, 8546]
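The four parameter files differ only in where the configuration file lives. One plausible reading, which is an assumption and not something stated in the proposal, is that plain paths would have to be shipped with the job (for example in a sandbox), whereas `lfn:` and `/cvmfs/` paths can be resolved at the site. The sketch below separates the two cases; `REMOTE_PREFIXES` and `split_file_inputs` are hypothetical names.

```python
# Assumption: these prefixes mark files that are already reachable from a site
REMOTE_PREFIXES = ("lfn:", "/cvmfs/")


def split_file_inputs(params: dict) -> tuple[list[str], list[str]]:
    """Return (local_paths, remote_paths) for every File input in a parameter set."""
    local, remote = [], []
    for value in params.values():
        if isinstance(value, dict) and value.get("class") == "File":
            path = value["path"]
            if path.startswith(REMOTE_PREFIXES):
                remote.append(path)
            else:
                local.append(path)
    return local, remote


local1 = {"configuration": {"class": "File", "path": "config1.json"}, "run-number": 123}
remote2 = {"configuration": {"class": "File", "path": "lfn:/path/to/config4.json"}, "run-number": 124}

to_upload, _ = split_file_inputs(local1)        # files the client would need to ship
_, already_remote = split_file_inputs(remote2)  # files resolved on the worker side
```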
Thanks a lot for this transition plan. I think it is a very good plan that allows for a smooth transition. Here are my comments/questions:
I think it would be better to introduce the Workflow table directly in Step1 since, as you said, it will simplify the management of sandboxes.
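To illustrate why a Workflow table could simplify things, here is a minimal sketch using SQLite. Only the table name and the two columns (WorkflowID, Workflow) come from the plan; deriving the ID from a hash of the CWL content is purely an assumption, chosen here so that many jobs sharing one CWL reference a single row instead of each carrying it in an input sandbox.

```python
import hashlib
import json
import sqlite3


def create_workflow_table(conn: sqlite3.Connection) -> None:
    """Create the table with the two columns named in the plan."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Workflow ("
        "WorkflowID TEXT PRIMARY KEY, Workflow TEXT NOT NULL)"
    )


def store_workflow(conn: sqlite3.Connection, cwl: dict) -> str:
    """Insert the CWL if it is not stored yet and return its ID (hypothetical scheme)."""
    text = json.dumps(cwl, sort_keys=True)  # canonical form so equal workflows hash equally
    workflow_id = hashlib.sha256(text.encode()).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO Workflow (WorkflowID, Workflow) VALUES (?, ?)",
        (workflow_id, text),
    )
    return workflow_id


conn = sqlite3.connect(":memory:")
create_workflow_table(conn)
cwl = {"cwlVersion": "v1.2", "class": "CommandLineTool", "baseCommand": "process.py"}
first_id = store_workflow(conn, cwl)
second_id = store_workflow(conn, dict(cwl))  # same workflow submitted again
row_count = conn.execute("SELECT COUNT(*) FROM Workflow").fetchone()[0]
```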
Here is a possible transition plan (mostly prepared by @simon-mazenoux) that can be discussed:
Step1: Now
- [...] cwl-tool in DIRACOS2
- [...] /submit endpoint, which would accept a CWL and DIRAC-specific arguments (pydantic JobModel). The method would:
  - [...] (JobModel)
  - [...] JobModel into a JDL such as: [...]
  - [...] /submit-jdl. Theoretically, DIRAC is flexible enough to handle it without any further modification.
Note: there would be 2 CLIs: dirac jobs submit-jdl and dirac jobs submit.
Step2: Once the WMS tasks and routers are all implemented
- [...] Workflow table that will contain the CWL workflows (WorkflowID, Workflow). Instead of being stored in the input sandbox, the CWL will be sent to the Workflow DB.
- [...] /submit endpoint does not convert the CWL and the DIRAC-specific arguments into a JDL anymore:
  - [...] Workflow table
  - [...] JobModel
Step3: Once the /submit-jdl path is not used anymore
- [...] /submit-jdl route
- [...] jobDescription.xml within the WMS
Alternatively: the Workflow table from Step2 could be introduced in DIRAC in Step1, to avoid the issue of having multiple input sandboxes for the same job. We have a preference for this option.
Any opinion about this plan?
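For the Step1 conversion into a JDL, a minimal sketch could look like the following. Everything about the mapping is an assumption made for illustration: the helper names, the use of `cwltool` as the executable, and which JobModel field feeds which attribute. Only the JDL attribute names themselves (Executable, Arguments, InputSandbox, Site, Platform, Priority, OutputData, OutputSE) are standard DIRAC attributes, and the field names come from the v2 example.

```python
def jdl_value(value) -> str:
    """Render a Python value using JDL syntax: lists in braces, strings quoted."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value) + '"'


def to_jdl(job_description: dict, sandbox_files: list, arguments: str) -> str:
    """Build a JDL string from the DIRAC-specific part of a submission (hypothetical mapping)."""
    attributes = {
        "Executable": "cwltool",  # assumption: the CWL runner shipped with DIRACOS2
        "Arguments": arguments,
        "InputSandbox": sandbox_files,
        "Site": job_description["sites"],
        "Platform": job_description["platform"],
        "Priority": job_description["priority"],
    }
    outputs = job_description.get("outputs", {})
    if outputs:
        attributes["OutputData"] = outputs["remote_data"]
        attributes["OutputSE"] = outputs["remote_se"]
    lines = [f"    {key} = {jdl_value(value)};" for key, value in attributes.items()]
    return "[\n" + "\n".join(lines) + "\n]"


job_description = {
    "sites": ["Site1"],
    "platform": "x86_64-alma9",
    "priority": 1,
    "outputs": {"remote_data": ["output.sim"], "remote_se": "SE-USER"},
}
jdl = to_jdl(job_description, ["job.cwl", "local1.cwl"], "job.cwl local1.cwl")
print(jdl)
```

Because the CWL and its parameters travel in the input sandbox here, this is also where the "multiple input sandboxes for the same job" issue mentioned above would show up.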