Skip to content

siligam/sworkflow

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sworkflow

A flexible work-flow for scripting job dependencies in slurm.

A typical approach for defining dependencies between tasks is to capture the JOB_ID of the previously submitted job and use it in the next job submission in order to make the latter as a dependent of the previous job. The same process is repeated for further dependent jobs down the lane. The 2 actions that are intertwined in this approach are ordering of jobs and managing of JOB_ID's. For fewer jobs this is manageable but as the larger job, soon it might become a bit complicated to follow the dependencies.

A different approach is to have separation of concerns by expressing dependencies between tasks as a separate python dictionary mapping and tasks to run as separate python dictionary mapping.

Installation

clone the repository

git clone https://github.com/siligam/sworkflow.git
cd sworkflow

Install the package using conda

conda env create -f environment.yaml -n sworkflow
pip install .

Usage

consider the follow example: (typical slurm job dependency work-flow commands executed in a shell or via bash script)

jobid_1=$(sbatch --parsable preprocess.sh)
jobid_2=$(sbatch --parsable --depend=afterok:${jobid_1} modelrun.sh)
jobid_3=$(sbatch --parsable --depend=afterok:${jobid_2} postprocessing.sh) 

This can be written as: (python session)

import sworkflow

dep = {
    "modelrun": "afterok:preprocess",
    "postprocess": "afterok:modelrun",
}

tasks = {
    "preprocess": "preprocess.sh",
    "modelrun": "sbatch modelrun.sh",
    "postprocess": "sbatch postprocessing.sh",
}

s = sworkflow.Suite(dep, tasks)
s.submit()
s.visualize(as_ascii=True)

┌────────────┐     ┌──────────┐     ┌─────────────┐
│ preprocess │ ──▶ │ modelrun │ ──▶ │ postprocess │
└────────────┘     └──────────┘     └─────────────┘

cli

create a yaml file with task dependencies

cat > model.yaml <<EOF
dependency:
  modelrun: afterok:preprocess
  postprocess: afterok:modelrun
jobs:
  preprocess: preprocess.sh
  modelrun: modelrun.sh
  postprocess: postprocessing.sh
EOF

visualize the suite

sworkflow -f model.yaml vis
┌────────────┐     ┌──────────┐     ┌─────────────┐
│ preprocess │ ──▶ │ modelrun │ ──▶ │ postprocess │
└────────────┘     └──────────┘     └─────────────┘

submit suite to slurm. Upon submission, each of the tasks gets assigned slurm job-ids

sworkflow -f model.yaml submit
{'modelrun': '9078434', 'postprocess': '9078435', 'preprocess': '9078433'}

If desired, this can be verified using the squeue command as follows

squeue --me
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           9078433    shared     wrap  a270243 CG       0:04      1 l40000
           9078434    shared     wrap  a270243 PD       0:00      1 (Dependency)
           9078435    shared     wrap  a270243 PD       0:00      1 (Dependency)

job status can be tracked with sworkflow as follows

sworkflow -f model.yaml vis
┌──────────────────────────────┐     ┌──────────────────────────┐     ┌─────────────────────────────┐
│ preprocess 9078433 COMPLETED │ ──▶ │ modelrun 9078434 RUNNING │ ──▶ │ postprocess 9078435 PENDING │
└──────────────────────────────┘     └──────────────────────────┘     └─────────────────────────────┘

The vis command can be repeated any number of times to get the current activate status of the job

sworkflow -f model.yaml vis
┌──────────────────────────────┐     ┌────────────────────────────┐     ┌───────────────────────────────┐
│ preprocess 9078433 COMPLETED │ ──▶ │ modelrun 9078434 COMPLETED │ ──▶ │ postprocess 9078435 COMPLETED │
└──────────────────────────────┘     └────────────────────────────┘     └───────────────────────────────┘

The status command also provides the same information without visualization

sworkflow -f model.yaml status
modelrun  9078434  COMPLETED
postprocess  9078435  COMPLETED
preprocess  9078433  COMPLETED

NOTE

In model.yaml job description for each task is written in super minimal way. Job submission may fail if some of the slurm directives like account and partition are missing.

Here is a more concrete example of yaml file including those details

cat > model.yaml <<EOF
dependency:
  modelrun: afterok:preprocess
  postprocess: afterok:modelrun
jobs:
  preprocess: sbatch -A ab0246 -p shared preprocess.sh
  modelrun: sbatch -A ab0246 -p compute modelrun.sh
  postprocess: sbatch -A ab0246 -p shared postprocessing.sh
EOF

If relevant slurm directives are declared in the script files, the above example can go back to the super minimal representation and it still works.

cat > model.yaml <<EOF
dependency:
  modelrun: afterok:preprocess
  postprocess: afterok:modelrun
jobs:
  preprocess: preprocess.sh
  modelrun: modelrun.sh
  postprocess: postprocessing.sh
EOF

TIP

instead of providing -f model.yaml argument to sworkflow everytime, an environment varible can be set as follows

export SFILE=model.yaml

Having done that, sworkflow commands become less cluttering as follows

sworkflow status
suite definition: model.yaml

modelrun  9078468  COMPLETED
postprocess  9078469  COMPLETED
preprocess  9078467  COMPLETED

Big Picture

The work-flow to build a simple or a complex task dependency is to just concentrate on gradual iterative building of task graph. Associating the task names with respective job scripts can be differed as a final step. Just start off building partial graph and get a intuitive feeling of the task graph by visualizing it (sworkflow vis). Once the task dependencies are defined as desired, it is also possible to submit the suite even before associating task names with the actual job scripts. In this case, a dummy task of sleeping for 2 seconds is triggered for each task. This way, corrections to suite can be made if task flow through the graph deviates from users expectation. As a final step, define association of task name to job script.

For instance, consider building a task graph which branches and merges. Conceptually this is formulated as a particular task runs only after 2 previous tasks completes. Those 2 previous tasks run only if a task before those 2 tasks completes. To illustrate this, look at the following yaml file

cat > branch_merge.yaml <<EOF
dependency:
  E: afterok:D
  D: after:B,afterany:C
  B: afterok:A
  C: afterok:A
EOF

Here, task E depends on D to complete but D depends on both B and C to complete. Tasks B and C run only when task A completes. It is possible to visualize this suite in its current state as follows

sworkflow -f branch_merge.yaml vis
┌───┐     ┌───┐     ┌───┐     ┌───┐
│ A │ ──▶ │ B │ ──▶ │ D │ ──▶ │ E │
└───┘     └───┘     └───┘     └───┘
  │                   ▲
  │                   │
  ▼                   │
┌───┐                 │
│ C │ ────────────────┘
└───┘

change the orientation of the graph if it improves understanding of task flow as follows

sworkflow -f branch_merge.yaml vis --rankdir left
                              ┌───┐
            ┌──────────────── │ C │
            │                 └───┘
            │                   ▲
            │                   │
            ▼                   │
┌───┐     ┌───┐     ┌───┐     ┌───┐
│ E │ ◀── │ D │ ◀── │ B │ ◀── │ A │
└───┘     └───┘     └───┘     └───┘

Presentation

I gave talk about this tool (in its early development stage) in one of GoeHPCoffee sessions at GWDG. The presentation covers a brief overview of slurm's --dependency feature and also showcases usage of this package with few examples.

Presentation link: https://pad.gwdg.de/ZVBwU_rPSOih4PWK0B3wMA?view

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages