Tutorial for Stanford Sherlock 2.0 cluster

All test samples and genome data are shared on the Stanford Sherlock cluster, so you don't have to download any data to test our pipeline there.

  1. SSH to Sherlock's login node.

      $ ssh login.sherlock.stanford.edu
    
  2. Git clone this pipeline and move into it.

      $ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
      $ cd chip-seq-pipeline2
    
  3. Download cromwell.

      $ wget https://github.com/broadinstitute/cromwell/releases/download/34/cromwell-34.jar
      $ chmod +rx cromwell-34.jar
    
  4. Set your partition in workflow_opts/sherlock.json. THE PIPELINE WILL NOT WORK WITHOUT A PAID SLURM PARTITION DUE TO LIMITED RESOURCE SETTINGS FOR FREE USERS. Ignore the other runtime attributes (they are for Singularity).

      {
        "default_runtime_attributes" : {
          "slurm_partition": "YOUR_SLURM_PARTITON"
        }
      }
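
    If you are not sure which SLURM partitions you can submit to, a generic SLURM query (not specific to this pipeline) lists them along with their time limits:

      $ sinfo --format="%P %a %l"    # partition, availability, time limit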
    

Our pipeline supports both Conda and Singularity.

For Conda users

  1. Install Conda
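
    A minimal sketch of a typical Miniconda installation, assuming the standard installer from repo.anaconda.com and an install prefix of $HOME/miniconda3 (adjust both to your preference):

      $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
      $ bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
      $ source $HOME/miniconda3/etc/profile.d/conda.sh    # put conda on your PATH for this session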

  2. Install Conda dependencies.

      $ bash conda/uninstall_dependencies.sh  # to remove any existing pipeline env
      $ bash conda/install_dependencies.sh
    
  3. Run a pipeline for a SUBSAMPLED (1/400) paired-end sample of ENCSR936XTK.

      $ source activate encode-chip-seq-pipeline # IMPORTANT!
      $ INPUT=examples/sherlock/ENCSR936XTK_subsampled_sherlock.json
      $ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm cromwell-34.jar run chip.wdl -i ${INPUT} -o workflow_opts/sherlock.json
    
  4. It will take about an hour. You will be able to find all outputs in cromwell-executions/chip/[RANDOM_HASH_STRING]/. See the output directory structure section for details.
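
    For example, to find the most recent workflow directory and browse its per-task outputs (generic shell commands, not pipeline commands):

      $ ls -t cromwell-executions/chip/ | head -n 1           # most recent [RANDOM_HASH_STRING]
      $ find cromwell-executions/chip/ -maxdepth 2 -type d    # per-workflow and per-task directories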

  5. See the full specification for the input JSON file.
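
    The subsampled example used above is a convenient reference for the expected format:

      $ cat examples/sherlock/ENCSR936XTK_subsampled_sherlock.json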

For Singularity users

  1. Add the following line to your bash startup script (~/.bashrc or ~/.bash_profile).

      module load system singularity
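
    After reloading your shell (or running the same module load command in the current session), you can confirm Singularity is available; this is a generic check, not part of the pipeline:

      $ singularity --version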
    
  2. Pull a Singularity container for the pipeline. This will pull the pipeline's Docker container first and then build a Singularity image in ~/.singularity. Stanford Sherlock does not allow building containers on login nodes, so use sdev to get an interactive node and wait until you get a command prompt.

      $ sdev    # sherlock cluster does not allow building a container on login node
      $ SINGULARITY_PULLFOLDER=~/.singularity singularity pull docker://quay.io/encode-dcc/chip-seq-pipeline:v1.1
      $ exit    # exit from an interactive node
    
  3. Run a pipeline for a SUBSAMPLED (1/400) paired-end sample of ENCSR936XTK.

      $ INPUT=examples/sherlock/ENCSR936XTK_subsampled_sherlock.json
      $ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity cromwell-34.jar run chip.wdl -i ${INPUT} -o workflow_opts/sherlock.json
    
  4. It will take about an hour. You will be able to find all outputs in cromwell-executions/chip/[RANDOM_HASH_STRING]/. See the output directory structure section for details.

  5. See the full specification for the input JSON file.

  6. IF YOU WANT TO RUN PIPELINES WITH YOUR OWN INPUT DATA/GENOME DATABASE, PLEASE ADD THEIR DIRECTORIES TO workflow_opts/sherlock.json. For example, if you have input FASTQs in /your/input/fastqs/ and a genome database installed in /your/genome/database/, add /your/ to --bind in singularity_command_options. You can define multiple directories there, comma-separated.

      {
          "default_runtime_attributes" : {
              "singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.simg",
              "singularity_command_options" : "--bind /scratch,/oak/stanford,/your/,YOUR_OWN_DATA_DIR1,YOUR_OWN_DATA_DIR1,..."
          }
      }
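
    A quick way to verify that a directory will be visible inside the container is a sketch like the following, run where the singularity module is loaded; the image path matches the singularity_container entry above and /your/input/fastqs/ is the example directory from step 6:

      $ singularity exec --bind /your/ ~/.singularity/chip-seq-pipeline-v1.1.simg ls /your/input/fastqs/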
    

Running multiple pipelines with cromwell server mode

  1. If you want to run multiple (>10) pipelines, run a Cromwell server on an interactive node. We recommend using screen or tmux to keep your session alive; note that all running pipelines will be killed when the walltime expires. Start the Cromwell server with the following commands.

      $ srun -n 2 --mem 5G -t 3-0 --qos normal -p [YOUR_SLURM_PARTITION] --pty /bin/bash -i -l    # 2 CPU, 5 GB RAM and 3 day walltime
      $ hostname -f    # to get [CROMWELL_SVR_IP]
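
    If you use screen as suggested above, a sketch of a typical session (the session name cromwell is arbitrary): start the session on the login node, run srun and the server command inside it, detach with Ctrl-A d, and reattach later.

      $ screen -S cromwell    # start a named session before running srun
      $ screen -r cromwell    # reattach to it later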
    

    For Conda users,

      $ source activate encode-chip-seq-pipeline
      $ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm cromwell-34.jar server
    

    For Singularity users,

      $ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity cromwell-34.jar server
    
  2. You can modify backend.providers.slurm.concurrent-job-limit or backend.providers.slurm_singularity.concurrent-job-limit in backends/backend.conf to increase the maximum number of concurrent jobs. This limit is not per sample; it applies to all sub-tasks of all submitted samples.
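
    For example, to locate the current limits in the backend file (a plain grep over the existing configuration, not a pipeline command):

      $ grep -n "concurrent-job-limit" backends/backend.conf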

  3. On a login node, submit jobs to the cromwell server. You will get [WORKFLOW_ID] as a return value. Keep these workflow IDs for monitoring pipelines and finding outputs for a specific sample later.

      $ INPUT=YOUR_INPUT.json
      $ curl -X POST --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1" \
        -F workflowSource=@chip.wdl \
        -F workflowInputs=@${INPUT} \
        -F workflowOptions=@workflow_opts/sherlock.json
    

To monitor pipelines, see the Cromwell server REST API description for more details. squeue will not give you enough information to monitor jobs per sample. To check the status of a submitted workflow:

    $ curl -X GET --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/status"
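
Once a workflow has finished, the same REST API can list its output file paths; the /outputs endpoint below is part of Cromwell's standard API:

    $ curl -X GET --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/outputs"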