The nbis-meta
Docker image allows you to run the workflow in a container that uses Linux (debian) as the underlying operating system. This may be beneficial for users who don't have access to a computer with Mac or some Linux distribution, and can also help resolve missing dependencies. For example, the concoct
and gtdb-tk
packages are currently not available for Mac but with Docker you can still run a full analysis of your data using these packages on your Mac computer.
First of all you need to have Docker installed on your computer. Follow the installation instructions to get started.
Next you need to pull the nbis-meta
Docker image. Simply run:
docker pull nbisweden/nbis-meta
This pulls the latest version of the workflow from DockerHub. You may also pull a specific version of the workflow, e.g. to pull the 2.0
version:
docker pull nbisweden/nbis-meta:2.0
Check the list on DockerHub to see what versions are available.
You can check the list of images on your computer by running docker images
. This should show something similar to:
REPOSITORY TAG IMAGE ID CREATED SIZE
nbisweden/nbis-meta latest b7774e010682 1 day ago 1.15GB
The basic command for running a docker container is docker run <container>
. For running an nbis-meta
container the command would thus be:
docker run nbisweden/nbis-meta
This will spawn a new Docker container running the nbis-meta
workflow with default settings (using a default configuration file inside the container).
However, this only allocates a local filesystem for the container which means it can only work with data, and produce output, inside the container. In order to use your own files (e.g. configuration files and raw data), and generate results on your computer you need to use bind mounts specified with the -v
flag.
Let's try an example where we want to run the workflow with Docker using:
- samples specified in a file called
mysamples.tsv
- a configuration file called
myconfig.yaml
- raw data stored under
data/
in the current folder.
Let's say the sample list mysamples.tsv
looks like this:
sample | unit | assembly | fq1 | fq2 |
---|---|---|---|---|
sample1 | 1 | co-assembly | data/sample1_R1.fastq.gz | sample1_R2.fastq.gz |
sample2 | 1 | co-assembly | data/sample2_R2.fastq.gz | sample2_R2.fastq.gz |
sample3 | 1 | co-assembly | data/sample3_R2.fastq.gz | sample3_R2.fastq.gz |
The configuration file myconfig.yaml
looks like this:
sample_list: mysamples.tsv
assembly:
metaspades: True
annotation:
pfam: True
infernal: False
classification:
metaphlan: True
kraken: False
With this set up the workflow will read sample information from mysamples.tsv
, use the metaspades
assembler to create 1 assembly called co-assembly
, annotate it with PFAM protein families, skip the identification of rRNA genes with infernal
, create a taxonomic profile of each sample using metaphlan
but skip classification of reads with kraken
.
Note that the configuration file does not need to contain the full list of parameters. Instead, parameters specified in a configuration file passed at runtime will overwrite the default settings for the workflow.
Our directory now contains the following:
data/
sample1_R1.fastq.gz
sample1_R2.fastq.gz
sample2_R1.fastq.gz
sample2_R2.fastq.gz
sample3_R1.fastq.gz
sample3_R2.fastq.gz
myconfig.yaml
mysamples.tsv
In order to run the workflow with Docker using this set up we need to make the data/
directory, the myconfig.yaml
config file and the mysamples.tsv
sample list available to the container.
Also, to make the output from the workflow available on your local filesystem we need to mount the proper results directory.
This can be done by specifying -v
for each bind mount on the command line:
docker run -v "$(pwd)"/data:/analysis/data \
-v "$(pwd)"/myconfig.yaml:/analysis/myconfig.yaml \
-v "$(pwd)"/mysamples.tsv:/analysis/mysamples.tsv \
-v "$(pwd)"/results:/analysis/results \
nbisweden/nbis-meta --configfile myconfig.yaml
Everything you specify after the nbisweden/nbis-meta
part of the command line call is passed to snakemake as if you had cloned the GitHub repository, installed the conda environment and called snakemake from there. Here we pass --configfile myconfig.yaml
but you can for instance also pass --cores 4
to run the workflow with 4 cpus instead of 1:
docker run -v "$(pwd)"/data:/analysis/data \
-v "$(pwd)"/myconfig.yaml:/analysis/myconfig.yaml \
-v "$(pwd)"/mysamples.tsv:/analysis/mysamples.tsv \
-v "$(pwd)"/results:/analysis/results \
nbisweden/nbis-meta --configfile myconfig.yaml --cores 4
You could also run parts of the workflow by supplying a valid target, e.g. qc
to only run the preprocessing and quality control part of the workflow:
docker run -v "$(pwd)"/data:/analysis/data \
-v "$(pwd)"/myconfig.yaml:/analysis/myconfig.yaml \
-v "$(pwd)"/mysamples.tsv:/analysis/mysamples.tsv \
-v "$(pwd)"/results:/analysis/results \
nbisweden/nbis-meta --configfile myconfig.yaml --cores 4 qc
Docker containers and images can quickly eat up disk space on your computer. You can list running and stopped containers with docker ps -a
and remove containers with docker rm <container id>
.
To have docker automatically remove containers upon completion or exit add --rm
to the docker run
call.