Add Dorado 0.8.0 (#1051)
* Add Dorado 0.8.0 Dockerfile and README

* PR changes dorado

* Update Program_Licenses.md

* Update README.md

* clarified section on test POD5 file

---------

Co-authored-by: Curtis Kapsak <[email protected]>
fraser-combe and kapsakcj authored Sep 30, 2024
1 parent fbb0b10 commit 195539a
Showing 4 changed files with 288 additions and 0 deletions.
1 change: 1 addition & 0 deletions Program_Licenses.md
@@ -39,6 +39,7 @@ The licenses of the open-source software that is contained in these Docker image
| datasets-sars-cov-2 | Apache 2.0 | https://github.com/CDCgov/datasets-sars-cov-2/blob/master/LICENSE |
| diamond | GNU GPLv3 | https://github.com/bbuchfink/diamond/blob/master/LICENSE |
| dnaapler | MIT | https://github.com/gbouras13/dnaapler/blob/main/LICENSE |
| dorado | Oxford Nanopore Technologies PLC Public License | [ONT License](https://github.com/nanoporetech/dorado/blob/master/LICENCE.txt) |
| dragonflye | GNU GPLv3 | https://github.com/rpetit3/dragonflye/blob/main/LICENSE |
| drprg | MIT | https://github.com/mbhall88/drprg/blob/main/LICENSE |
| DSK | GNU Affero GPLv3 | https://github.com/GATB/dsk/blob/master/LICENSE |
4 changes: 4 additions & 0 deletions README.md
@@ -149,6 +149,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [datasets-sars-cov-2](https://github.com/CDCgov/datasets-sars-cov-2) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/datasets-sars-cov-2)](https://hub.docker.com/r/staphb/datasets-sars-cov-2) | <ul><li>0.6.2</li><li>0.6.3</li><li>0.7.2</li></ul> | https://github.com/CDCgov/datasets-sars-cov-2 |
| [diamond](https://github.com/bbuchfink/diamond) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/diamond)](https://hub.docker.com/r/staphb/diamond) | <ul><li>[2.1.9](./diamond/2.1.9)</li></ul> | https://github.com/bbuchfink/diamond|
| [dnaapler](https://hub.docker.com/r/staphb/dnaapler) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dnaapler)](https://hub.docker.com/r/staphb/dnaapler) | <ul><li>[0.1.0](dnaapler/0.1.0/)</li></ul> <ul><li>[0.4.0](dnaapler/0.4.0/)</li><li>[0.5.0](dnaapler/0.5.0/)</li><li>[0.5.1](dnaapler/0.5.1/)</li><li>[0.7.0](dnaapler/0.7.0/)</li><li>[0.8.0](dnaapler/0.8.0/)</li></ul> | https://github.com/gbouras13/dnaapler |
| [dorado](https://hub.docker.com/r/staphb/dorado) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dorado)](https://hub.docker.com/r/staphb/dorado) | <ul><li>[0.8.0](dorado/0.8.0/)</li></ul> | [https://github.com/nanoporetech/dorado](https://github.com/nanoporetech/dorado) |
| [dragonflye](https://hub.docker.com/r/staphb/dragonflye) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dragonflye)](https://hub.docker.com/r/staphb/dragonflye) | <ul><li>[1.0.14](./dragonflye/1.0.14/)</li><li>[1.1.1](./dragonflye/1.1.1/)</li><li>[1.1.2](./dragonflye/1.1.2/)</li><li>[1.2.0](./dragonflye/1.2.0/)</li><li>[1.2.1](./dragonflye/1.2.1/)</li></ul> | https://github.com/rpetit3/dragonflye |
| [Dr. PRG ](https://hub.docker.com/r/staphb/drprg) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/drprg)](https://hub.docker.com/r/staphb/drprg) | <ul><li>[0.1.1](drprg/0.1.1/)</li></ul> | https://mbh.sh/drprg/ |
| [DSK](https://hub.docker.com/r/staphb/dsk) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/dsk)](https://hub.docker.com/r/staphb/dsk) | <ul><li>[0.0.100](./dsk/0.0.100/)</li><li>[2.3.3](./dsk/2.3.3/)</li></ul> | https://gatb.inria.fr/software/dsk/ |
@@ -376,3 +377,6 @@ Each Dockerfile lists the author(s)/maintainer(s) as a metadata `LABEL`, but the
* [@stephenturner](https://github.com/stephenturner)
* [@soejun](https://github.com/soejun)
* [@taylorpaisie](https://github.com/taylorpaisie)
* [@fraser-combe](https://github.com/fraser-combe)


64 changes: 64 additions & 0 deletions dorado/0.8.0/Dockerfile
@@ -0,0 +1,64 @@
# Use NVIDIA CUDA image as the base image
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04 AS app

ARG DORADO_VER=0.8.0

# Metadata
LABEL base.image="nvidia/cuda:12.2.0-devel-ubuntu20.04"
LABEL dockerfile.version="1"
LABEL software="dorado ${DORADO_VER}"
LABEL software.version="${DORADO_VER}"
LABEL description="A tool for basecalling Fast5/Pod5 files from Oxford Nanopore sequencing"
LABEL website="https://github.com/nanoporetech/dorado"
LABEL license="https://github.com/nanoporetech/dorado/blob/master/LICENCE.txt"
LABEL original.website="https://nanoporetech.github.io/dorado/"
LABEL maintainer="Fraser Combe"
LABEL maintainer.email="[email protected]"

# Set working directory
WORKDIR /usr/src/app

# Install dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget ca-certificates && \
    apt-get autoclean && \
    rm -rf /var/lib/apt/lists/*

# Download and extract Dorado package
RUN wget https://cdn.oxfordnanoportal.com/software/analysis/dorado-${DORADO_VER}-linux-x64.tar.gz \
&& tar -xzvf dorado-${DORADO_VER}-linux-x64.tar.gz -C /opt \
&& rm dorado-${DORADO_VER}-linux-x64.tar.gz

# Set environment variables for Dorado binary
ENV PATH="/opt/dorado-${DORADO_VER}-linux-x64/bin:${PATH}"

# Download basecalling models
RUN mkdir /dorado_models && \
cd /dorado_models && \
dorado download --model all

# Default command
CMD ["dorado"]

# -----------------------------
# Test Stage
# -----------------------------
FROM app AS test


# Download the specific Pod5 test file
RUN wget -O /usr/src/app/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
    https://github.com/nanoporetech/dorado/raw/release-v0.7/tests/data/pod5/dna_r10.4.1_e8.2_260bps/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5

# Set working directory
WORKDIR /usr/src/app

# Run test command (using CPU mode)
RUN dorado basecaller \
--device cpu \
/dorado_models/[email protected] \
dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
--emit-moves --max-reads 10 > basecalled.sam

# Verify the output file exists and is not empty
RUN test -s basecalled.sam
219 changes: 219 additions & 0 deletions dorado/0.8.0/README.md
@@ -0,0 +1,219 @@
# Dorado Docker Image

This Dockerfile sets up an environment for running **Dorado**, a tool for basecalling Fast5/Pod5 files from Oxford Nanopore sequencing.

## Table of Contents

- [Introduction](#introduction)
- [Requirements](#requirements)
- [Building the Docker Image](#building-the-docker-image)
- [Running the Docker Container](#running-the-docker-container)
- [Testing the Docker Image](#testing-the-docker-image)
  - [Basecalling Test](#basecalling-test)
  - [Verifying the Output](#verifying-the-output)
- [Additional Notes](#additional-notes)
- [License](#license)

## Introduction

This Docker image includes:

- **Dorado**: Version **0.8.0**, a tool for basecalling Oxford Nanopore sequencing data.
- **NVIDIA CUDA**: Version **12.2.0**, for GPU acceleration (requires NVIDIA GPU).
- **Pre-downloaded basecalling models**: All simplex, stereo, and modification models are downloaded to `/dorado_models` during the build.

## Requirements

- **Docker**: Installed on your system.
- **NVIDIA GPU and Drivers**: Installed and configured.
- **NVIDIA Container Toolkit**: To enable GPU support in Docker containers.
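
## Building the Docker Image

The Dockerfile defines two stages, `app` and `test`, so a build can stop at either one. The following is a minimal sketch, assuming the image is tagged `dorado-image` (the name used in the commands in the following sections) and that the build context is the directory containing the Dockerfile. The function name `build_dorado_image` is illustrative only.

```bash
# Build one stage of the Dorado image.
#   $1: stage to build, "app" (default) or "test"
#   $2: image tag (default: dorado-image, an assumed name)
#   $3: build context, i.e. the directory holding the Dockerfile (default: .)
build_dorado_image() {
    stage="${1:-app}"
    tag="${2:-dorado-image}"
    context="${3:-.}"
    case "$stage" in
        app|test) ;;
        *) echo "unknown stage: $stage (expected app or test)" >&2; return 1 ;;
    esac
    docker build --target "$stage" -t "$tag" "$context"
}
```

Calling `build_dorado_image` builds the runtime image. Calling `build_dorado_image test dorado-image:test` additionally runs the CPU basecalling check from the `test` stage, which fails the build if `basecalled.sam` is empty. Either build downloads every model, so expect it to take a while.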

## Running the Docker Container

To run the Dorado tool within the Docker container, use the following command:

```bash
docker run --gpus all -it dorado-image dorado --help
```

This command will display the help information for Dorado, confirming that it's installed correctly.

## Testing the Docker Image

To test that Dorado works, download the sample Pod5 file and basecall it with one of the models bundled in the image:

```bash
wget -O dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
    https://github.com/nanoporetech/dorado/raw/release-v0.7/tests/data/pod5/dna_r10.4.1_e8.2_260bps/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5
```

### Basecalling Test

Run the following command:

```bash
docker run --gpus all -v $(pwd):/usr/src/app -it dorado-image bash -c "\
dorado basecaller /dorado_models/[email protected] \
/usr/src/app/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
--emit-moves > /usr/src/app/basecalled.sam"
```

**Explanation:**

- `--gpus all`: Enables GPU support.
- `-v $(pwd):/usr/src/app`: Mounts the current directory to `/usr/src/app` inside the container.
- `bash -c "..."`: Runs the basecalling command inside the container.
- `> /usr/src/app/basecalled.sam`: Redirects the output to `basecalled.sam` in your current directory.
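
On a host without an NVIDIA GPU, the same test can run on the CPU, as the Dockerfile's `test` stage does with `--device cpu`. The following is a hedged sketch of a wrapper that switches between the two modes. `run_basecall`, its argument order, and the `USE_GPU` variable are illustrative names, and the model directory and Pod5 file are passed in as arguments rather than assumed.

```bash
# Basecall one Pod5 file from the current directory inside the container.
#   $1: model directory inside the image (under /dorado_models)
#   $2: Pod5 file name, relative to the current directory
#   $3: output file name, written to the current directory
# Set USE_GPU=0 to run on the CPU (slow, but needs no NVIDIA runtime).
run_basecall() {
    model="$1"
    pod5="$2"
    out="$3"
    if [ "${USE_GPU:-1}" = "1" ]; then
        # --gpus all exposes the host GPUs to the container
        docker run --rm --gpus all -v "$(pwd)":/usr/src/app dorado-image \
            dorado basecaller "$model" "/usr/src/app/$pod5" --emit-moves > "$out"
    else
        # --device cpu is the flag used by the Dockerfile's test stage
        docker run --rm -v "$(pwd)":/usr/src/app dorado-image \
            dorado basecaller --device cpu "$model" "/usr/src/app/$pod5" --emit-moves > "$out"
    fi
}
```

In this sketch the redirect happens on the host, so neither the `bash -c` wrapper nor `-it` is needed; without a TTY, `docker run` keeps the container's stdout separate from stderr.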

### Verifying the Output

Check the output file to confirm that basecalling succeeded. `samtools` is not part of this image, so this assumes it is installed on the host:

```bash
samtools view basecalled.sam
```

You should see SAM-formatted basecalling results.
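
The Dockerfile's `test` stage only checks that the output file is non-empty (`test -s`). A slightly stricter check, sketched here on the assumption that `samtools` is available on the host, also counts the records with `samtools view -c`. The function name `check_basecalls` is illustrative.

```bash
# Fail unless the basecalling output exists, is non-empty,
# and contains at least one record.
#   $1: output file to check (default: basecalled.sam)
check_basecalls() {
    file="${1:-basecalled.sam}"
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    # samtools view -c prints the number of records in the file
    n="$(samtools view -c "$file")" || return 1
    echo "$n records in $file"
    [ "$n" -gt 0 ]
}
```

Running `check_basecalls` in the directory that holds `basecalled.sam` prints the record count and returns a non-zero status if no reads were basecalled.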

## Additional Notes

- **Sample Data**: The sample Pod5 file is downloaded to `/usr/src/app` only during the `test` stage of the image build.
  - _Note: The pre-built StaPH-B image on Docker Hub and Quay.io contains only the `app` stage, so the sample Pod5 file is not in the container. Download it manually with the `wget` command shown above._
- **Internal Testing**: An internal test stage is included in the Dockerfile to verify installation.
- **Basecalling Models**: All models are downloaded to `/dorado_models` during the build process.
Below is the list of basecalling models included in the Docker image:
```yaml
modification models:
- "[email protected][email protected]"
- "[email protected][email protected]"
- "[email protected][email protected]"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG_5hmCG@v0"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected][email protected]"
- "[email protected]_5mC@v2"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v3"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v2"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_4mC_5mC@v1"
- "[email protected]_4mC_5mC@v1"
- "[email protected]_4mC_5mC@v2"
- "[email protected]_4mC_5mC@v2"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v1"
- "[email protected]_5mC_5hmC@v2"
- "[email protected]_5mC_5hmC@v2"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v1"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_5mCG_5hmCG@v2"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v1"
- "[email protected]_6mA@v2"
- "[email protected]_6mA@v2"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A@v1"
- "[email protected]_m6A@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_pseU@v1"
- "[email protected]_pseU@v1"
- "[email protected]_m5C@v1"
- "[email protected]_m5C@v1"
- "[email protected]_inosine_m6A@v1"
- "[email protected]_inosine_m6A@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_m6A_DRACH@v1"
- "[email protected]_pseU@v1"
- "[email protected]_pseU@v1"
stereo models:
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
simplex models:
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "rna002_70bps_fast@v3"
- "rna002_70bps_hac@v3"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
- "[email protected]"
```
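
To confirm which models a given image actually contains, the `/dorado_models` directory can be listed directly. This is a small sketch, assuming the `dorado-image` tag used in the commands above; `list_dorado_models` is an illustrative name.

```bash
# Print the model directories stored in /dorado_models, one per line.
#   $1: image tag (default: dorado-image, an assumed name)
list_dorado_models() {
    image="${1:-dorado-image}"
    # No GPU is needed just to list files, so --gpus is omitted
    docker run --rm "$image" ls -1 /dorado_models
}
```

The output can be filtered with standard tools, for example `list_dorado_models | grep '_sup@'` to show only the super-accuracy models.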

## License

Dorado is licensed under the [Oxford Nanopore Technologies PLC Public License](https://github.com/nanoporetech/dorado/blob/master/LICENCE.txt).

---

**Note**: Please ensure that you have the necessary NVIDIA drivers and the NVIDIA Container Toolkit installed to use GPU acceleration.

---
