Commit

Merge pull request #1076 from Kincekara/bbtools
adds bbtools 39.10
erinyoung authored Oct 4, 2024
2 parents f0017b6 + 9fed37d commit beee206
Showing 3 changed files with 183 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -124,7 +124,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [Auspice](https://hub.docker.com/r/staphb/auspice) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/auspice)](https://hub.docker.com/r/staphb/auspice) | <ul><li>2.12.0</li></ul> | https://github.com/nextstrain/auspice |
| [bakta](https://hub.docker.com/r/staphb/bakta) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bakta)](https://hub.docker.com/r/staphb/bakta) | <ul><li>[1.9.2](./bakta/1.9.2/)</li><li>[1.9.2-light](./bakta/1.9.2-5.1-light/)</li><li>[1.9.3](./bakta/1.9.3/)</li><li>[1.9.3-light](./bakta/1.9.3-5.1-light/)</li><li>[1.9.4](./bakta/1.9.4/)</li><li>[1.9.4-5.1-light](./bakta/1.9.4-5.1-light/)</ul> | https://github.com/oschwengers/bakta |
| [bandage](https://hub.docker.com/r/staphb/bandage) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bandage)](https://hub.docker.com/r/staphb/bandage) | <ul><li>[0.8.1](./bandage/0.8.1/)</li></ul> | https://rrwick.github.io/Bandage/ |
| [BBTools](https://hub.docker.com/r/staphb/bbtools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bbtools)](https://hub.docker.com/r/staphb/bbtools) | <ul><li>[38.76](./bbtools/38.76/)</li><li>[38.86](./bbtools/38.86/)</li><li>[38.95](./bbtools/38.95/)</li><li>[38.96](./bbtools/38.96/)</li><li>[38.97](./bbtools/38.97/)</li><li>[38.98](./bbtools/38.98/)</li><li>[38.99](./bbtools/38.99/)</li><li>[39.00](./bbtools/39.00/)</li><li>[39.01](./bbtools/39.01/)</li><li>[39.06](./bbtools/39.06/)</li></ul> | https://jgi.doe.gov/data-and-tools/bbtools/ |
| [BBTools](https://hub.docker.com/r/staphb/bbtools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bbtools)](https://hub.docker.com/r/staphb/bbtools) | <ul><li>[38.76](./bbtools/38.76/)</li><li>[38.86](./bbtools/38.86/)</li><li>[38.95](./bbtools/38.95/)</li><li>[38.96](./bbtools/38.96/)</li><li>[38.97](./bbtools/38.97/)</li><li>[38.98](./bbtools/38.98/)</li><li>[38.99](./bbtools/38.99/)</li><li>[39.00](./bbtools/39.00/)</li><li>[39.01](./bbtools/39.01/)</li><li>[39.06](./bbtools/39.06/)</li><li>[39.10](./bbtools/39.10/)</li></ul> | https://jgi.doe.gov/data-and-tools/bbtools/ |
| [bcftools](https://hub.docker.com/r/staphb/bcftools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bcftools)](https://hub.docker.com/r/staphb/bcftools) | <ul><li>[1.10.2](./bcftools/1.10.2/)</li><li>[1.11](./bcftools/1.11/)</li><li>[1.12](./bcftools/1.12/)</li><li>[1.13](./bcftools/1.13/)</li><li>[1.14](./bcftools/1.14/)</li><li>[1.15](./bcftools/1.15/)</li><li>[1.16](./bcftools/1.16/)</li><li>[1.17](./bcftools/1.17/)</li><li>[1.18](bcftools/1.18/)</li><li>[1.19](./bcftools/1.19/)</li><li>[1.20](./bcftools/1.20/)</li><li>[1.20.c](./bcftools/1.20.c/)</li><li>[1.21](./bcftools/1.21/)</li></ul> | https://github.com/samtools/bcftools |
| [bedtools](https://hub.docker.com/r/staphb/bedtools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bedtools)](https://hub.docker.com/r/staphb/bedtools) | <ul><li>2.29.2</li><li>2.30.0</li><li>[2.31.0](bedtools/2.31.0/)</li><li>[2.31.1](bedtools/2.31.1/)</li></ul> | https://bedtools.readthedocs.io/en/latest/ <br/>https://github.com/arq5x/bedtools2 |
| [berrywood-report-env](https://hub.docker.com/r/staphb/berrywood-report-env/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/berrywood-report-env)](https://hub.docker.com/r/staphb/berrywood-report-env) | <ul><li>1.0</li></ul> | none |
68 changes: 68 additions & 0 deletions bbtools/39.10/Dockerfile
@@ -0,0 +1,68 @@
FROM staphb/samtools:1.21 AS samtools
FROM staphb/htslib:1.21 AS htslib

# As a reminder
# https://github.com/StaPH-B/docker-builds/pull/925#issuecomment-2010553275
# bbmap/docs/TableOfContents.txt lists additional dependencies

FROM ubuntu:jammy AS app

ARG SAMBAMBAVER=1.0.1
ARG BBTOOLSVER=39.10

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="BBTools"
LABEL software.version=${BBTOOLSVER}
LABEL description="A set of tools labeled as \"Bestus Bioinformaticus\""
LABEL website="https://sourceforge.net/projects/bbmap"
LABEL documentation="https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/"
LABEL license="https://jgi.doe.gov/disclaimer/"
LABEL maintainer="Abigail Shockey"
LABEL maintainer.email="[email protected]"
LABEL maintainer2="Padraic Fanning"
LABEL maintainer2.email="faninnpm AT miamioh DOT edu"

RUN apt-get update && \
  apt-get install --no-install-recommends -y \
    openjdk-8-jre-headless \
    pigz \
    pbzip2 \
    lbzip2 \
    bzip2 \
    wget \
    ca-certificates \
    procps && \
  rm -rf /var/lib/apt/lists/* && \
  apt-get autoclean

# copy samtools to image
COPY --from=samtools /usr/local/bin/* /usr/local/bin/
COPY --from=htslib /usr/local/bin/* /usr/local/bin/

# download and install sambamba
RUN wget -q https://github.com/biod/sambamba/releases/download/v${SAMBAMBAVER}/sambamba-${SAMBAMBAVER}-linux-amd64-static.gz && \
  gzip -d sambamba-${SAMBAMBAVER}-linux-amd64-static.gz && \
  mv sambamba-${SAMBAMBAVER}-linux-amd64-static /usr/local/bin/sambamba && \
  chmod +x /usr/local/bin/sambamba

# download and install bbtools
RUN wget -q https://sourceforge.net/projects/bbmap/files/BBMap_${BBTOOLSVER}.tar.gz && \
  tar -xzf BBMap_${BBTOOLSVER}.tar.gz && \
  rm BBMap_${BBTOOLSVER}.tar.gz && \
  mkdir /data

ENV PATH=/bbmap/:$PATH \
  LC_ALL=C

CMD tail -n 90 /bbmap/docs/TableOfContents.txt

WORKDIR /data

# testing
FROM app AS test

# get test data and test one thing that uses samtools/sambamba
RUN wget -q https://raw.githubusercontent.com/StaPH-B/docker-builds/master/tests/SARS-CoV-2/SRR13957123.primertrim.sorted.bam && \
  streamsam.sh in='SRR13957123.primertrim.sorted.bam' out='test_SRR13957123.primertrim.sorted.fastq.gz' && \
  test -f test_SRR13957123.primertrim.sorted.fastq.gz
114 changes: 114 additions & 0 deletions bbtools/39.10/README.md
@@ -0,0 +1,114 @@
# BBTools container

Main tool: [BBTools](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/)

Code repository: https://sourceforge.net/projects/bbmap/

Additional tools:
- samtools: 1.21
- htslib: 1.21
- sambamba: 1.0.1

Basic information on how to use this tool:
- executable: *.sh
- help: Program descriptions and options are shown when running the shell scripts with no parameters.
- version: --version
- description:
> BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving.

Additional information:

| Script | Purpose | Comment |
|-------------------------|----------------------------------------------------------------------------------|------------------------------------------------------------------------|
| bbcms.sh | Performs error correction using a Count-Min Sketch | Intended for metagenome assembly |
| bbcountunique.sh | Counts unique kmers in reads | |
| bbduk.sh | Trims, filters or masks reads using kmers | |
| bbmap.sh | Splice-aware aligner for short reads | |
| bbmapskimmer.sh | BBMap version designed for high levels of multimapping | |
| bbmask.sh | Masks references based on various things, such as sequence complexity | |
| bbmerge.sh | Merges overlapping paired reads | |
| bbmerge-auto.sh | Same as bbmerge, but tries to allocate all memory on the node | Use this version for kmer operations like extend |
| bbnorm.sh | Normalizes reads based on coverage | Mainly for use prior to single-cell assembly |
| bbsplit.sh | BBMap version that maps to multiple references simultaneously | Intended for decontamination; similar to Seal |
| bbversion.sh | Prints the version of BBTools | |
| bbwrap.sh | Wraps BBMap to process many files using same reference | Saves time by loading the index only once |
| calctruequality.sh | Allows recalibration of quality scores from mapped reads | This generates the correction matrix; BBDuk does the recalibration |
| callgenes.sh | Fast prokaryotic gene caller | Integrated into BBSketch |
| callvariants.sh | Fast variant caller | |
| callvariants2.sh | Same as callvariants.sh with the "multisample" flag | |
| clumpify.sh | Shrinks compressed fastq files, and can remove duplicate reads | Also supports error correction |
| comparesketch.sh | Compares sketches locally, without using a sketch server | |
| crossblock.sh | Alias for decontaminate.sh | |
| cutgff.sh | Cuts out features defined by gff file | E.g., generates one fasta entry per gene from a gff and an assembly |
| cutprimers.sh | Cuts out subregions of ribosomes | Mainly for 16S analysis |
| decontaminate.sh | Pool-level decontamination for single-cell MDA-amplified genomes | |
| dedupe.sh | Removes duplicate and fully-contained sequences | Can also be used to cluster 16S sequences |
| dedupe2.sh | Version of dedupe that supports more hash keys for greater sensitivity | |
| dedupebymapping.sh | Deduplicates reads based on mapping coordinates | |
| demuxbyname.sh | Demultiplexes based on sequence headers | |
| filterbyname.sh | Filters based on sequence headers | |
| filterbytaxa.sh | Filters sequences based on taxonomic classification | Used with NCBI datasets |
| filterbytile.sh | Removes reads that are in low quality areas on flowcell | |
| filterqc.sh | Part of JGI's fastq filtering pipeline | |
| filtersam.sh | Filters sam files to remove reads with multiple unsupported mismatches | Designed for NovaSeq |
| gitable.sh | Used to process NCBI taxonomy data | |
| khist.sh | Alias for bbnorm.sh with flags for making a kmer frequency histogram | |
| kmercountexact.sh | Counts kmers and produces a histogram | Uses more memory than BBNorm but allows exact counts |
| kmercountmulti.sh | Cardinality estimation over multiple kmer lengths | Uses LogLog; does not produce a histogram |
| mapPacBio.sh | BBMap version designed for PacBio or Nanopore reads | Reads longer than 5kbp get broken into 5kbp shreds |
| mergesketch.sh | Allows multiple sketches to be combined | |
| msa.sh | Alignment tool | Used with cutprimers.sh to cut subsections out of 16S |
| mutate.sh | Generates synthetic genomes by randomly mutating the input | |
| muxbyname.sh | Multiplex multiple files, renaming sequences based on input file name | Opposite of demuxbyname.sh |
| partition.sh | Splits a sequence file into multiple files | |
| pileup.sh | Calculates coverage from sam files | |
| plotflowcell.sh | Produces statistics about flowcell positions | |
| processhi-c.sh | Custom trimming for Hi-C reads | In development |
| randomreads.sh | Generates synthetic data from real genome reference | Highly customizable |
| readqc.sh | Short read quality report | Alternative to fastqc |
| reformat.sh | Converts sequence files to another format | Has many additional options, includes subsampling |
| rename.sh | Renames sequences in various ways, such as adding a prefix | |
| repair.sh | Fixes broken pairing in fastq files | |
| representative.sh | Makes a smaller subset of a reference dataset by eliminating redundancy | Designed for use with BBSketch output |
| rqcfilter2.sh | Filtering pipeline used at JGI | portal.nersc.gov/dna/microbial/assembly/bushnell/RQCFilterData.tar |
| seal.sh | Counts kmer matches between query and reference sequences | |
| sendsketch.sh | Fast taxonomic classifier using webservers at JGI | |
| shred.sh | Breaks sequences into shorter, fixed-length pieces | |
| shuffle.sh | Randomly reorders input file | Crashes if input doesn't fit in memory |
| shuffle2.sh | Randomly reorders input file | Supports larger files, but output might be less random |
| sketch.sh | Makes reference sketches on a per-TaxID basis | |
| sketchblacklist.sh | Makes sketch blacklists of common kmers | |
| sortbyname.sh | Sorts sequences by name, length, quality, taxa, and other things | |
| summarizequast.sh | Generates box plots for multiple quast reports | |
| tadpipe.sh | Preprocessing and assembly pipeline using tadpole | |
| tadpole.sh | Fast short read assembler | |
| tadwrapper.sh | Runs Tadpole with multiple kmer lengths to select the best assembly | |
| taxserver.sh | Starts taxonomy and sketch servers | |
| testformat.sh | Determines if file is fasta, fastq, interleaved, etc. by reading first few lines | |
| testformat2.sh | Generates extensive statistics by reading the full file | |
| translate6frames.sh | Translates nucleotide sequence into amino acid sequence in all frames | |
| vcf2gff.sh | Converts vcf format to gff format | |

Full documentation: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/
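
The help and version conventions listed under "Basic information" can be tried from the host without entering the container. The following is a minimal sketch, not part of the upstream documentation: the image tag `staphb/bbtools:39.10`, the helper name `bbtools_run`, and the `DRY_RUN` switch are assumptions made for illustration. It mounts the current directory at `/data`, the image's working directory, and by default only prints the `docker` command it would run.

```shell
# Assumed image tag; override with IMAGE=... if yours differs.
IMAGE="${IMAGE:-staphb/bbtools:39.10}"
# Preview by default; set DRY_RUN=0 to actually invoke docker.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run one BBTools script inside the container.
bbtools_run() {
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "docker run --rm -v $PWD:/data $IMAGE $*"
  else
    docker run --rm -v "$PWD:/data" "$IMAGE" "$@"
  fi
}

# bbversion.sh prints the BBTools version (see the table above).
bbtools_run bbversion.sh
# Individual scripts accept --version.
bbtools_run bbduk.sh --version
```

With `DRY_RUN=0` and Docker available, calling `bbtools_run bbduk.sh` with no further arguments should print that script's description and options.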

## Example Usage

(adapted from `/bbmap/pipelines/covid/processCorona.sh`)

Interleave a pair of FASTQ files for downstream processing:

```text
reformat.sh \
  in1=${SAMPLE}_R1.fastq.gz \
  in2=${SAMPLE}_R2.fastq.gz \
  out=${SAMPLE}.fq.gz
```

Split into SARS-CoV-2 and non-SARS-CoV-2 reads:

```text
bbduk.sh ow -Xmx1g \
  in=${SAMPLE}.fq.gz \
  ref=REFERENCE.fasta \
  outm=${SAMPLE}_viral.fq.gz \
  outu=${SAMPLE}_nonviral.fq.gz \
  k=25
```
```
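
The two steps can be chained so that the interleaved output of `reformat.sh` feeds directly into `bbduk.sh`. The sketch below is illustrative only: the function name `split_viral` and the `RUN` prefix variable are hypothetical, and the file naming simply follows the commands above. Setting `RUN=echo` prints the commands rather than executing them, which is handy for checking file names before a long run.

```shell
# Hypothetical wrapper around the two commands above.
# Usage: split_viral SAMPLE REFERENCE.fasta
split_viral() {
  if [ "$#" -ne 2 ]; then
    echo "usage: split_viral SAMPLE REFERENCE" >&2
    return 2
  fi
  sample="$1"
  ref="$2"
  # RUN is empty by default, so the scripts run directly;
  # RUN=echo turns each step into a printed command.
  run="${RUN:-}"

  # Step 1: interleave the paired FASTQ files.
  $run reformat.sh \
    in1="${sample}_R1.fastq.gz" \
    in2="${sample}_R2.fastq.gz" \
    out="${sample}.fq.gz" || return 1

  # Step 2: split reads by kmer match against the reference.
  $run bbduk.sh ow -Xmx1g \
    in="${sample}.fq.gz" \
    ref="$ref" \
    outm="${sample}_viral.fq.gz" \
    outu="${sample}_nonviral.fq.gz" \
    k=25
}

# Preview the commands for a sample named "sample01".
RUN=echo split_viral sample01 REFERENCE.fasta
```

Stopping after a failed first step avoids running `bbduk.sh` on a missing or partial interleaved file.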
