adding virulencefinder version 3.0.0 #927

Open: wants to merge 11 commits into base: master

erinyoung
Contributor

There's a new version of virulencefinder!

Admittedly, I am unclear what is new in this version, but I'm sure there are some bug fixes or new features.

When I went about updating the Dockerfile, I tried to update the base image to ubuntu:jammy. This resulted in Python incompatibility issues (ubuntu:jammy's default Python 3 is 3.10), among other errors.

Therefore, I kept everything from the prior Dockerfile and updated only the software version, database version, and kma version. I also added a CMD line and changed the tabs.

The differences between 2.0.4 and 2.0.5:

$ diff virulencefinder/2.0.4/Dockerfile virulencefinder/2.0.5/Dockerfile 
1,4c1,4
< ARG VIRULENCEFINDER_VER="2.0.1"
< # Database not properly versioned, so using most recent commit made on 2023-05-03
< # see here: https://bitbucket.org/genomicepidemiology/virulencefinder_db/commits/f678bdc15283aed3a45f66050d2eb3a6c9651f3f
< ARG VIRULENCEFINDER_DB_COMMIT_HASH="f678bdc15283aed3a45f66050d2eb3a6c9651f3f"
---
> ARG VIRULENCEFINDER_VER="2.0.5"
> # Database not properly versioned, so using most recent commit made on 2024-01-02
> # see here: https://bitbucket.org/genomicepidemiology/virulencefinder_db/commits/2b705359191a24f6db64f891ab07c93b0281e685
> ARG VIRULENCEFINDER_DB_COMMIT_HASH="2b705359191a24f6db64f891ab07c93b0281e685"
10a11
> ARG KMA_VER="1.4.14"
24c25
< # ncbi-blast+ v2.9.0 (ubuntu:focal), min required version is 2.8.1
---
> # ncbi-blast+ v2.9.0-2 (ubuntu:focal), min required version is 2.8.1
27,41c28,42
<  wget \
<  ca-certificates \
<  procps \
<  git \
<  ncbi-blast+ \
<  python3 \
<  python3-pip \
<  python3-setuptools \
<  python3-dev \
<  gcc \
<  make \
<  libz-dev \
<  dos2unix \
<  unzip && \
<  apt-get autoclean && rm -rf /var/lib/apt/lists/*
---
>     wget \
>     ca-certificates \
>     procps \
>     git \
>     ncbi-blast+ \
>     python3 \
>     python3-pip \
>     python3-setuptools \
>     python3-dev \
>     gcc \
>     make \
>     libz-dev \
>     dos2unix \
>     unzip && \
>     apt-get autoclean && rm -rf /var/lib/apt/lists/*
48,51c49,52
< RUN git clone --branch 1.0.1 --depth 1 https://bitbucket.org/genomicepidemiology/kma.git && \
<  cd kma && \
<  make && \
<  mv -v kma* /usr/local/bin/
---
> RUN git clone --branch ${KMA_VER} --depth 1 https://bitbucket.org/genomicepidemiology/kma.git && \
>     cd kma &&\
>     make &&\
>     mv kma kma_index kma_shm kma_update /usr/local/bin/
58,62c59,63
<  git clone https://bitbucket.org/genomicepidemiology/virulencefinder_db.git /database && \
<  cd /database && \
<  git checkout ${VIRULENCEFINDER_DB_COMMIT_HASH} && \
<  dos2unix *.fsa && \
<  python3 INSTALL.py kma_index
---
>     git clone https://bitbucket.org/genomicepidemiology/virulencefinder_db.git /database && \
>     cd /database && \
>     git checkout ${VIRULENCEFINDER_DB_COMMIT_HASH} && \
>     dos2unix *.fsa && \
>     python3 INSTALL.py kma_index
66c67
<  mkdir /data
---
>     mkdir /data
70c71
<  LC_ALL=C.UTF-8
---
>     LC_ALL=C.UTF-8
80a82,83
> RUN virulencefinder.py -h
> 
89,92c92,95
<  wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/224/845/GCA_012224845.2_ASM1222484v2/GCA_012224845.2_ASM1222484v2_genomic.fna.gz && \
<  gunzip GCA_012224845.2_ASM1222484v2_genomic.fna.gz && \
<  virulencefinder.py -i /test/GCA_012224845.2_ASM1222484v2_genomic.fna -x -o /test/asm-input && \
<  cat /test/asm-input/results_tab.tsv
---
>     wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/224/845/GCA_012224845.2_ASM1222484v2/GCA_012224845.2_ASM1222484v2_genomic.fna.gz && \
>     gunzip GCA_012224845.2_ASM1222484v2_genomic.fna.gz && \
>     virulencefinder.py -i /test/GCA_012224845.2_ASM1222484v2_genomic.fna -x -o /test/asm-input && \
>     cat /test/asm-input/results_tab.tsv
96,98c99,101
<  wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR690/006/SRR6903006/SRR6903006_1.fastq.gz && \
<  virulencefinder.py -i SRR6903006_1.fastq.gz -mp kma -x -o /test/reads-input && \
<  cat /test/reads-input/results_tab.tsv
---
>     wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR690/006/SRR6903006/SRR6903006_1.fastq.gz && \
>     virulencefinder.py -i SRR6903006_1.fastq.gz -mp kma -x -o /test/reads-input && \
>     cat /test/reads-input/results_tab.tsv
104,105c107,108
<  virulencefinder.py -i test.fsa -o . -mp blastn -x -q && \
<  virulencefinder.py --help
---
>     virulencefinder.py -i test.fsa -o . -mp blastn -x -q && \
>     virulencefinder.py --help

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The dockerfile successfully builds to a test target for the user creating the PR. (i.e. docker build --tag samtools:1.15test --target test docker-builds/samtools/1.15 )
  • Directory structure as name of the tool in lower case with special characters removed with a subdirectory of the version number (i.e. spades/3.12.0/Dockerfile)
    • (optional) All test files are located in same directory as the Dockerfile (i.e. shigatyper/2.0.1/test.sh)
  • Create a simple container-specific README.md in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
    • If this README is longer than 30 lines, there is an explanation as to why more detail was needed
  • Dockerfile includes the recommended LABELS
  • Main README.md has been updated to include the tool and/or version of the dockerfile(s) in this PR
  • Program_Licenses.md contains the tool(s) used in this PR and has been updated for any missing
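
For this PR, the build check from the checklist might look like the sketch below. The tag and the relative path `virulencefinder/3.0.0` are assumptions modeled on the samtools example in the checklist, and `test_build_cmd` is a hypothetical helper. It only prints the command, so it can be inspected before being run.

```shell
#!/usr/bin/env bash
# Print the docker build command for a tool's test stage.
# Usage: test_build_cmd <tool> <version>
# Assumes the Dockerfile lives at <tool>/<version>/Dockerfile, relative to the
# repository root (an assumption; adjust the path to your checkout).
test_build_cmd() {
    local tool="$1" version="$2"
    printf 'docker build --tag %s:%stest --target test %s/%s\n' \
        "$tool" "$version" "$tool" "$version"
}

test_build_cmd virulencefinder 3.0.0
```

Copying the printed line into a terminal runs the build through the test stage, so a failing test aborts the build.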

@kapsakcj
Collaborator

I betcha the database has been updated with new/recently designated shiga toxin subtypes (and maybe other updates too)

@kapsakcj kapsakcj self-requested a review March 21, 2024 17:25
@kapsakcj
Collaborator

I won't be able to review this week, but don't let that stop others from reviewing if they have the time

@erinyoung erinyoung marked this pull request as draft April 9, 2024 23:45
@erinyoung
Contributor Author

I went to check on this PR, and it looks like there are some changes being made to the virulencefinder_db (https://bitbucket.org/genomicepidemiology/virulencefinder_db/commits/). I want to give those some time to settle.

@erinyoung
Contributor Author

I've updated the database commit hash. It looks like the last commit was at the beginning of April, so this is probably safe to review now.
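
One way to look up the newest database commit without cloning is `git ls-remote`, which prints the hash each remote ref points to. This is a sketch of a possible workflow, not something taken from this PR, and the helper name `head_hash` is hypothetical.

```shell
#!/usr/bin/env bash
# Extract the commit hash for HEAD from `git ls-remote` output read on stdin.
# Each ls-remote line has the form "<hash><TAB><ref>".
head_hash() {
    awk -F'\t' '$2 == "HEAD" { print $1; exit }'
}

# Real use (needs network access):
#   git ls-remote https://bitbucket.org/genomicepidemiology/virulencefinder_db.git | head_hash
```

The printed hash could then be pasted into the `VIRULENCEFINDER_DB_COMMIT_HASH` build argument.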

@erinyoung erinyoung marked this pull request as ready for review May 14, 2024 21:46
@erinyoung erinyoung requested review from kapsakcj and removed request for kapsakcj June 25, 2024 16:06
@erinyoung erinyoung changed the title from "adding virulencefinder version 2.0.5" to "adding virulencefinder version 3.0.0" Jun 25, 2024
@erinyoung
Contributor Author

@kapsakcj, it appears that version 3.0.0 came out the other day. It's similar to resfinder.

I've changed the tool-specific README drastically. Let me know if something is unclear.

@kapsakcj
Collaborator

kapsakcj commented Oct 12, 2024

Sorry for taking so dang long to finally review. I'm making a few changes to the dockerfile to improve some aspects.

According to this file: https://bitbucket.org/genomicepidemiology/virulencefinder/src/master/pyproject.toml

biopython 1.79 or higher, cgecore 1.5.6, and tabulate 0.8.9 are installed, so I'm going to test removing the pip3 install ... on line 51, since I believe the pip install . on line 76 will install those Python packages.

EDIT: yep, that line was unnecessary and confusing, so I removed it. Those Python packages are installed alongside virulencefinder.
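
A quick way to confirm that the dependencies really arrive with the tool is to try importing them inside the built image. The sketch below is not part of the PR. `check_imports` is a hypothetical helper, and the import names (`Bio` for biopython, `cgecore`, `tabulate`) are the packages' usual module names.

```shell
#!/usr/bin/env bash
# Try to import each named Python module and report any that fail.
# Returns 0 if every import succeeds, 1 otherwise.
check_imports() {
    local status=0 module
    for module in "$@"; do
        if ! python3 -c "import ${module}" 2>/dev/null; then
            echo "missing: ${module}" >&2
            status=1
        fi
    done
    return "$status"
}

# Inside the container one might run:
#   check_imports Bio cgecore tabulate
```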

@kapsakcj
Collaborator

@erinyoung OK I believe I'm done making changes. Here's the updated diff with -w flag to ignore whitespace changes so it's a little less busy:

$ diff -w virulencefinder/2.0.4/Dockerfile virulencefinder/3.0.0/Dockerfile
1,4c1,5
< ARG VIRULENCEFINDER_VER="2.0.1"
< # Database not properly versioned, so using most recent commit made on 2023-05-03
< # see here: https://bitbucket.org/genomicepidemiology/virulencefinder_db/commits/f678bdc15283aed3a45f66050d2eb3a6c9651f3f
< ARG VIRULENCEFINDER_DB_COMMIT_HASH="f678bdc15283aed3a45f66050d2eb3a6c9651f3f"
---
> ARG VIRULENCEFINDER_VER="3.0.0"
> ARG VIRULENCEFINDER_DB_VER="2.0.0"
> # Database sometimes is not properly versioned, so using most recent commit made on 2024-04-06 would be something like
> # see here: https://bitbucket.org/genomicepidemiology/virulencefinder_db/commits/bcf7f0b26271a59ca85715fa2ab8a0c380e5357b
> # ARG VIRULENCEFINDER_DB_COMMIT_HASH="bcf7f0b26271a59ca85715fa2ab8a0c380e5357b"
6c7
< FROM ubuntu:focal as app
---
> FROM ubuntu:jammy AS app
10a12,13
> ARG VIRULENCEFINDER_DB_VER
> ARG KMA_VER="1.4.15"
13c16
< LABEL base.image="ubuntu:focal"
---
> LABEL base.image="ubuntu:jammy"
18c21
< LABEL website="https://bitbucket.org/genomicepidemiology/virulencefinder/src/master/"
---
> LABEL website="https://bitbucket.org/genomicepidemiology/virulencefinder"
21a25,26
> LABEL maintainer1="Erin Young"
> LABEL maintainer1.email="[email protected]"
24,25c29,30
< # ncbi-blast+ v2.9.0 (ubuntu:focal), min required version is 2.8.1
< # python3 v3.8.10, min required version is 3.5
---
> # ncbi-blast+ v2.12.0 (ubuntu:jammy), min required version is 2.8.1
> # python3 v3.10.12, min required version is 3.10
40,44c45,48
<  unzip && \
<  apt-get autoclean && rm -rf /var/lib/apt/lists/*
<
< # install python dependencies
< RUN pip3 install biopython==1.73 tabulate==0.7.7 cgecore==1.5.5
---
>     unzip \
>     python-is-python3 && \
>     apt-get autoclean && rm -rf /var/lib/apt/lists/* && \
>     update-alternatives --install /usr/bin/python python /usr/bin/python3 10
48c52
< RUN git clone --branch 1.0.1 --depth 1 https://bitbucket.org/genomicepidemiology/kma.git && \
---
> RUN git clone --branch ${KMA_VER} --depth 1 https://bitbucket.org/genomicepidemiology/kma.git && \
51c55
<  mv -v kma* /usr/local/bin/
---
>     mv kma kma_index kma_shm kma_update /usr/local/bin/
53c57
< # download VIRULENCEFINDER database using a specific commit hash to aid in reproducibility
---
> # download VIRULENCEFINDER database
55c59
< # NOTE: files HAVE to go into '/database' since that is the default location expected by serotyperfinder.py
---
> # NOTE: files HAVE to go into '/database' since that is the default location expected by virulencefinder
58,60c62,65
<  git clone https://bitbucket.org/genomicepidemiology/virulencefinder_db.git /database && \
<  cd /database && \
<  git checkout ${VIRULENCEFINDER_DB_COMMIT_HASH} && \
---
>     git clone --depth 1 https://bitbucket.org/genomicepidemiology/virulencefinder_db.git /databases && \
>     cd /databases && \
>     git fetch --depth 1 origin tag ${VIRULENCEFINDER_DB_VER} && \
>     rm -rf .git && \
65c70,73
< RUN git clone --branch ${VIRULENCEFINDER_VER} https://bitbucket.org/genomicepidemiology/virulencefinder.git && \
---
> RUN git clone --branch ${VIRULENCEFINDER_VER} --depth 1 https://bitbucket.org/genomicepidemiology/virulencefinder.git && \
>     rm -rf /virulencefinder/.git && \
>     cd /virulencefinder && \
>     pip3 install . && \
70c78,80
<  LC_ALL=C.UTF-8
---
>     LC_ALL=C.UTF-8 \
>     CGE_BLASTN=/usr/bin/blastn \
>     CGE_VIRULENCEFINDER_DB=/databases
74a85,91
> # force bash shell so below lines to make an alias runs properly
> SHELL ["/bin/bash", "-c"]
>
> # setting a janky alias for everyone that uses the "latest" tag
> RUN echo -e '#!/bin/bash\npython -m virulencefinder "$@"' > /usr/bin/virulencefinder.py && \
>     chmod +x /usr/bin/virulencefinder.py
>
76c93
< CMD [ "virulencefinder.py", "-h"]
---
> CMD [ "python", "-m", "virulencefinder", "-h" ]
79c96,98
< FROM app as test
---
> FROM app AS test
>
> RUN python -m virulencefinder -h && /usr/bin/virulencefinder.py -h
88,89c107,108
< RUN mkdir -v /test/asm-input && \
<  wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/224/845/GCA_012224845.2_ASM1222484v2/GCA_012224845.2_ASM1222484v2_genomic.fna.gz && \
---
> RUN mkdir asm-input && \
>     wget -q https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/224/845/GCA_012224845.2_ASM1222484v2/GCA_012224845.2_ASM1222484v2_genomic.fna.gz && \
91,92c110,115
<  virulencefinder.py -i /test/GCA_012224845.2_ASM1222484v2_genomic.fna -x -o /test/asm-input && \
<  cat /test/asm-input/results_tab.tsv
---
>     python -m virulencefinder -h && \
>     which blastn && \
>     head -n 5 /test/GCA_012224845.2_ASM1222484v2_genomic.fna && \
>     python -m virulencefinder -ifa /test/GCA_012224845.2_ASM1222484v2_genomic.fna --extented_output -o asm-input && \
>     ls asm-input && \
>     cat asm-input/results_tab.tsv
96,97c119,121
<  wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR690/006/SRR6903006/SRR6903006_1.fastq.gz && \
<  virulencefinder.py -i SRR6903006_1.fastq.gz -mp kma -x -o /test/reads-input && \
---
>     wget -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR690/006/SRR6903006/SRR6903006_1.fastq.gz && \
>     wget -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR690/006/SRR6903006/SRR6903006_2.fastq.gz && \
>     python -m virulencefinder -ifq SRR6903006_1.fastq.gz SRR6903006_2.fastq.gz --extented_output -o /test/reads-input && \
103,105c127,129
< RUN cd /virulencefinder/test && \
<  virulencefinder.py -i test.fsa -o . -mp blastn -x -q && \
<  virulencefinder.py --help
---
> RUN cd /virulencefinder/tests && \
>     python -m virulencefinder -ifa data/test.fsa -o . && \
>     ls

@kapsakcj
Collaborator

Tests pass after making my changes, so I'm good to merge if you are, @erinyoung. Let me know and I'll merge and deploy.

The only other improvement I would make is to use a builder stage so we can drop the build dependencies (make, gcc, libz-dev) and make the image a little smaller, but the juice isn't worth the squeeze in my opinion. A 772MB uncompressed image isn't too bad:

$ docker images
REPOSITORY                       TAG              IMAGE ID       CREATED          SIZE
eriny/virulencefinder            3.0.0-full       503e54565217   8 minutes ago    1.01GB
eriny/virulencefinder            3.0.0-app-only   8ddbe571cd38   11 minutes ago   772MB
