v1.2.0

@frankambrosio3 frankambrosio3 released this 03 Oct 14:02
· 294 commits to main since this release
801baa2

Public Health Bioinformatics v1.2.0 Release Notes

This minor release introduces three new workflows and resolves various bugs.

New workflows:

  • TheiaMeta_Illumina_PE_PHB
    This workflow offers a versatile approach to de novo metagenomic assembly, providing the option to use either reference-based or reference-independent metagenomic assembly. Taxonomic characterization is also performed with Kraken2.

  • CZGenEpi_Prep_PHB
    The CZGenEpi_Prep workflow formats metadata and assembly files for seamless integration with the Chan Zuckerberg GEN EPI platform.

  • Samples_to_Ref_Tree_PHB
    In this workflow, Nextclade is used to rapidly place new samples onto an existing reference phylogenetic tree. Phylogenetic placement is done by comparing the mutations of the query sequence (relative to the reference) with the mutations of every node and tip in the reference tree, and finding the node which has the most similar set of mutations. This operation is repeated for each query sequence, until all of them are placed onto the tree.
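The placement idea described for Samples_to_Ref_Tree_PHB can be illustrated with a toy sketch. This is not Nextclade's implementation, which is considerably more involved. It only assumes that each node and tip of the reference tree is represented by a set of mutations relative to the reference, and it uses the size of the symmetric difference as a simple similarity measure. The function name `place_query`, the tree, and the mutation labels are all hypothetical.

```python
def place_query(query_mutations, node_mutations):
    """Return the reference node whose mutation set is closest to the query.

    Distance is the number of mutations found in exactly one of the two sets
    (a simplifying assumption made for this sketch).
    """
    def distance(node):
        return len(query_mutations ^ node_mutations[node])

    # Sorting the node names makes ties resolve deterministically.
    return min(sorted(node_mutations), key=distance)


# Hypothetical reference tree: node or tip name -> mutations relative to the reference.
reference_nodes = {
    "root": set(),
    "node_A": {"C241T", "A23403G"},
    "tip_A1": {"C241T", "A23403G", "G28881A"},
    "tip_B1": {"C8782T", "T28144C"},
}

# Hypothetical query samples, each placed independently of the others.
queries = {
    "sample_1": {"C241T", "A23403G", "G28881A", "C14408T"},
    "sample_2": {"C8782T", "T28144C", "G11083T"},
}

placements = {name: place_query(muts, reference_nodes) for name, muts in queries.items()}
print(placements)
```

Each query is compared against every node and tip in turn, which mirrors the "repeat for each query sequence" step in the description above.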

Changes in existing workflows

  • Kraken2_SE_PHB
    Kraken2 output files were not being correctly identified by the single-end standalone workflow, causing it to fail unexpectedly. Output files should now populate the Terra datatable correctly.

  • KMC
    The output type of est_genome_size is now an int so data can be sorted numerically in a Terra datatable when running TheiaProk_ONT. Additionally, this task no longer runs unnecessarily for the TheiaCoV_ONT workflow.

  • TS_MLST
    The database has been updated as of August 2023.

    New outputs:

    • ts_mlst_docker
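The reason for changing est_genome_size to an int can be seen in a small sketch of general string versus integer sorting. It does not reproduce how Terra sorts its columns, and the values are made up for illustration.

```python
# Hypothetical est_genome_size values as they might appear in a table column.
as_strings = ["4800000", "12000000", "950000"]
as_ints = [int(value) for value in as_strings]

# Strings are compared character by character, so "12000000" sorts before "4800000".
sorted_strings = sorted(as_strings)

# Integers are compared by magnitude, which is the order a user would expect.
sorted_ints = sorted(as_ints)

print(sorted_strings)
print(sorted_ints)
```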

Mycobacterium tuberculosis changes

  • TBProfiler
    The default variant caller has been changed to FreeBayes to accurately identify resistance-conferring deletions and multi-nucleotide variants (MNVs).

  • tbp-parser
    A TBProfiler parsing module has been added to apply variant interpretation logic based on recommendations by the WHO, CDC and CDPH to produce antitubercular drug resistance calls. Additionally, a set of machine and human-interpretable files are produced to facilitate data sharing and interpretation. Find the source code here.

    New inputs:

    • tbprofiler_output_seq_method_type (default="WGS")
    • tbprofiler_operator (default="")
    • tbp_parser_min_depth (default=10)
    • tbp_parser_coverage_threshold (default=100)
    • tbp_parser_debug (default=false)
    • tbp_parser_docker_image (default="us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.0.1")

    New outputs:

    • tbprofiler_lims_report_csv
    • tbprofiler_looker_csv
    • tbprofiler_laboratorian_report_csv
    • tbprofiler_resistance_genes_percent_coverage
    • tbp_parser_genome_percent_coverage
    • tbp_parser_version
    • tbp_parser_docker
  • Clockwork
    The Clockwork module has been added to decontaminate read files by removing sequencing reads that may come from nontuberculous mycobacteria (NTM) or the human genome.

    New outputs:

    • clockwork_decontaminated_read1
    • clockwork_decontaminated_read2
  • TBDB
    The TBProfiler module uses a database called TBDB. We have modified the code to allow for custom databases to be used in place of the default TBDB. Additionally, we have created a custom database including mutations from TBDB, the WHO catalog, and a list of mutations included in the CDC's MTB pipeline Varpipe.

    By default, TBProfiler runs with its default database. If the Boolean input tbprofiler_run_custom_db is set to true and no database is provided by the user, a database containing both TBProfiler's TBDB and CDC Varpipe's collection of resistance-conferring mutations will be used by TBProfiler. In this database, duplicate entries have been manually curated by removing the TBDB entry in favor of Varpipe's mutation annotation.

    New inputs:

    • tbprofiler_run_custom_db (default=false)
    • tbprofiler_custom_db (default="gs://theiagen-public-files/terra/theiaprok-files/tbdb_varpipe_combined.tar.gz")
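The database selection rule and the duplicate handling described for TBDB can be sketched roughly as follows. This is an illustration of the stated behavior, not the workflow's WDL or the actual curation process, which was manual. The function names, the `"tbdb"` placeholder label, the dictionary layout, and the annotation strings are all assumptions made for this example.

```python
# Default custom database path, taken from the tbprofiler_custom_db input above.
DEFAULT_CUSTOM_DB = (
    "gs://theiagen-public-files/terra/theiaprok-files/tbdb_varpipe_combined.tar.gz"
)


def select_database(run_custom_db=False, custom_db=None):
    """Pick the database TBProfiler would run with (hypothetical helper)."""
    if not run_custom_db:
        # Placeholder label standing in for TBProfiler's built-in TBDB.
        return "tbdb"
    # Fall back to the combined TBDB + Varpipe database when none is supplied.
    return custom_db or DEFAULT_CUSTOM_DB


def merge_preferring_varpipe(tbdb_entries, varpipe_entries):
    """Combine two mutation tables, keeping the Varpipe entry on duplicates."""
    merged = dict(tbdb_entries)
    merged.update(varpipe_entries)  # Varpipe overwrites matching TBDB keys
    return merged


# Illustrative entries keyed by (gene, mutation); annotation text is invented.
tbdb = {("rpoB", "S450L"): "tbdb annotation", ("katG", "S315T"): "tbdb annotation"}
varpipe = {("rpoB", "S450L"): "varpipe annotation"}

combined = merge_preferring_varpipe(tbdb, varpipe)
print(select_database(run_custom_db=True))
print(combined)
```

Calling `select_database()` with no arguments returns the placeholder for the default database, matching the default of tbprofiler_run_custom_db=false.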

Bug Fixes

  • In the KMC task, the -n flag has been added to the echo command to avoid a trailing newline character in the output.
  • An optional snippy_core_bed file input has been added to the Snippy_Tree workflow to enable site masking; this optional input is also exposed to the Snippy_Streamline workflow.
  • The memory input for quast has been adjusted to match the style guide in the TheiaEuk_Illumina_PE_PHB workflow.
  • The version_capture task now uses a Docker image hosted on Theiagen's Google Artifact Registry (GAR) instead of DockerHub; we also exposed docker as an optional input for this task.
  • The plasmidfinder output parsing was overly aggressive when removing duplicates: it removed every instance of a duplicated entry instead of keeping one copy. This has been resolved.
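The difference between the old and the fixed plasmidfinder deduplication behavior can be shown with a short sketch. This is not the task's actual parsing code. Both function names and the replicon labels are hypothetical and exist only to contrast the two behaviors.

```python
from collections import Counter


def drop_all_duplicated(items):
    """Buggy behavior: any value that appears more than once is lost entirely."""
    counts = Counter(items)
    return [item for item in items if counts[item] == 1]


def keep_first_occurrence(items):
    """Intended behavior: keep one copy of each value, preserving order."""
    return list(dict.fromkeys(items))


# Hypothetical replicon hits, with one entry reported twice.
plasmids = ["IncFIB", "Col156", "IncFIB", "IncI1"]

print(drop_all_duplicated(plasmids))
print(keep_first_occurrence(plasmids))
```

With the first function the duplicated hit disappears from the result altogether, which is the kind of data loss the fix addresses.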

What's Changed

Full Changelog: v1.1.0...v1.2.0

View our documentation here!