This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
- append: The metadata passed to the pipeline, appended to the cluster addresses defined by the clustering component.
- ArborView: The ArborView visualization of a dendrogram alongside metadata.
- clusters: The identified clusters from the genomic_address_service.
- distances: Distances between genomes from profile_dists.
- merged: MLST JSON files merged into a single MLST profiles file.
- pipeline_info: Information about the pipeline's execution.
The IRIDA Next-compliant JSON output file will be named iridanext.output.json.gz and will be written to the top level of the results directory. This file is compressed using GZIP and conforms to the IRIDA Next JSON output specification.
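Because the file is plain JSON wrapped in GZIP, it can be inspected without unpacking it first. A minimal Python sketch, assuming the default file name and location described above; the helper name load_iridanext_output is hypothetical, and the keys of the returned dictionary are whatever the IRIDA Next specification defines.

```python
import gzip
import json
from pathlib import Path


def load_iridanext_output(results_dir):
    """Decompress and parse iridanext.output.json.gz (hypothetical helper)."""
    path = Path(results_dir) / "iridanext.output.json.gz"
    # Text mode ("rt") decompresses and decodes in a single step.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, sorted(load_iridanext_output("results")) lists the top-level keys of a finished run's output.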
The pipeline is built using Nextflow and processes data using the following steps:
- Locidex merge - Merges MLST profile JSON files into a single profiles file.
- Profile dists - Computes pairwise distances between genomes using MLST allele differences.
- GAS mcluster - Generates a hierarchical cluster tree alongside cluster addresses.
- Append metadata - Appends the passed input metadata to the identified cluster addresses.
- ArborView - Generates a visualization of the cluster tree alongside metadata.
- IRIDA Next Output - Generates a JSON output file that is compliant with IRIDA Next.
- Pipeline information - Reports metrics generated during the workflow execution.
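The Profile dists step reduces each pair of MLST profiles to a count of loci whose alleles differ. The Python sketch below illustrates that Hamming-style count, assuming profiles are held as {sample_id: {locus: allele}} dictionaries and that loci missing in either profile (here "", "0" or "-") are skipped. profile_dists has its own missing-data options, so this is an illustration rather than its implementation, and the function names are hypothetical.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b, missing=("", "0", "-")):
    """Count shared loci whose allele calls differ between two profiles.

    Loci that are missing in either profile are skipped (an assumption).
    """
    diffs = 0
    for locus in profile_a.keys() & profile_b.keys():
        a, b = profile_a[locus], profile_b[locus]
        if a in missing or b in missing:
            continue
        if a != b:
            diffs += 1
    return diffs


def pairwise_distances(profiles):
    """Return {(sample_i, sample_j): distance} for every unordered pair."""
    return {
        (s1, s2): allele_distance(profiles[s1], profiles[s2])
        for s1, s2 in combinations(sorted(profiles), 2)
    }
```

Skipping missing loci keeps incomplete assemblies from looking artificially distant; counting them as differences would be the more conservative choice.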
Locidex merge output files
merged/
- Merged MLST profiles: profile.tsv
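To work with the merged profiles outside the pipeline, they can be loaded into a dictionary. This sketch assumes profile.tsv is tab-separated with a header row, the sample ID in the first column and one column per locus; check the actual header of your file before relying on it. The function name read_profiles is hypothetical.

```python
import csv


def read_profiles(path):
    """Load a merged MLST profile table into {sample_id: {locus: allele}}.

    Assumes the first column holds the sample ID and each remaining
    column is one locus.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        loci = header[1:]
        # Skip blank lines; zip pairs each locus name with its allele call.
        return {row[0]: dict(zip(loci, row[1:])) for row in reader if row}
```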
Profile dists output files
distances/
- Mapping allele identifiers to integers: allele_map.json. For example:

    {
      "l1": { "60b725f10c9c85c70d97880dfe8191b3": 1 },
      "l2": { "60b725f10c9c85c70d97880dfe8191b3": 1 },
      "l3": { "3b5d5c3712955042212316173ccf37be": 1, "60b725f10c9c85c70d97880dfe8191b3": 2 }
    }

- The query MLST profiles: query_profile.text
- The reference MLST profiles: ref_profile.text
- The computed distances based on MLST allele differences: results.text
- Information on the profile_dists run: run.json
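The allele_map.json example numbers the distinct allele identifiers (hashes) within each locus as 1, 2, and so on, so that profiles can be compared as small integers. The following sketch builds such a map and encodes profiles with it. Sorting the identifiers reproduces the l3 entry of the example, but the ordering rule profile_dists actually uses is an assumption here, as are the function names.

```python
def build_allele_map(profiles):
    """Map each locus's distinct allele identifiers to integers 1, 2, ...

    profiles: {sample_id: {locus: allele_id}}. Returns {locus: {allele_id: int}}.
    Identifiers are numbered in sorted order (an assumption).
    """
    alleles_by_locus = {}
    for profile in profiles.values():
        for locus, allele in profile.items():
            alleles_by_locus.setdefault(locus, set()).add(allele)
    return {
        locus: {allele: i for i, allele in enumerate(sorted(alleles), start=1)}
        for locus, alleles in sorted(alleles_by_locus.items())
    }


def encode_profiles(profiles, allele_map):
    """Replace each allele identifier with its integer code for that locus."""
    return {
        sample: {locus: allele_map[locus][allele] for locus, allele in profile.items()}
        for sample, profile in profiles.items()
    }
```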
GAS mcluster output files
clusters/
- The computed cluster addresses: clusters.text
- Information on the GAS mcluster run: run.json
- Thresholds used to compute cluster addresses: thresholds.json
- Hierarchical clusters as a Newick file: tree.nwk
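A cluster address combines one cluster number per threshold listed in thresholds.json, from the coarsest level to the finest. The sketch below shows the idea with single-linkage grouping implemented via union-find. It assumes that a pair joins the same cluster when its distance is no greater than the threshold, that thresholds are given coarsest first, and that clusters are numbered per level in order of first appearance. GAS mcluster's actual linkage method and numbering scheme may differ, and all names here are hypothetical.

```python
def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def single_linkage_labels(samples, distances, threshold):
    """Number groups of samples linked by distances <= threshold."""
    parent = {s: s for s in samples}
    for (a, b), d in distances.items():
        if d <= threshold:
            parent[_find(parent, a)] = _find(parent, b)
    labels, result, next_id = {}, {}, 1
    for s in samples:  # number clusters in order of first appearance
        root = _find(parent, s)
        if root not in labels:
            labels[root] = next_id
            next_id += 1
        result[s] = labels[root]
    return result


def cluster_addresses(samples, distances, thresholds):
    """Join per-threshold cluster numbers into dot-separated addresses."""
    levels = [single_linkage_labels(samples, distances, t) for t in thresholds]
    return {s: ".".join(str(level[s]) for level in levels) for s in samples}
```

Samples that share a prefix of their addresses fall in the same cluster at every threshold covered by that prefix.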
Append metadata output files
append/
- The passed input metadata columns appended to the cluster addresses file: clusters_and_metadata.tsv
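Appending metadata is essentially a join on the sample ID. This sketch writes a tab-separated table with the requested metadata columns after each cluster address. The "id" and "address" column names and the blank fill for samples without metadata are assumptions, not the exact layout of clusters_and_metadata.tsv, and append_metadata is a hypothetical name.

```python
import csv
import io


def append_metadata(addresses, metadata, columns):
    """Return TSV text with metadata columns appended to each cluster address.

    addresses: {sample_id: address}; metadata: {sample_id: {column: value}}.
    Missing metadata values are written as empty fields.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "address", *columns])
    for sample, address in addresses.items():
        meta = metadata.get(sample, {})
        writer.writerow([sample, address, *(meta.get(col, "") for col in columns)])
    return out.getvalue()
```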
ArborView output files
ArborView/
- The ArborView visualization of clusters and metadata: clustered_data_arborview.html
IRIDA Next output files
/ (the top level of the results directory)
- IRIDA Next-compliant JSON output: iridanext.output.json.gz
Pipeline information output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email/--email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.
Nextflow can generate several reports on how the pipeline ran. These reports help you troubleshoot errors and also record information such as launch commands, run times and resource usage.
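When troubleshooting, a quick first step is to list the tasks in execution_trace.txt that did not finish successfully. This sketch assumes the trace is tab-separated with Nextflow's default name, status and exit columns, and treats every status other than COMPLETED or CACHED as a problem; failed_tasks is a hypothetical helper.

```python
import csv


def failed_tasks(trace_path):
    """Return (name, status, exit) for tasks that did not complete successfully.

    Assumes the tab-separated trace includes 'name', 'status' and 'exit' columns.
    """
    ok = {"COMPLETED", "CACHED"}
    with open(trace_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            (row["name"], row["status"], row["exit"])
            for row in reader
            if row["status"] not in ok
        ]
```

The task names returned can then be matched against the corresponding entries in execution_report.html for more detail.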