GitHub - nrminor/E484T-visualizations: Halfmann and Minor et al. 2022 visualizations of a persistent SARS-CoV-2 infection that led to globally unique E484T mutation

Introduction

This workflow was created to efficiently gather, process, and visualize data presented in Halfmann and Minor et al. 2022, Evolution of a globally unique SARS-CoV-2 Spike E484T monoclonal antibody escape mutation in a persistently infected, immunocompromised individual. Raw reads from the course of the prolonged infection are pulled automatically from SRA BioProject PRJNA836936. Consensus sequences are bundled together with the workflow documents and are included in the project GitHub repository. Also included is a table of neutralization assay results and a table of qPCR cycle threshold (Ct) values from the infection.

Note that this workflow does not include processing behind Halfmann and Minor et al. 2022 supplemental tables and figures. These workflows are bundled separately and can be accessed by following the links in these GitHub repos:

Quick Start

If you already have Docker and NextFlow installed on your system, simply run the following command in the directory of your choice:

nextflow run dholab/E484T-visualizations -latest

This command automatically pulls the workflow from GitHub and runs it. If you do not have Docker and NextFlow installed, or want to tweak any of the default configurations in the workflow, proceed to the following sections.

Detailed Instructions

To run this workflow, start by running git clone to bring the workflow files into your directory of choice, like so:

git clone https://github.com/dholab/E484T-visualizations .

Once the workflow bundle is in place, you may need to double-check that the workflow scripts are executable by running chmod +x bin/* in the command line.

You will also need to install the Docker engine if you haven't already. The workflow pulls all the software it needs automatically from Docker Hub, which means you will never need to permanently install that software on your system. To install Docker, simply visit the Docker installation page at https://docs.docker.com/get-docker/.

Nextflow Installation

This workflow uses the NextFlow workflow manager. We recommend you install NextFlow to your system in one of the two following ways:

1) Installation with Conda

Install the miniconda python distribution, if you haven't already: https://docs.conda.io/en/latest/miniconda.html
Install the mamba package installation tool in the command line: conda install -y -c conda-forge mamba
Install Nextflow to your base environment: mamba install -c bioconda nextflow

2) Installation with curl

Run the following line in a directory where you'd like to install NextFlow, and run the following line of code: curl -fsSL https://get.nextflow.io | bash
Add this directory to your $PATH. If on MacOS, a helpful guide can be viewed here.

To double check that the installation was successful, type nextflow -v into the terminal. If it returns something like nextflow version 21.04.0.5552, you are set and ready to proceed.

Running the workflow

Now that you have cloned the workflow files, installed Docker, and installed NextFlow, you are ready to run the workflow. To do so, simply change into the workflow directory and run the following in the BASH terminal:

nextflow run prolonged_infection_workflow.nf

If the workflow runs partway, but a computer outage or other issue interrupts its progress, no need to start over! Instead, run:

nextflow run prolonged_infection_workflow.nf -resume

The workflow's configurations (see below) tell NextFlow to plot the workflow and record run statistics. However, the plot the workflow, note that NextFlow requires the package GraphViz, which is easiest to install via the intructions on GraphViz's website.

Finally, if you are running the workflow on macOS, you may run into memory issues if you have not allocated enough storage to the Docker Engine. If possible, we recommend you allocate 16GB of RAM to your Docker engine, though 6 GB, 8 GB, or 10 GB may also supply enough. At the time that we designed this workflow, this Stack Overflow answer is a good guide for increasing Docker memory.

Configuration

The following runtime parameters have been set for the whole workflow:

primer_key - path to an important CSV that lists which primer set was used for each sample
refdir - path to the directory where the reference sequence and primer BED files are located
refseq - path to the SARS-CoV-2 Wuhan-1 sequence from GenBank Accession MN9089473.3
refgff - path to the gene feature file for the SARS-CoV-2 Wuhan-1 sequence
Ct_data - path to the CSV file of Ct values
neut_data - path to the CSV file of neutralization assay data
results - path to the results directory, the workflow's default output location
results_data_files - path to a subdirectory of results/ where data files like VCFs are placed.
visuals - path to a subdirectory of results/ where graphics are stored. This is where the final figure PDFs are placed.

These parameters can be altered in the command line with a double-dash flag, like so:

nextflow run prolonged_infection_workflow.nf --results ~/Downloads

Bundled data

Bundled together with the workflow are:

primer BED files for ARTIC v3, ARTIC v4.1, and MIDNIGHT v1 are in ref/
the SARS-CoV-2 Wuhan-1 sequence from GenBank Accession MN9089473.3 is in ref/ and is called reference.fasta
the MN9089473.3 gene coordinate/codon GFF file is also in ref/ and is called MN9089473.gff3
a CSV file of qPCR cycle threshold values through the course of the prolonged infection is in data/
a CSV file of neutralization assay results is also in data/
a very important file called fig2b_raw_read_guide.csv is in resources/. This file enables the workflow to use the correct primers for each sample in samtools ampliconclip.

Workflow summary

the first process identifies the SARS-CoV-2 pango lineage for the consensus sequences from each timepoint
qPCR cycle threshold values are then plotted
each sequence from the FASTA of consensus sequences is isolated into its own FASTA file
the consensus FASTAs are converted to SAM format (mapped to the Wuhan-1 SARS-CoV-2 sequence)
a samtools mpileup is generated for each consensus SAM
the consensus pileups are piped into iVar, which variant calls each consensus sequence and also identifies protein effects
SRA accession numbers are taken from config/fig2b_raw_read_guide.csv and used to pull FASTQs for each sequencing timepoint. Because NCBI's fastq-dump is very slow, this step takes the longest. NOTE that because the settings for mapping these reads depend on sequencing platform (e.g., Illumina vs. Oxford Nanopore), there are near-identical processes laid out for reads from each sequencing platform.
the reads are mapped to the Wuhan-1 SARS-CoV-2 sequence
the mapped reads in SAM format are then clipped down to amplicons and converted to BAM format
a tool called covtobed then identifies regions in any of these BAMs where depth of coverage is less than 20 reads, indicating an amplicon dropout
the amplicon dropout BED files, the consensus sequence FASTA, and the iVar variant tables are then plotted with an R script
once Figure 2A has been plotted, all the FASTQ files, BAM files, BED files, and variant tables generated at intermediate steps of the workflow are stored and compressed in a TAR archive.
the neutralization assay results are plotted with an R script

Output files

data/reads/[timepoint_day_sequencingplatform].fastq.gz - Raw reads for each timepoint that we have deposited in Sequence Read Archive (SRA) BioProject PRJNA836936
data/[timepoint_day_sequencingplatform].sam - reads aligned to the SARS-CoV-2 Wuhan-1 sequence in human-readable SAM format
data/[timepoint_day_sequencingplatform].bam - reads aligned to the SARS-CoV-2 Wuhan-1 sequence, filtered down to only reads that are within the span of each amplicon in the timepoint's primer set, and converted to the smaller, non-human readable BAM format.
results/data/[timepoint_day_sequencingplatform].bed - BED files showing genome regions where fewer than 20 reads aligned. These low-depth regions are likely to be regions where amplification failed, i.e., amplicon dropouts.
results/data/[GISAID_strain_name]_consensus_variant_table.tsv - iVar tables of single-nucleotide variants identified in the consensus sequence of each timepoint, along with codon information and protein effects for each variant.
results/data/patient_variant_counts.csv - simple table recording the number of variants iVar identified at each timepoint. This is used as an input for the script that generates Supplementary Figure 2.
results/prolonged_infection_lineage_report.csv - pango lineages classified for each timepoint.
results/visuals/fig1a_ct_values_thru_time.pdf - PDF file of Figure 1A, which can be polished in vector graphics software like Adobe Illustrator.
results/visuals/fig2a_consensus_mutations.pdf - PDF file of Figure 2A, which can be polished in vector graphics software like Adobe Illustrator.
results/visuals/fig2c_neutralization_assay.pdf - PDF file of Figure 2C, which can be polished in vector graphics software like Adobe Illustrator.

Note that figures 1B and 2B are produced manually.

For more information

This workflow was created by Nicholas R. Minor. To report any issues, please visit the GitHub repository for this project.

NextFlow Learning Resources

NBI Sweden Reproducibility Workshop GitHub - This one is more up to date than the version on canva, which is the next link
NBI Sweden Reproducibility Workshop Canva
NextFlow Official Documentation
Exhaustive NextFlow Github.io Tutorial
Seqera Labs NextFlow Tutorial
Tronflow NextFlow tutorial
Google Cloud NextFlow tutorial
Data Carpentries Incubator Workshop on NextFlow - This one is especially thorough and helpful, and also introduces nf-core.
NextFlow Cheatsheet PDF - Not a tutorial, but this is a useful PDF to have on hand.

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
bin		bin
config		config
data		data
resources		resources
results		results
.gitignore		.gitignore
README.md		README.md
nextflow.config		nextflow.config
prolonged_infection_dag.png		prolonged_infection_dag.png
prolonged_infection_workflow.nf		prolonged_infection_workflow.nf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Quick Start

Detailed Instructions

Nextflow Installation

1) Installation with Conda

2) Installation with curl

Running the workflow

Configuration

Bundled data

Workflow summary

Output files

For more information

NextFlow Learning Resources

About

Languages

nrminor/E484T-visualizations

Folders and files

Latest commit

History

Repository files navigation

Introduction

Quick Start

Detailed Instructions

Nextflow Installation

1) Installation with Conda

2) Installation with curl

Running the workflow

Configuration

Bundled data

Workflow summary

Output files

For more information

NextFlow Learning Resources

About

Resources

Stars

Watchers

Forks

Languages