5. Commands

Annotate

Annotate is Pseudofinder's core command. Calling this command will identify pseudogene candidates in the input genome annotation, and will produce various output files, explained in detail below.

As with any other Python script, there are two ways to run it:

# Call it directly with python3 (or just python if python3 is your default).
python3 pseudofinder.py

# Or make the file executable and then rely on its shebang line [#!/usr/bin/env python3].
chmod u+x ./pseudofinder.py
./pseudofinder.py

Providing input files:

# Run the full pipeline on 16 processors (for BlastX/BlastP searches).
# Unless you have a $BLASTDB environment variable set on your system, you have to provide a full path to the NR database.
python3 pseudofinder.py annotate --genome GENOME.GBF --outprefix PREFIX --database /PATH/TO/NR/nr --threads 16
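
The same call can be assembled from a script, for example when several genomes need to be annotated in a row. The following is a minimal sketch that only uses the flags shown in this section; the helper name `build_annotate_command` is hypothetical and not part of Pseudofinder.

```python
import subprocess  # only needed if the command is actually launched


def build_annotate_command(genome, outprefix, database, threads=16, reference=None):
    """Return the argument list for a `pseudofinder.py annotate` call.

    Hypothetical helper: it only assembles flags that appear in this section.
    """
    cmd = [
        "python3", "pseudofinder.py", "annotate",
        "--genome", str(genome),
        "--outprefix", str(outprefix),
        "--database", str(database),
        "--threads", str(threads),
    ]
    if reference is not None:
        # A reference genome additionally triggers the dN/dS (sleuth) analysis.
        cmd += ["--reference", str(reference)]
    return cmd


if __name__ == "__main__":
    cmd = build_annotate_command("GENOME.GBF", "PREFIX", "/PATH/TO/NR/nr")
    print(" ".join(cmd))
    # To start the run, uncomment the next line:
    # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.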

Output of Annotate:

Every run will produce the following files:

File	Description
[prefix]_interactive.html	Interactive plots which summarize the genome-wide analysis.
[prefix]_intact.gff	Intact genes in GFF3 format.
[prefix]_intact.faa	Intact genes in fasta format.
[prefix]_intergenic.fasta	Intergenic regions in fasta format.
[prefix]_blastX_output.tsv	Tab-delimited output of BLASTX run on intergenic regions.
[prefix]_log.txt	Summary of all inputs, outputs, parameters and results.
[prefix]_map.pdf	Concatenated chromosome map. Input genes appear on the inner track in blue, and candidate pseudogenes are shown in red on the outer track.
[prefix]_proteome.faa	All protein sequences in fasta format.
[prefix]_blastP_output.tsv	Tab-delimited output of BLASTP run on proteome.
[prefix]_pseudos.gff	Candidate pseudogenes in GFF3 format.
[prefix]_pseudos.fasta	Candidate pseudogenes in fasta format.
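
To confirm that a run finished and wrote everything listed in the table above, the file names can be checked from a short script. This is a sketch, assuming the outputs are written as `[prefix]` followed directly by the suffixes in the table; the function name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Suffixes copied from the table of annotate outputs.
ANNOTATE_SUFFIXES = [
    "_interactive.html",
    "_intact.gff",
    "_intact.faa",
    "_intergenic.fasta",
    "_blastX_output.tsv",
    "_log.txt",
    "_map.pdf",
    "_proteome.faa",
    "_blastP_output.tsv",
    "_pseudos.gff",
    "_pseudos.fasta",
]


def missing_outputs(prefix):
    """Return the expected output paths that do not exist for this prefix."""
    expected = [Path(f"{prefix}{suffix}") for suffix in ANNOTATE_SUFFIXES]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    for path in missing_outputs("PREFIX"):
        print(f"missing: {path}")
```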

If you include a reference genome, the run will also produce:

File	Description
[prefix]_interactive_dnds.html	Interactive genome-wide dN/dS plot.
[prefix]_dnds	Directory containing output from the dnds module: BLAST results, dN/dS summary file, and a folder containing the nucleotide, amino acids, and codon alignments that were used to calculate dN and dS values.

The interactive plot is a good place to start exploring your data. It summarizes all data collected for each feature on the genome, and hovering over an individual feature opens a popup with a more detailed view. Red bars indicate features that have been flagged as pseudogenes, and the popup specifies what kind of pseudogene each one is.
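
For a quick numeric summary alongside the plot, the GFF3 outputs can be tallied directly. The sketch below relies only on the standard GFF3 layout (nine tab-separated columns, with the feature type in the third); it makes no assumption about which feature types or attributes Pseudofinder writes, and the function name `count_gff_features` is hypothetical.

```python
from collections import Counter


def count_gff_features(gff_path):
    """Count features per type (GFF3 column 3) in a GFF3 file."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence section; no features follow
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments and directives
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:
                counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    for feature_type, n in count_gff_features("PREFIX_pseudos.gff").items():
        print(feature_type, n)
```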

Sleuth

The sleuth command compares a genome against another closely related genome. After homologous genes are identified, this module runs PAML on aligned genes to generate codon alignments and calculate per-gene dN/dS values. These dN/dS values can be used to infer neutral selection and potential cryptic pseudogenes. This module can also be invoked within the Annotate command by providing a closely related reference genome with the --reference flag.

Usage:

# Call within annotate
python3 pseudofinder.py annotate --genome GENOME.GBF --reference REFERENCE.GBF --outprefix PREFIX --database /PATH/TO/NR/nr --threads 16 

# Stand-alone dN/dS calculation
pseudofinder.py sleuth -a GENOME_PROTS -n GENOME_GENES -ra REFERENCE_PROTS -rn REFERENCE_GENES

Whenever the sleuth module is invoked through the annotate command (that is, annotate is run with the --reference flag), an interactive dN/dS plot is generated automatically. This plot is helpful for exploring your data and refining the parameters you use to determine pseudogenes. The plot includes a linear regression, a line indicating the chosen dN/dS cutoff (--max_dnds), and the calculated genome-wide mean dN/dS value.
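
To make the relationship between per-gene values, the cutoff, and the mean concrete, here is a small sketch with made-up numbers. It is not Pseudofinder's implementation: it assumes that genes whose dN/dS exceeds the cutoff are the ones of interest, takes the genome-wide mean as the mean of the per-gene ratios, and skips genes with dS of zero. The function name `summarize_dnds`, the gene names, and the cutoff value are all hypothetical.

```python
import math


def summarize_dnds(gene_rates, max_dnds):
    """Compute per-gene dN/dS, their mean, and the genes above the cutoff.

    gene_rates maps a gene name to a (dN, dS) tuple.
    """
    ratios = {}
    for gene, (dn, ds) in gene_rates.items():
        if ds > 0:  # the ratio is undefined when dS is zero
            ratios[gene] = dn / ds
    mean = sum(ratios.values()) / len(ratios) if ratios else math.nan
    flagged = sorted(gene for gene, ratio in ratios.items() if ratio > max_dnds)
    return ratios, mean, flagged


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    rates = {
        "geneA": (0.01, 0.20),
        "geneB": (0.09, 0.10),
        "geneC": (0.05, 0.0),
    }
    ratios, mean, flagged = summarize_dnds(rates, max_dnds=0.3)
    print("mean dN/dS:", round(mean, 3))
    print("above cutoff:", flagged)
```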


Reannotate

Reannotate will run the annotate workflow, beginning after the computationally intensive BLAST and codon alignment steps. This command can very quickly reannotate pseudogenes if you would like to change any downstream parameters. The log file from the previous run will be parsed for previous parameters and files, so please keep the files in the locations described in the log file.

Usage:

pseudofinder.py reannotate -g GENOME -log LOGFILE -op OUTPREFIX

Visualize

One strength of Pseudofinder is its ability to be fine-tuned to the user's preferences. To help visualize the effects of changing the parameters of this program, we have provided the visualize command. This command will display how many pseudogenes will be detected based on any combination of --length_pseudo and --shared_hits. Similar to the reannotate module, the log file will be parsed for information about relevant files and parameters.

Usage:

pseudofinder.py visualize -g GENOME -log LOGFILE -op OUTPREFIX


Test

With a single command, the entire Pseudofinder workflow can be run on the 139 kbp genome of Candidatus Tremblaya princeps strain PCIT (or optionally, you may provide your own genome).

Simply enter the following command:

python3 pseudofinder.py test --database /PATH/TO/NR/nr

The workflow will begin immediately and write the results to a timestamped folder found in /pseudo-finder/test/.
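
Because each test run gets its own folder, it can be handy to locate the newest one from a script. The naming scheme of the timestamped folders is not described here, so this sketch picks the most recently modified subdirectory instead of parsing names; the function name `latest_run_dir` is hypothetical.

```python
from pathlib import Path


def latest_run_dir(test_dir):
    """Return the most recently modified subdirectory of test_dir, or None."""
    subdirs = [path for path in Path(test_dir).iterdir() if path.is_dir()]
    return max(subdirs, key=lambda path: path.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Adjust the path to wherever the repository lives on your system.
    test_dir = Path("/pseudo-finder/test/")
    if test_dir.is_dir():
        print(latest_run_dir(test_dir))
```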
