Skip to content
This repository has been archived by the owner on Sep 30, 2022. It is now read-only.

Latest commit

 

History

History
29 lines (18 loc) · 2.46 KB

README.md

File metadata and controls

29 lines (18 loc) · 2.46 KB

GenomeFISH

Scripts are contained in the scripts directory and are written in Python 2.x. No installation is required. Simply copy the scripts to your system and ensure you have the required dependencies as detailed below.

Calculating pairwise average nucleotide identity

Script: ani.py

Requirements:

Calculates pairwise average nucleotide identity (ANI) values between a set of genomes:

ani.py <gene_dir> <output_dir>

where <gene_dir> contains FASTA files of genes in nucleotide space for two or more genomes and <output_dir> is the desired output directory for results. FASTA files must end with a '.fna' or '.fasta' extension. The number of CPUs to use can be specified with the "--threads" parameter.

Simulating in silico probes

Script: in_silico_probes.py

Requirements:

  • biolib Python library
  • BLASTn >= 2.6.0+ must be on your system path
  • melt.pl from the UNAFold software library must be on your system path

Calculates the percent identity and free energy error between in silico probes from a reference genome and a target genome:

in_silico_probes.py <genome_dir> <ani_matrix> <output_dir>

where <genome_dir> contains genomic FASTA files for two or more genomes, <ani_matrix> is a file with pairwise average nucleotide identity between genomes (see: ani.py), and <output_dir> is the desired output directory for results. The genomic FASTA files must end in a '.fasta' or '.fna' extension and all pairwise combination of genomes are considered as reference and target genomes. In order to reduce computational requirements in silico probes are only simulated every 120 bp (i.e. same length as the probe). At this step size it takes ~2 hours to compare a pair of genomes. You can do multiple genome comparisons in parallel though so this should help a bit. The length of the probes can be changed with the "--probe_size" parameter and the spacing between probes changed with the "--probe_step_size" parameters. The number of CPUs to use can be specified with the "--threads" parameter. Using multiple CPUs will substantially reduce processing time when multiple pairs of genomes are being processed. For other optional parameters see the command line help (i.e. in_silico_probes.py -h).