Compare genes across samples and generate a phylogenetic tree based on the most diverse genes.
The inputs are listed at the top:
- file (txt or similar) containing list of fasta files (assemblies) to be used
- extension of fasta files (fna or fasta)
- directory with the assemblies
- directory with corresponding gff files
- minimum identity (%) between two genes to consider them homologues
- minimum length difference (% of total length) to consider two genes as homologues
Both of the last two criteria (inputs 5 and 6: identity and length difference) must be met.
The most diverse genes (ranked by Shannon entropy) are used to construct the phylogenetic tree.
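As a rough illustration of ranking genes by Shannon entropy, the sketch below scores each aligned gene by the mean per-column entropy of its alignment and keeps the highest-scoring ones. This is only one plausible reading, not necessarily how the script does it: the function names, the choice of mean per-column entropy, the treatment of gap characters as ordinary symbols, and the `top_n` cutoff are all assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_alignment_entropy(sequences):
    """Mean per-column entropy of equal-length aligned sequences.

    Assumption: gaps ('-') are counted like any other symbol.
    """
    if not sequences or len(sequences[0]) == 0:
        raise ValueError("alignment must contain at least one non-empty sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) yields one tuple of symbols per alignment column.
    return sum(column_entropy(col) for col in zip(*sequences)) / length


def most_diverse(alignments, top_n):
    """Return the names of the top_n genes with the highest mean entropy.

    alignments: dict mapping a (hypothetical) gene name to its aligned sequences.
    """
    ranked = sorted(
        alignments,
        key=lambda name: mean_alignment_entropy(alignments[name]),
        reverse=True,
    )
    return ranked[:top_n]
```

For example, `most_diverse({"geneA": ["AA", "AA"], "geneB": ["AC", "GT"]}, 1)` selects `geneB`, since every column of `geneA` is identical across samples and therefore has zero entropy.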
The required programs are MAFFT, IQ-TREE and BLAST.