1 Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
2 GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
3 Staatliche Naturwissenschaftliche Sammlungen Bayerns (SNSB)–Bayerische Staatssammlung für Paläontologie und Geologie, Munich, Germany
*corresponding author
Synteny, the shared arrangement of genes on chromosomes between related species, is a marker of shared ancestry, and synteny-breaking events can result in genomic incompatibilities between populations and ultimately lead to speciation events. Despite the pivotal role of synteny breaks as a driver of speciation, their effects remain poorly studied owing to a lack of chromosome-level genome assemblies for a taxonomically broad sample of organisms. Here, using 22 congeneric animal genome pairs, we find a link between protein identity, microsynteny, and macrosynteny, but no evidence for a universal path of genomic change during speciation. We observed varied trajectories of synteny conservation relative to protein identity in non-model organisms, with many species pairs showing no karyotypic changes and others displaying large genomic rearrangements. This contrasts with previous studies on model organisms and indicates that the genomic changes preceding or resulting from speciation are likely highly context-dependent across clades.
For each pair of genomes (congeneric species), microsynteny and macrosynteny are both analysed.
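As a rough illustration of what a microsynteny analysis looks for, the following minimal sketch finds runs of neighbouring genes in one species whose orthologs are also neighbours in the other species. It is not the algorithm used by microsynteny.py; the function name collinear_blocks, the parameters min_genes and max_gap, and all gene and scaffold names are hypothetical and chosen only for this example.

```python
def collinear_blocks(order_a, ortholog_pos_b, min_genes=3, max_gap=1):
    """Return runs of genes from species A whose orthologs stay close in species B.

    order_a        : gene ids in chromosomal order on one scaffold of species A
    ortholog_pos_b : {gene_a: (scaffold_b, index_b)} position of each ortholog
    min_genes      : smallest run that is reported as a block
    max_gap        : intervening genes tolerated between neighbouring orthologs
    """
    blocks, current = [], []
    for gene in order_a:
        pos = ortholog_pos_b.get(gene)
        if pos is None:
            continue  # genes without an ortholog are skipped, they do not end a run
        if current:
            prev_scaffold, prev_index = ortholog_pos_b[current[-1]]
            distance = abs(pos[1] - prev_index)
            same_block = pos[0] == prev_scaffold and 0 < distance <= max_gap + 1
            if not same_block:
                if len(current) >= min_genes:
                    blocks.append(current)
                current = []
        current.append(gene)
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Invented toy data: seven genes on one scaffold of species A.
order_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
ortholog_pos_b = {
    "g1": ("B1", 10), "g2": ("B1", 11), "g3": ("B1", 12),  # conserved block
    "g4": ("B2", 5), "g5": ("B2", 4), "g6": ("B2", 3),     # inverted block
    "g7": ("B1", 40),                                      # isolated gene
}
blocks = collinear_blocks(order_a, ortholog_pos_b)
print(blocks)
```

In this toy case two blocks of three genes are reported and the isolated gene is dropped. Macrosynteny, by contrast, asks only whether orthologs fall on corresponding scaffolds, regardless of their local order.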
The pipeline driver, run_synteny_analysis.py, is written in Python and is run as:
run_synteny_analysis.py -i species_pair_list.tab
For each species pair, for example the two tuna species, the pipeline begins with the scaffolds, proteins, and GFF annotations downloaded from NCBI:
GCF_910596095.1_fThuMac1.1_genomic.fna.gz
GCF_910596095.1_fThuMac1.1_genomic.gff.gz
GCF_910596095.1_fThuMac1.1_protein.faa.gz
GCF_914725855.1_fThuAlb1.1_genomic.fna.gz
GCF_914725855.1_fThuAlb1.1_genomic.gff.gz
GCF_914725855.1_fThuAlb1.1_protein.faa.gz
and this generates the following files for each species or species pair:
- get_genbank_longest_isoforms.py filtered proteins with isoforms removed (.x.faa), like: GCF_910596095.1_fThuMac1.1_protein.x.faa and GCF_914725855.1_fThuAlb1.1_protein.x.faa
- get_genbank_longest_isoforms.py filtered GFFs corresponding to the proteins (.x.gff), like: GCF_910596095.1_fThuMac1.1_genomic.x.gff and GCF_914725855.1_fThuAlb1.1_genomic.x.gff
- DIAMOND results: fThuAlb1_vs_fThuMac1.blastp.tab and fThuAlb1_vs_fThuMac1.renamed.blastp.tab
- scaffold_synteny.py results: fThuAlb1_vs_fThuMac1.scaffold_synteny.tab and fThuAlb1_vs_fThuMac1.scaffold_synteny.pdf
- microsynteny.py results: fThuAlb1_vs_fThuMac1.microsynteny.tab and fThuAlb1_vs_fThuMac1.microsynteny.pdf
- fastarenamer.py renamed versions of proteins for clustering (.x.n.faa), like: GCF_910596095.1_fThuMac1.1_protein.x.n.faa and GCF_914725855.1_fThuAlb1.1_protein.x.n.faa
- makehomologs.py clustering outputs: fasta_clusters.H.thunnus_clusters_v1.tab, clusters_thunnus_clusters_v1.tar.gz, and log thunnus_clusters_v1.2023-08-02-010624.mh.log
- alignment_conserved_site_to_dots.py accumulated tabular output: fThuAlb1_vs_fThuMac1.homologs_identity.tab
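To give a sense of how the DIAMOND table can feed a scaffold-level comparison, here is a minimal sketch that keeps the best hit per query protein and counts hits for each pair of scaffolds. It assumes the .blastp.tab files use the standard 12-column BLAST tabular layout (DIAMOND's default for tabular output), which is not confirmed by this README. The function names, the toy protein ids, and the scaffold lookups are hypothetical, and this is not the logic of scaffold_synteny.py itself.

```python
import csv
import io
from collections import Counter

# Standard 12-column BLAST tabular layout (assumed for the .blastp.tab files).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Return {query: (subject, percent identity, bitscore)} for the top-scoring hit."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(BLAST6_COLUMNS, row))
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][2]:
            best[query] = (rec["sseqid"], float(rec["pident"]), score)
    return best


def scaffold_pair_counts(hits, query_scaffold, subject_scaffold):
    """Count best hits for every (query scaffold, subject scaffold) combination."""
    counts = Counter()
    for query, (subject, _pident, _score) in hits.items():
        if query in query_scaffold and subject in subject_scaffold:
            counts[(query_scaffold[query], subject_scaffold[subject])] += 1
    return counts


# Invented toy table standing in for a real .blastp.tab file.
toy_table = io.StringIO(
    "a1\tb1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "a1\tb9\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-10\t80\n"
    "a2\tb2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-45\t190\n"
    "a3\tb3\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t180\n"
)
hits = best_hits(toy_table)

# Hypothetical protein-to-scaffold lookups; in practice these would come from the GFFs.
counts = scaffold_pair_counts(
    hits,
    {"a1": "chrA1", "a2": "chrA1", "a3": "chrA2"},
    {"b1": "chrB1", "b2": "chrB1", "b3": "chrB2"},
)
for (scaffold_a, scaffold_b), n in sorted(counts.items()):
    print(scaffold_a, scaffold_b, n)
```

With a real file, the handle would come from open("fThuAlb1_vs_fThuMac1.blastp.tab", newline="") instead of the in-memory toy table. Scaffold pairs with high counts suggest conserved macrosynteny, while hits scattered across many pairs suggest rearrangement.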
Subsequent analysis and plotting are carried out with several R scripts.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.