7. Misc
We appreciate any critical comments or suggestions for improvements. Please raise issues or submit pull requests.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details.
This code was inspired mostly by work on bacterial symbionts in the early stages of becoming intracellular and strictly host-associated. This ecological shift relaxes selection pressure ('use it or lose it') on many genes considered essential for free-living bacteria, so relatively recent symbionts can have over 50% of their genes pseudogenized.
Basic information about bacterial pseudogenes:
Recognizing the pseudogenes in bacterial genomes: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1142405/
Taking the pseudo out of pseudogenes: https://www.ncbi.nlm.nih.gov/pubmed/25461580
Several examples from the Sodalis clade showing how important pseudogene annotation is for bacteria in a nascent stage of symbiosis:
Mobile genetic element proliferation and gene inactivation impact over the genome structure and metabolic capabilities of Sodalis glossinidius, the secondary endosymbiont of tsetse flies: https://www.ncbi.nlm.nih.gov/pubmed/20649993
A novel human-infection-derived bacterium provides insights into the evolutionary origins of mutualistic insect–bacterial symbioses: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3499248/
Genome degeneration and adaptation in a nascent stage of symbiosis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3914690/
Repeated replacement of an intrabacterial symbiont in the tripartite nested mealybug symbiosis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5027413/
Large scale and significant expression from pseudogenes in Sodalis glossinidius - a facultative bacterial endosymbiont: https://www.biorxiv.org/content/early/2017/07/23/124388
There are several additional features we'll try to include in the script in the near future.
- Include an optional FPKM cut-off when RNA-Seq data are available.
- Improve the logic for ORFs at contig ends that are broken by assembly issues (e.g. in metagenome-assembled genomes).
- Check whether ORFs called as pseudogenes actually represent individual protein domains that can exist and evolve independently of the rest of the original multi-domain protein chain (e.g. using Pfam?).
- Fine-tune pseudogene finding for mobile elements such as transposases.
- Visualize results as a scatter plot of all genes/pseudogenes (dN/dS, GC content, expression, length ratio, ...).
- ORFs are sometimes predicted by mistake on the opposite strand, and many additional spurious ORFs are predicted in GC-rich genomes (stop codons are AT-rich). Include an ORF filtering step and/or use blastX to check regions containing ORFs with no blastP hits. Include a proteomics validation step for hypothetical proteins.
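The remark that stop codons are AT-rich can be made concrete with a little arithmetic. The sketch below is not part of Pseudofinder; it is a toy model that assumes the standard bacterial genetic code (stop codons TAA, TAG, TGA), independent bases, and equal A/T and G/C frequencies at a given GC content. The function names are hypothetical and chosen only for illustration.

```python
# Toy model (not Pseudofinder code): how GC content affects the chance that a
# random codon is a stop codon, assuming independent bases with
# P(A) = P(T) = (1 - GC) / 2 and P(G) = P(C) = GC / 2.

STOP_CODONS = ("TAA", "TAG", "TGA")  # standard bacterial code (assumption)


def stop_codon_probability(gc_content: float) -> float:
    """Return the probability that a random codon is a stop codon."""
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    base_prob = {
        "A": (1.0 - gc_content) / 2.0,
        "T": (1.0 - gc_content) / 2.0,
        "G": gc_content / 2.0,
        "C": gc_content / 2.0,
    }
    total = 0.0
    for codon in STOP_CODONS:
        p = 1.0
        for base in codon:
            p *= base_prob[base]
        total += p
    return total


def expected_codons_until_stop(gc_content: float) -> float:
    """Return the mean number of codons read until a stop codon appears by chance."""
    p = stop_codon_probability(gc_content)
    return float("inf") if p == 0.0 else 1.0 / p


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC = {gc:.0%}: {expected_codons_until_stop(gc):.1f} codons until a chance stop")
```

Under these assumptions, a random reading frame runs for about 21 codons before hitting a stop at 50% GC, but for about 52 codons at 70% GC. Longer chance ORFs in GC-rich genomes are therefore expected, which is why an ORF filtering step would help there.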
Please suggest any additional features here: [https://github.com/filip-husnik/pseudofinder/issues].
Pseudofinder is developed by Mitch Syberg-Olsen¹, Arkadiy Garber², Patrick Keeling¹, John McCutcheon², and Filip Husnik³.
¹ University of British Columbia, Vancouver, Canada
² Arizona State University, Tempe, Arizona, USA
³ Okinawa Institute of Science and Technology, Okinawa, Japan
If it was useful for your work, please cite it as:
Syberg-Olsen MJ*, Garber AI*, Keeling PJ, McCutcheon JP, Husnik F. Pseudofinder: detection of pseudogenes in prokaryotic genomes. bioRxiv 2021, doi: https://doi.org/10.1101/2021.10.07.463580. GitHub repository: https://github.com/filip-husnik/pseudofinder/.
*Co-first authors.
Please also cite various dependencies used by Pseudofinder.