
7. Misc

Mitch Syberg-Olsen edited this page Feb 23, 2022 · 1 revision

Contributing

We appreciate any critical comments or suggestions for improvements. Please raise issues or submit pull requests.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details.

Acknowledgements

This code was inspired mostly by work on bacterial symbionts in the early stages of becoming intracellular and strictly host-associated. This ecological shift relaxes selection pressure ('use it or lose it') on many genes considered essential for free-living bacteria, so relatively recent symbionts can have over 50% of their genes pseudogenized.

References

Basic information about bacterial pseudogenes:

Recognizing the pseudogenes in bacterial genomes: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1142405/

Taking the pseudo out of pseudogenes: https://www.ncbi.nlm.nih.gov/pubmed/25461580

Several examples from the Sodalis clade showing how important pseudogene annotation is for bacteria in a nascent stage of symbiosis:

Mobile genetic element proliferation and gene inactivation impact over the genome structure and metabolic capabilities of Sodalis glossinidius, the secondary endosymbiont of tsetse flies: https://www.ncbi.nlm.nih.gov/pubmed/20649993

A novel human-infection-derived bacterium provides insights into the evolutionary origins of mutualistic insect–bacterial symbioses: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3499248/

Genome degeneration and adaptation in a nascent stage of symbiosis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3914690/

Repeated replacement of an intrabacterial symbiont in the tripartite nested mealybug symbiosis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5027413/

Large scale and significant expression from pseudogenes in Sodalis glossinidius - a facultative bacterial endosymbiont: https://www.biorxiv.org/content/early/2017/07/23/124388

Wish list

There are several additional features we'll try to include in the script in the near future.

  1. Include an optional FPKM cut-off when there are RNA-Seq data available.

  2. Improve logic for ORFs on contig ends broken by assembly issues (e.g. metagenome-assembled genomes).

  3. Check whether ORFs called as pseudogenes actually represent individual protein domains that can exist and evolve independently of the rest of the original multi-domain protein chain (possibly using Pfam).

  4. Fine-tune pseudogene finding for mobile elements such as transposases.

  5. Visualize results by a scatter plot of all genes/pseudogenes (dN/dS, GC content, expression, length ratio, ...).

  6. ORFs are sometimes predicted by mistake on the opposite strand, and many additional spurious ORFs are predicted in GC-rich genomes (stop codons are AT-rich). Include an ORF filtering step and/or use blastX to check regions containing ORFs with no blastP hits. Include a proteomics validation step for hypothetical proteins.
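The remark in item 6 that stop codons are AT-rich can be illustrated with a rough back-of-the-envelope model. The sketch below is not part of Pseudofinder. It assumes bases are drawn independently with P(G) = P(C) and P(A) = P(T), and that the stop codons are TAA, TAG and TGA, as in the standard bacterial genetic code. The function names are hypothetical. Real genomes have codon bias and strand asymmetry, so the numbers are only indicative.

```python
def stop_codon_probability(gc: float) -> float:
    """Chance that a random codon is TAA, TAG or TGA.

    Assumption: independent bases with P(G) = P(C) = gc / 2
    and P(A) = P(T) = (1 - gc) / 2.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be a fraction between 0 and 1")
    at_base = (1.0 - gc) / 2.0  # probability of A (or of T)
    gc_base = gc / 2.0          # probability of G (or of C)
    p_taa = at_base ** 3
    p_tag = at_base ** 2 * gc_base
    p_tga = at_base ** 2 * gc_base
    return p_taa + p_tag + p_tga


def expected_codons_to_stop(gc: float) -> float:
    """Mean number of codons read in a random frame up to and including
    the first stop codon (mean of a geometric distribution)."""
    p = stop_codon_probability(gc)
    if p == 0.0:
        return float("inf")  # no A or T at all, so no stop codon can occur
    return 1.0 / p


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        p = stop_codon_probability(gc)
        mean_len = expected_codons_to_stop(gc)
        print(f"GC {gc:.0%}: P(stop) = {p:.4f}, mean codons to stop = {mean_len:.1f}")
```

Under this simplified model, stop codons become rarer as GC content rises, so random open frames (including those on the opposite strand) run longer before terminating and are more likely to pass a minimum-length filter in an ORF caller.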

Please suggest any additional features here: https://github.com/filip-husnik/pseudofinder/issues

Citing Pseudofinder

Pseudofinder is developed by Mitch Syberg-Olsen (1), Arkadiy Garber (2), Patrick Keeling (1), John McCutcheon (2), and Filip Husnik (3).

(1) University of British Columbia, Vancouver, Canada

(2) Arizona State University, Tempe, Arizona, USA

(3) Okinawa Institute of Science and Technology, Okinawa, Japan

If it was useful for your work, please cite it as:

Syberg-Olsen MJ*, Garber AI*, Keeling PJ, McCutcheon JP, Husnik F. Pseudofinder: detection of pseudogenes in prokaryotic genomes, bioRxiv 2021, doi: https://doi.org/10.1101/2021.10.07.463580. GitHub repository: https://github.com/filip-husnik/pseudofinder/.

*Co-first authors.

Please also cite the dependencies used by Pseudofinder.