Annotate domains in refpkg profile HMMs #77

cmorganl · 2021-03-05T07:42:43Z

The Achilles heel of gene-centric methods is domains shared with other, functionally unrelated protein families. Because these methods annotate in a vacuum, they are susceptible to classifying non-homologous sequences that would otherwise be correctly assigned to a different protein family by methods that search large databases (e.g. EggNOG-mapper or methods that use RefSeq). When such domains are present in a protein family, the risk of false-positive classifications increases greatly.

After building a reference package (RefPkg), users should have a simple way to annotate these domains in their RefPkg using a sufficiently large database. Based on these annotated domains, treesapp assign may be able to filter spurious homologous queries, and users would be better informed about their RefPkg's protein structure.
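One way treesapp assign might use the annotations to filter spurious queries is to measure how much of a query's alignment to the profile HMM falls inside annotated domains. The sketch below is hypothetical: the function names, the (start, end) interval representation and the 0.9 threshold are assumptions for illustration, not existing treesapp code.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (start, end) positions on the RefPkg's profile HMM, 1-based and inclusive.
Interval = Tuple[int, int]


def fraction_in_domains(aln: Interval, domains: Iterable[Interval]) -> float:
    """Return the fraction of a query's aligned HMM positions that fall inside annotated domains."""
    a_start, a_end = aln
    covered = set()
    for d_start, d_end in domains:
        # Clip each domain to the aligned region; overlapping domains are counted once via the set.
        lo, hi = max(a_start, d_start), min(a_end, d_end)
        if lo <= hi:
            covered.update(range(lo, hi + 1))
    return len(covered) / (a_end - a_start + 1)


def is_domain_only_hit(aln: Interval, domains: Iterable[Interval], threshold: float = 0.9) -> bool:
    """Flag a query whose alignment lies (almost) entirely within shared domains (assumed cutoff)."""
    return fraction_in_domains(aln, domains) >= threshold
```

For example, with domains annotated at positions 10-120 and 210-300, `is_domain_only_hit((20, 110), [(10, 120), (210, 300)])` returns True, because the whole alignment sits inside the first domain, while a query aligned across positions 1-400 would not be flagged.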

I propose adding a '--domains' flag to the layer subcommand that automatically uses the Pfam database to perform HMM-to-HMM alignment, identifies the protein domains found in the RefPkg's profile HMM used for searching, and annotates those loci accordingly. The annotated domains could be stored in a new 'domains' attribute of the reference package and possibly propagated to future updates of the RefPkg.
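A minimal sketch of how the 'domains' attribute and the annotation step might fit together, assuming the HMM-to-HMM hits have already been parsed into records with a probability and query coordinates. `DomainHit`, `RefPkgDomains`, `annotate_domains` and the 90% probability cutoff are hypothetical names and values, not part of the existing ReferencePackage class; the greedy "keep the most probable non-overlapping hit" rule is one simple choice among several.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DomainHit:
    """One Pfam HMM aligned to the RefPkg's search HMM (hypothetical record)."""
    accession: str      # Pfam accession, e.g. "PF00001"
    probability: float  # HHsearch-style probability, 0-100
    start: int          # first aligned position on the RefPkg profile HMM (1-based)
    end: int            # last aligned position (inclusive)


@dataclass
class RefPkgDomains:
    """Hypothetical stand-in for a ReferencePackage carrying a 'domains' attribute."""
    prefix: str
    domains: List[DomainHit] = field(default_factory=list)


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """True if two hits share at least one profile HMM position."""
    return a.start <= b.end and b.start <= a.end


def annotate_domains(hits: List[DomainHit], min_prob: float = 90.0) -> List[DomainHit]:
    """Greedily keep the most probable hits that do not overlap an already-kept hit."""
    kept: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.probability, reverse=True):
        if hit.probability < min_prob:
            continue
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Report the annotated loci in positional order along the profile HMM.
    return sorted(kept, key=lambda h: h.start)
```

Usage would look like `refpkg = RefPkgDomains(prefix="ExampleRefPkg")` followed by `refpkg.domains = annotate_domains(hits)`. Storing plain records like these keeps the attribute easy to serialize alongside the rest of a RefPkg, which would help if the domains are propagated to future updates.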

TODO list:

Add 'domains' attribute to ReferencePackage class
Add '--domains' flag to treesapp layer arguments
Add HHsuite to requirements (conda, Docker and relevant Wiki pages)
Automate downloading of the latest HHsuite Pfam database to the installation's data/ directory when necessary
Develop workflow for searching for and annotating domains in a RefPkg's profile HMM
Allow users and/or treesapp assign to filter queries mapped to domains
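For the search-and-annotate workflow item, a rough sketch of the glue might build an hhsearch command and read the hit table from its .hhr output. It assumes hhsearch can take the RefPkg's profile HMM directly via `-i` (a conversion step may be needed in practice), that the Pfam database path is the basename of an HHsuite-formatted database, and that the hit table follows the usual "No Hit ... Prob E-value P-value Score SS Cols Query HMM Template HMM" layout ending at the first blank line. `hhsearch_command` and `parse_hhr_hits` are hypothetical helpers.

```python
import re
from typing import Dict, List

# Right-hand columns of an assumed .hhr hit-table row:
# Prob E-value P-value Score SS Cols QStart-QEnd TStart-TEnd (TemplateLength)
_HIT_TAIL = re.compile(
    r"\s(?P<prob>\d+(?:\.\d+)?)\s+(?P<evalue>\S+)\s+(?P<pvalue>\S+)\s+(?P<score>\S+)\s+(?P<ss>\S+)"
    r"\s+(?P<cols>\d+)\s+(?P<qstart>\d+)-(?P<qend>\d+)\s+(?P<tstart>\d+)-(?P<tend>\d+)\s*\((?P<tlen>\d+)\)\s*$"
)


def hhsearch_command(query_hmm: str, pfam_db: str, out_hhr: str, cpus: int = 2) -> List[str]:
    """Build (but do not run) an hhsearch call of the RefPkg HMM against a Pfam database."""
    return ["hhsearch", "-i", query_hmm, "-d", pfam_db, "-o", out_hhr, "-cpu", str(cpus)]


def parse_hhr_hits(text: str) -> List[Dict]:
    """Extract hit name, probability, E-value and query (RefPkg HMM) coordinates from an .hhr summary."""
    hits: List[Dict] = []
    in_table = False
    for line in text.splitlines():
        if line.lstrip().startswith("No Hit"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # assumed: the summary table ends at the first blank line
        m = _HIT_TAIL.search(line)
        if m is None:
            continue
        # Columns left of Prob are the hit number followed by the (possibly truncated) hit description.
        head = line[:m.start()].split()
        if len(head) < 2:
            continue
        hits.append({
            "hit": head[1],
            "prob": float(m["prob"]),
            "evalue": float(m["evalue"]),
            "qstart": int(m["qstart"]),
            "qend": int(m["qend"]),
        })
    return hits
```

The command list can be handed to `subprocess.run` once HHsuite is installed, and the parsed query coordinates map directly onto the profile HMM loci that the 'domains' attribute would record. Parsing from the right-hand side of each row avoids depending on the width of the truncated description column.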


cmorganl added the 'feature request' label on Mar 5, 2021
