Users can't select SwissProt IDs when using UNIPROT key #12

Open
amanda-hi opened this issue Dec 20, 2023 · 0 comments
Assignees
Labels
gse Issue reported by a member of the SomaLogic GSE team

Comments

Contributor

amanda-hi commented Dec 20, 2023

The SomaScan menu typically uses reviewed, manually curated UniProt IDs (aka "SwissProt" IDs) in its protein annotations. However, SomaScan.db currently returns all UniProt entries for a given protein, including unreviewed, computationally annotated entries (aka "TrEMBL" IDs). The UniProt website labels each entry as belonging to one section of the knowledgebase or the other, but SomaScan.db does not carry that label. As a result, users receive many more UniProt entries than expected from a given query, likely because TrEMBL IDs are returned alongside SwissProt IDs. Ideally, a query should return only SwissProt IDs, since those are the annotations presented in the SomaScan menu.

Example case where this is problematic: A SomaScan user found that mapping his seqIDs to UniProt with the SomaScan.db package returned 13,000 mappings. This is because the UniProt data behind SomaScan.db contains multiple IDs for the same protein (TrEMBL IDs as well as SwissProt IDs). The SBI/GSE reporter of this issue foresees difficulty in helping customers ‘deduplicate’ this list and reproduce the UniProt IDs in our menu.

Suggestion for solution: Is there metadata carried with the UniProt database that would allow UniProt IDs to be separated into TrEMBL vs. SwissProt? Keys that map to ‘UniProt_all’, ‘UniProt_Trembl’ and ‘UniProt_SwissProt’ would be really helpful. Failing that, if it is not already in the git documentation, a line somewhere alerting people to the multiple protein ID issue would help.
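
To make the suggestion concrete, here is a rough sketch of the intended behavior. It is written in Python purely as a language-neutral illustration, not as SomaScan.db code. It assumes that a per-entry `reviewed` flag (SwissProt vs. TrEMBL) can be obtained from UniProt metadata, which SomaScan.db does not currently carry. The seqIDs and accessions are made-up placeholders, and `split_by_review_status` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder rows: (seqID, UniProt accession, reviewed flag).
# All values are invented for illustration. The `reviewed` flag is an
# assumption: it stands in for the SwissProt/TrEMBL status that UniProt
# publishes but that SomaScan.db does not currently store.
ROWS = [
    ("seq-A", "ACC_1", True),
    ("seq-A", "ACC_2", False),
    ("seq-A", "ACC_3", False),
    ("seq-B", "ACC_4", True),
]


def split_by_review_status(rows):
    """Group accessions per seqID under the three keys proposed above."""
    out = {
        "UniProt_all": defaultdict(list),
        "UniProt_SwissProt": defaultdict(list),
        "UniProt_Trembl": defaultdict(list),
    }
    for seq_id, accession, reviewed in rows:
        # Every entry is reachable through the unfiltered key.
        out["UniProt_all"][seq_id].append(accession)
        # Each entry also lands in exactly one of the two filtered keys.
        key = "UniProt_SwissProt" if reviewed else "UniProt_Trembl"
        out[key][seq_id].append(accession)
    # Convert to plain dicts so a missing seqID raises instead of silently
    # creating an empty list.
    return {name: dict(mapping) for name, mapping in out.items()}


if __name__ == "__main__":
    result = split_by_review_status(ROWS)
    for name, mapping in result.items():
        total = sum(len(accessions) for accessions in mapping.values())
        print(name, total)
```

With keys like these, a user who wants to reproduce the menu annotations would query only ‘UniProt_SwissProt’, while ‘UniProt_all’ would preserve the current behavior for anyone who relies on it.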

Timeline or deadline: A fix for this issue should be included in the 3.19 release of Bioconductor (in April/May 2024).

@amanda-hi amanda-hi self-assigned this Dec 20, 2023
@amanda-hi amanda-hi added the gse Issue reported by a member of the SomaLogic GSE team label Dec 20, 2023