Mitochondrial Genes and Annotations missing from EnrichGO #733

Open
Dragonmasterx87 opened this issue Oct 21, 2024 · 1 comment

Dragonmasterx87 commented Oct 21, 2024

Hi, great tool, thanks for maintaining it.

I have a question about the absence of mitochondrial-genome-encoded genes and their corresponding ontologies.

In the following code:

library(clusterProfiler)
library(org.Hs.eg.db)

hgnc <- c("MT-ATP6", "MT-ATP8", "MT-CO1", "MT-CO2", "MT-CO3", "MT-CYB", "MT-ND1", "MT-ND2", "MT-ND3", "MT-ND4", "MT-ND4L", "MT-ND5", "MT-ND6", "MT-RNR1", "MT-RNR2")

# Run GO enrichment analysis on the up-regulated genes
# (all_genes, the background gene list, is not defined in this snippet)
GO.up <- enrichGO(gene = hgnc,
                  universe = all_genes,
                  keyType = "SYMBOL", # see keytypes(org.Hs.eg.db)
                  OrgDb = org.Hs.eg.db,
                  ont = "ALL",
                  pAdjustMethod = "BH",
                  pvalueCutoff = 1,
                  qvalueCutoff = 1, # default is 0.2 if not set
                  readable = TRUE)

I get NULL, which should not happen because all of these genes belong to multiple mitochondrial/ETC component annotations. I suspected a symbol issue, but even with shorthand symbols such as ATP6 and ATP8 (which are incorrect; they should be HGNC symbols), the result was still NULL.

However, this could be a problem with the 2023 GO release. If so, can one use an older GO release, such as the one from 2019 or 2018, and how?

Thanks a lot!

🐉

guidohooiveld commented Oct 22, 2024

I believe the issue is that the HGNC symbols are not recognized by, or not compatible with, the org.Hs.eg.db annotation, which is NCBI-based.

If the corresponding Entrez IDs are used, GO terms are returned:

> library(org.Hs.eg.db)
> 
> hgnc <- c("MT-ATP6", "MT-ATP8", "MT-CO1", "MT-CO2", "MT-CO3", "MT-CYB", "MT-ND1", "MT-ND2",
+           "MT-ND3", "MT-ND4", "MT-ND4L", "MT-ND5", "MT-ND6", "MT-RNR1", "MT-RNR2")
> 
> ## nothing is found
> AnnotationDbi:::select(org.Hs.eg.db, keys = hgnc, keytype = "SYMBOL",
+                columns = c("ENTREZID", "SYMBOL", "GENENAME") )
Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'SYMBOL'. Please use the keys method to see a listing of valid arguments.
> 
> ## try ALIAS
> ## almost nothing is found
> AnnotationDbi:::select(org.Hs.eg.db, keys = hgnc, keytype = "ALIAS",
+                columns = c("ENTREZID", "SYMBOL", "GENENAME") )
'select()' returned 1:1 mapping between keys and columns
     ALIAS  ENTREZID   SYMBOL             GENENAME
1  MT-ATP6      <NA>     <NA>                 <NA>
2  MT-ATP8      <NA>     <NA>                 <NA>
3   MT-CO1      <NA>     <NA>                 <NA>
4   MT-CO2 107075310 MTCO2P12 MT-CO2 pseudogene 12
5   MT-CO3      <NA>     <NA>                 <NA>
6   MT-CYB      <NA>     <NA>                 <NA>
7   MT-ND1      <NA>     <NA>                 <NA>
8   MT-ND2      <NA>     <NA>                 <NA>
9   MT-ND3      <NA>     <NA>                 <NA>
10  MT-ND4      <NA>     <NA>                 <NA>
11 MT-ND4L      <NA>     <NA>                 <NA>
12  MT-ND5      <NA>     <NA>                 <NA>
13  MT-ND6      <NA>     <NA>                 <NA>
14 MT-RNR1      <NA>     <NA>                 <NA>
15 MT-RNR2      <NA>     <NA>                 <NA>
> 
> 
> ## manually look up your first 2 entries:
> ## MT-ATP6 has ENTREZID 4508 (https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:7414)
> ## MT-ATP8 has ENTREZID 4509 (https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:7415)
> 
> 
> AnnotationDbi:::select(org.Hs.eg.db, keys = c("4508","4509"), keytype = "ENTREZID",
+                columns = c("ENTREZID", "SYMBOL", "ALIAS", "GENENAME", "GOALL") ) [c(1:4, 1808:1812) ,]
'select()' returned 1:many mapping between keys and columns
     ENTREZID SYMBOL   ALIAS                  GENENAME      GOALL EVIDENCEALL
1        4508   ATP6 ATPase6 ATP synthase F0 subunit 6 GO:0003674         IBA
2        4508   ATP6 ATPase6 ATP synthase F0 subunit 6 GO:0003674         IDA
3        4508   ATP6 ATPase6 ATP synthase F0 subunit 6 GO:0003674         IPI
4        4508   ATP6 ATPase6 ATP synthase F0 subunit 6 GO:0003824         IBA
1808     4509   ATP8    ATP8 ATP synthase F0 subunit 8 GO:1902600         IEA
1809     4509   ATP8    ATP8 ATP synthase F0 subunit 8 GO:1904949         IBA
1810     4509   ATP8    ATP8 ATP synthase F0 subunit 8 GO:1904949         IDA
1811     4509   ATP8    ATP8 ATP synthase F0 subunit 8 GO:1904949         IEA
1812     4509   ATP8    ATP8 ATP synthase F0 subunit 8 GO:1904949         NAS
     ONTOLOGYALL
1             MF
2             MF
3             MF
4             MF
1808          BP
1809          CC
1810          CC
1811          CC
1812          CC
> 
> 
> packageVersion("org.Hs.eg.db")
[1] ‘3.19.1’
> 
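
Building on the lookup above, here is a minimal sketch of passing Entrez IDs to enrichGO instead of symbols. The lookup table `hgnc_to_entrez` and the helper `to_entrez` are hypothetical names introduced for illustration, and only the two IDs checked manually above are filled in; the remaining genes would need to be looked up the same way. The enrichGO call only runs if clusterProfiler and org.Hs.eg.db are installed.

```r
# Hypothetical lookup table: only the two IDs verified above are included.
hgnc_to_entrez <- c("MT-ATP6" = "4508", "MT-ATP8" = "4509")

# Hypothetical helper: translate HGNC symbols to Entrez IDs and warn about
# any symbol the table does not cover, instead of dropping it silently.
to_entrez <- function(symbols, lookup = hgnc_to_entrez) {
  ids <- unname(lookup[symbols])
  missing <- symbols[is.na(ids)]
  if (length(missing) > 0) {
    warning("No Entrez ID for: ", paste(missing, collapse = ", "))
  }
  ids[!is.na(ids)]
}

entrez <- to_entrez(c("MT-ATP6", "MT-ATP8"))

# Run the enrichment on Entrez IDs (assumes both packages are installed).
if (requireNamespace("clusterProfiler", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  GO.up <- clusterProfiler::enrichGO(
    gene          = entrez,
    keyType       = "ENTREZID",
    OrgDb         = org.Hs.eg.db::org.Hs.eg.db,
    ont           = "ALL",
    pAdjustMethod = "BH",
    pvalueCutoff  = 1,
    qvalueCutoff  = 1,
    readable      = TRUE
  )
}
```

If a `universe` is supplied as in the original call, it would need to be expressed as Entrez IDs as well, so that it matches `keyType`.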

Added later:
Yet the NCBI Gene page for the first entry (4508) lists MT-ATP6 as the official symbol!

Mmm...?? Maybe something to report on the Bioconductor support forum?
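
For a report like that, it may help to show how many of the symbols are valid keys under each key type. The sketch below is one way to do it; `check_keys` is a hypothetical helper, and the database part only runs if org.Hs.eg.db is installed.

```r
# Hypothetical helper: split identifiers into those that are valid keys
# and those that are not.
check_keys <- function(ids, valid_keys) {
  ok <- ids %in% valid_keys
  list(valid = ids[ok], invalid = ids[!ok])
}

hgnc <- c("MT-ATP6", "MT-ATP8")

# Report, per key type, how many symbols the annotation package does not know.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  db <- org.Hs.eg.db::org.Hs.eg.db
  for (kt in c("SYMBOL", "ALIAS")) {
    res <- check_keys(hgnc, AnnotationDbi::keys(db, keytype = kt))
    message(kt, ": ", length(res$invalid), " of ", length(hgnc), " not found")
  }
}
```

Including the output of `packageVersion("org.Hs.eg.db")` alongside these counts would make the report reproducible.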
