Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compareCluster GSEA + universal annotation = does not work #705

Open
philousap opened this issue Jul 11, 2024 · 1 comment
Open

compareCluster GSEA + universal annotation = does not work #705

philousap opened this issue Jul 11, 2024 · 1 comment

Comments

@philousap
Copy link

Hi,
I am working with compareCluster, trying to do a GSEA analysis using homemade annotation. To do so I decide to use Universal enrichment analysis.

I have a TERM2GENE and TERM2NAME. I am trying to run the same a GSEA (with all my genes - not only differential expressed ones). I have an error:
Error in check_gene_id(geneList, geneSets) :
--> No gene can be mapped....

Here my code (let's say input1, input2 and input3 are subsets of input):

geneList1=input1$logFC names(geneList1)=input1$Symbol geneList1=sort(geneList1, decreasing = TRUE)

geneList2=input2$logFC names(geneList2)=input2$Symbol geneList2=sort(geneList2, decreasing = TRUE)

geneList3=input3$logFC names(geneList3)=input3$Symbol geneList3=sort(geneList3, decreasing = TRUE)

geneList=c("Gut"=geneList1, "Head"=geneList2, "Abdomen"=geneList3)

gsea=compareCluster(geneList, data=input, TERM2GENE=TERM2GENE, TERM2NAME =TERM2NAME, fun="GSEA" minGSSize = 10, nPermSimple = 100000, eps=0, pvalueCutoff = 0.1, pAdjustMethod = "BH" )

This is my error:
Error in check_gene_id(geneList, geneSets) :
--> No gene can be mapped....

However, when I run the same geneLists one by one (geneList1, then geneList2 and so one) using the following code, it works perfectly (see my pictures at the end)

GSEA(geneList1, TERM2GENE=TERM2GENE, TERM2NAME =TERM2NAME, minGSSize = 10, nPermSimple = 100000, eps=0, pvalueCutoff = 0.1, pAdjustMethod = "BH" )

Some one could help me here please?
Thank you
Gut
Abdomen
Head

@guidohooiveld
Copy link

Everything points to something not being OK with your aggregated input geneList. Please double-check it!

Please note that performing GSEA with user-defined gene sets through the compareCluster function works as expected. See my code below.
Also note that nPermSimple is a legacy argument, and should not be used anymore! Only use the argument eps.

> ## load libraries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> 
> ## load sample data
> data(geneList, package="DOSE")
> head(geneList)
    4312     8318    10874    55143    55388      991 
4.572613 4.514594 4.418218 4.144075 3.876258 3.677857 
> 
> ## using sample data, create list with 3 comparisons to be used as input for comparCluster
> ## note that 'Head' is the reverse of 'Gut' and 'Abdomen'.
> inputList <- list(Gut = geneList, Abdomen = geneList, Head = sort(-1*geneList, decreasing = TRUE) )
> 
> str(inputList)
List of 3
 $ Gut    : Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ Abdomen: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ Head   : Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
> 
> 
> ## create TERM2GENE and TERM2NAME for KEGG
> ## note that when using the fuction gseKEGG this step is done automagically
> kegg.data <- download_KEGG(species="hsa", keggType = "KEGG", keyType = "kegg")
> term2gene.kegg <- kegg.data$KEGGPATHID2EXTID
> term2name.kegg <- kegg.data$KEGGPATHID2NAME
> 
> head(term2gene.kegg)
      from    to
1 hsa00010 10327
2 hsa00010   124
3 hsa00010   125
4 hsa00010   126
5 hsa00010   127
6 hsa00010   128
> head(term2name.kegg)
      from                              to
1 hsa01100              Metabolic pathways
2 hsa01200               Carbon metabolism
3 hsa01210 2-Oxocarboxylic acid metabolism
4 hsa01212           Fatty acid metabolism
5 hsa01230     Biosynthesis of amino acids
6 hsa01232           Nucleotide metabolism
> 
> 
> ## run compareCluster with generic function GSEA, and user-provied TERM2GENE and TERM2NAME
> res <- compareCluster(geneCluster = inputList,
+                       fun = "GSEA",
+                       eps = 0,
+                       pvalueCutoff = 0.05,
+                       pAdjustMethod = "BH",
+                       TERM2GENE = term2gene.kegg,
+                       TERM2NAME = term2name.kegg)
> 
> ## convert entrezids into symbols
> res <- setReadable(res, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
>  
> ## check
> res
#
# Result of Comparing 3 gene clusters 
#
#.. @fun         GSEA 
#.. @geneClusters       List of 3
 $ Gut    : Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ Abdomen: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ Head   : Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
#...Result      'data.frame':   172 obs. of  12 variables:
 $ Cluster        : Factor w/ 3 levels "Gut","Abdomen",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID             : chr  "hsa04110" "hsa03050" "hsa04657" "hsa05169" ...
 $ Description    : chr  "Cell cycle" "Proteasome" "IL-17 signaling pathway" "Epstein-Barr virus infection" ...
 $ setSize        : int  139 43 85 193 33 62 86 130 55 80 ...
 $ enrichmentScore: num  0.664 0.709 0.562 0.434 0.723 ...
 $ NES            : num  2.83 2.41 2.2 1.96 2.31 ...
 $ pvalue         : num  9.16e-21 2.97e-08 8.38e-08 1.52e-07 4.07e-07 ...
 $ p.adjust       : num  3.13e-18 5.08e-06 9.56e-06 1.30e-05 2.78e-05 ...
 $ qvalue         : num  2.30e-18 3.72e-06 7.00e-06 9.51e-06 2.04e-05 ...
 $ rank           : num  1155 2516 2880 2820 1905 ...
 $ leading_edge   : chr  "tags=36%, list=9%, signal=33%" "tags=65%, list=20%, signal=52%" "tags=49%, list=23%, signal=38%" "tags=39%, list=23%, signal=31%" ...
 $ core_enrichment: chr  "CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13/CCNB1/MCM5/PTTG1/MCM2/CDC25A/CDC6/PLK1/BU"| __truncated__ "PSMA7/PSMD3/PSMB9/PSMB5/IFNG/PSMD7/ADRM1/PSME2/PSMB3/PSMA4/PSMB2/PSMA3/PSMA5/PSMB7/PSMD14/PSME4/SEM1/PSMB10/PSM"| __truncated__ "MMP1/S100A9/S100A8/S100A7/CXCL10/CXCL3/CCL20/FOSL1/MMP9/CXCL8/LCN2/CCL2/MUC5B/CEBPB/CCL7/IFNG/CCL17/CXCL5/CXCL1"| __truncated__ "CXCL10/CCNA2/TAP1/ISG15/CCNE1/CCNE2/SKP2/STAT1/HLA-DRB4/HLA-DOB/MYC/CD3G/PSMD3/E2F1/IRAK1/CD247/CD3D/LYN/OAS1/R"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#..   Gut: 56 
#..   Abdomen: 58 
#..   Head: 58 
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, 
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. 
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. 
The Innovation. 2021, 2(3):100141 

> 
> ## prepare dotplot
> p <- dotplot(res, font.size=8, showCategory=8, title =("GSEA results"), split=".sign") + facet_grid(.~.sign)
> print(p)
> 
> 

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants