
gseKEGG function running indefinitely with a gene list of over 3000 genes #709

Open
Sophia409 opened this issue Jul 21, 2024 · 1 comment


@Sophia409

Hello,
I am running into a problem with the gseKEGG function from the clusterProfiler package when performing GSEA. My gene list contains just over 3,000 genes (3,786 after ID conversion), but the function ran for two hours without finishing. I stopped the process manually and tried changing the function parameters several times, but the function still hangs.
Here are the details of my setup:
clusterProfiler version: 4.12.0 (latest version)
I would appreciate any insights into why this might be happening and how I can resolve this issue.
Thank you for your help!
Best regards,

> genelist <- genelist[names(genelist) %in% entrez[,1]]
> names(genelist) <- entrez[match(names(genelist),entrez[,1]),2]
> genelist <- sort(genelist, decreasing = T) # sort by log2FC, high to low
> length(genelist)
[1] 3786
> head(genelist)
  20304   20306   20296   16175   14825  117167 
1112.53 1059.17 1018.66  651.99  608.66  603.55 
> # 2) GSEA enrichment based on KEGG gene sets
> set.seed(123)
> KEGG_ges <- gseKEGG(
+   geneList = genelist,
+   organism = "mmu",
+   minGSSize = 10,
+   maxGSSize = 500,
+   pvalueCutoff = 0.05,
+   pAdjustMethod = "BH",
+   verbose = FALSE,
+   eps = 0)
Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  There are ties in the preranked stats (80.9% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize,  :
  There were 1 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 10000)
> 
> KEGG_ges <- gseKEGG(
+   geneList = genelist,
+   organism = "mmu")
preparing geneSet collections...
GSEA analysis...
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  There are ties in the preranked stats (80.9% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize,  :
  There were 1 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 10000)
> KEGG_ges <- gseKEGG(
+   geneList = genelist,
+   organism = "mmu",
+   nPermSimple = 10000)
preparing geneSet collections...
GSEA analysis...
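The ID-conversion and sorting steps at the top of the transcript can be wrapped in a small helper. This is a sketch under assumptions: `make_ranked_list`, the toy symbols, values and Entrez IDs are hypothetical; the map is assumed to be a plain two-column data frame like the `entrez` object above (input identifiers in column 1, Entrez IDs in column 2); and keeping only the largest value for a duplicated Entrez ID is one possible choice, not something clusterProfiler requires.

```r
# Build a named, decreasingly sorted numeric vector keyed by Entrez ID.
# `stats`: named numeric vector (names = input identifiers, e.g. symbols)
# `id_map`: data.frame, column 1 = input identifier, column 2 = Entrez ID
make_ranked_list <- function(stats, id_map) {
  # Keep only genes whose identifier appears in column 1 of the map
  stats <- stats[names(stats) %in% id_map[, 1]]
  # Rename each gene to the Entrez ID in column 2 of the map
  names(stats) <- id_map[match(names(stats), id_map[, 1]), 2]
  # Drop genes whose Entrez ID is missing
  stats <- stats[!is.na(names(stats))]
  # Sort from largest to smallest; sort() also drops NA statistics by default
  stats <- sort(stats, decreasing = TRUE)
  # For Entrez IDs hit by several input genes, keep the first (largest) value
  stats[!duplicated(names(stats))]
}

# Toy inputs (hypothetical symbols, values and IDs)
lfc <- c(GeneA = 2.1, GeneB = -0.4, GeneC = 1.3, GeneD = 0.7, GeneX = 0.5)
map <- data.frame(SYMBOL = c("GeneA", "GeneB", "GeneC", "GeneD"),
                  ENTREZID = c("101", "102", "103", "101"),
                  stringsAsFactors = FALSE)
ranked <- make_ranked_list(lfc, map)
ranked
```

For these toy inputs, `ranked` holds 2.1, 1.3 and -0.4 named "101", "103" and "102": GeneX has no mapping and is dropped, and GeneD's 0.7 loses to GeneA's 2.1 for ID "101". Whatever statistic is passed in should actually vary across genes; a list where most values are identical leads to the tie warning shown in the transcript.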
@guidohooiveld

Did you carefully read the messages that were returned?

This is the key remark:
There are ties in the preranked stats (80.9% of the list).

In other words, about 81% of the genes in your input share their ranking-metric value with another gene! Why? This cannot be correct...
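To see where a figure like 80.9% comes from, one can count repeated values in the ranked vector directly. A minimal sketch in R, assuming the ranked list is a named numeric vector like `genelist`; the helper `rank_diagnostics` is hypothetical and not part of clusterProfiler or fgsea, and its tie percentage only approximates the number in the fgsea warning (the exact formula fgsea uses is an assumption here). It also counts positive, negative and zero values, which relates to the second warning about unbalanced gene-level statistics.

```r
# Summarise a named, pre-ranked numeric vector before running GSEA on it.
# Hypothetical helper; the tie percentage is an approximation of fgsea's warning.
rank_diagnostics <- function(stats) {
  tab <- table(stats)  # how often each distinct value occurs
  list(
    # share of entries that repeat a value already seen earlier in the vector
    tie_percent = round(100 * sum(duplicated(stats)) / length(stats), 1),
    n_positive  = sum(stats > 0),
    n_negative  = sum(stats < 0),
    n_zero      = sum(stats == 0),
    # the single most frequent value (a large count here suggests a placeholder)
    most_common = as.numeric(names(tab)[which.max(tab)])
  )
}

# Toy ranked list with hypothetical values: six of the eight entries are 0
toy <- c(a = 2.5, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0, h = -1.2)
d <- rank_diagnostics(toy)
str(d)
```

For this toy vector, five of the six zeros repeat an earlier value, so `tie_percent` is 62.5 and `most_common` is 0. Running the same helper on `genelist` shows which value dominates the list and how many genes fall on each side of zero.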

Anyway, this results in the behavior reported before:
ctlab/fgsea#151
