consensusClustersGR() row.names problem #112

Open
MorexV3CAGE opened this issue Feb 8, 2024 · 4 comments

Comments

@MorexV3CAGE
MorexV3CAGE commented Feb 8, 2024

Hello,
I have a problem that did not occur while I was using only 3 stages; it appeared after I added a 4th data set.
First I did:

# Merge 2 previously generated CAGEexp objects (one with 3 stages, the other with a single stage; each has 2 replicates)
ce_combined <- mergeCAGEsets(ceNet, ceCAGE)
# Then I followed the standard process again: clusterCTSS -> cumulativeCTSSdistribution -> quantilePositions

After that, I created the consensus clusters:

ce_combined <- aggregateTagClusters(ce_combined, tpmThreshold = 0.1, qLow = 0.1, qUp = 0.9, maxDist = 100)
seqarchr_consensus_gr <- consensusClustersGR(ce_combined, returnInterquantileWidth = TRUE,  qLow = 0.1, qUp = 0.9)

This worked without any problem, but when I try to annotate this set of consensus clusters using ChIPseeker, or even just convert it to a data frame, I get this error:

> dds_anno <- annotatePeak(seqarchr_consensus_gr, tssRegion=c(-500, 100),
                          TxDb=txdb)
>> preparing features information...		 2024-02-08 09:21:11 AM 
>> identifying nearest features...		 2024-02-08 09:21:11 AM 
>> calculating distance from peak to TSS...	 2024-02-08 09:21:12 AM 
>> assigning genomic annotation...		 2024-02-08 09:21:12 AM 
>> assigning chromosome lengths			 2024-02-08 09:21:14 AM 
>> done...					 2024-02-08 09:21:14 AM 
Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x),  : 
  duplicate row.names: N.4DAG.1, D4.A, N.4DAG.2, D8.Others, D8.No5, D4.B, D24.B, D24.A

It stops roughly after half of the data is annotated (39k/62k).

I have looked into it and, apparently, the row.names it complains about are in the elementMetadata. However, even after setting specific row.names for the metadata I still get the same error, and I cannot find N.4DAG.1, D4.A, N.4DAG.2, D8.Others, D8.No5, D4.B, D24.B, D24.A anywhere in the object's structure. (These names are the BAM file names; I did not merge replicates because I use the data for DESeq2 analysis.)

For now, I have worked around it by deleting the metadata and annotating without it, since I don't need the metadata for the current task, but I would like to know whether this is a CAGEr error or a mistake on my part.
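
For anyone trying to locate where such labels might be stored, here is a minimal sketch of a search over the places a GRanges can carry names. It runs on a toy GRanges rather than on the real consensus clusters, it assumes only the GenomicRanges package, and `where_labels_hide()` is a hypothetical helper written for this illustration (it is not a CAGEr or ChIPseeker function). It does not claim to identify the cause of the error above.

```r
library(GenomicRanges)

# Toy stand-in for the consensus clusters (assumption: the real object
# would come from consensusClustersGR() and has more metadata columns).
gr <- GRanges(
  seqnames = "chr1H",
  ranges   = IRanges(start = c(100, 500, 900), width = 20),
  strand   = "+",
  score    = Rle(c(5, 7, 9)),
  tpm      = c(1.2, 3.4, 5.6)
)

# Hypothetical helper: report which parts of a GRanges contain any of the
# given labels (range names, mcols row/column names, element names of columns).
where_labels_hide <- function(x, labels) {
  object_level <- c(
    range_names = any(names(x) %in% labels),
    mcols_rows  = any(rownames(mcols(x)) %in% labels),
    mcols_cols  = any(colnames(mcols(x)) %in% labels)
  )
  per_column <- vapply(
    colnames(mcols(x)),
    function(nm) any(names(mcols(x)[[nm]]) %in% labels),
    logical(1)
  )
  list(object_level = object_level, column_element_names = per_column)
}

res <- where_labels_hide(gr, c("N.4DAG.1", "D4.A", "D8.Others"))
print(res)
```

On the real data, `gr` would be replaced by `seqarchr_consensus_gr` and the labels by the sample names from the error message.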

Thank you.

Here is my session info:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Prague
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] patchwork_1.2.0                  assertthat_0.2.1                
 [3] wiggleplotr_1.24.0               GenomicAlignments_1.36.0        
 [5] Rsamtools_2.16.0                 devtools_2.4.5                  
 [7] usethis_2.2.2                    SparseArray_1.0.12              
 [9] S4Arrays_1.0.6                   abind_1.4-5                     
[11] Matrix_1.6-1                     reshape2_1.4.4                  
[13] clusterProfiler_4.8.3            GOfuncR_1.20.0                  
[15] vioplot_0.4.0                    zoo_1.8-12                      
[17] sm_2.2-5.7.1                     rsvg_2.6.0                      
[19] ggimage_0.3.3                    lubridate_1.9.3                 
[21] forcats_1.0.0                    stringr_1.5.1                   
[23] purrr_1.0.2                      readr_2.1.5                     
[25] tidyr_1.3.1                      tibble_3.2.1                    
[27] tidyverse_2.0.0                  cvms_1.6.0                      
[29] readxl_1.4.3                     cowplot_1.1.3                   
[31] HiCcompare_1.22.1                edgeR_3.42.4                    
[33] limma_3.56.2                     rmspc_1.6.0                     
[35] DiffBind_3.10.1                  BiocManager_1.30.22             
[37] csaw_1.34.0                      plotly_4.10.4                   
[39] ChIPseeker_1.36.0                Glimma_2.10.0                   
[41] DESeq2_1.40.2                    BSgenome.MorexV3.Gatersleben_3.0
[43] ggseqlogo_0.1                    CAGEfightR_1.20.0               
[45] ggforce_0.4.1                    dplyr_1.1.3                     
[47] ggplot2_3.4.4                    BSgenome_1.68.0                 
[49] rtracklayer_1.60.1               Biostrings_2.68.1               
[51] XVector_0.40.0                   GenomicFeatures_1.52.2          
[53] AnnotationDbi_1.62.2             CAGEr_2.6.1                     
[55] MultiAssayExperiment_1.26.0      SummarizedExperiment_1.30.2     
[57] Biobase_2.60.0                   GenomicRanges_1.52.1            
[59] GenomeInfoDb_1.36.4              IRanges_2.34.1                  
[61] S4Vectors_0.38.2                 BiocGenerics_0.46.0             
[63] MatrixGenerics_1.12.3            matrixStats_1.0.0               
[65] gridExtra_2.3                    reshape_0.8.9                   

loaded via a namespace (and not attached):
  [1] dichromat_2.0-0.1                       progress_1.2.3                         
  [3] urlchecker_1.0.1                        nnet_7.3-19                            
  [5] vctrs_0.6.3                             digest_0.6.33                          
  [7] png_0.1-8                               ggrepel_0.9.5                          
  [9] mixsqp_0.3-54                           deldir_1.0-9                           
 [11] permute_0.9-7                           magick_2.8.2                           
 [13] MASS_7.3-60.0.1                         SQUAREM_2021.1                         
 [15] foreach_1.5.2                           httpuv_1.6.11                          
 [17] qvalue_2.32.0                           withr_3.0.0                            
 [19] xfun_0.40                               amap_0.8-19                            
 [21] ggfun_0.1.4                             ellipsis_0.3.2                         
 [23] memoise_2.0.1                           gson_0.1.0                             
 [25] profvis_0.3.8                           tidytree_0.4.6                         
 [27] gtools_3.9.5                            Formula_1.2-5                          
 [29] prettyunits_1.2.0                       promises_1.2.1                         
 [31] KEGGREST_1.40.1                         httr_1.4.7                             
 [33] downloader_0.4                          GreyListChIP_1.32.1                    
 [35] restfulr_0.0.15                         rhdf5filters_1.12.1                    
 [37] ashr_2.2-63                             ps_1.7.6                               
 [39] rhdf5_2.44.0                            rstudioapi_0.15.0                      
 [41] miniUI_0.1.1.1                          generics_0.1.3                         
 [43] DOSE_3.26.2                             base64enc_0.1-3                        
 [45] processx_3.8.3                          curl_5.0.2                             
 [47] zlibbioc_1.46.0                         ggraph_2.1.0                           
 [49] polyclip_1.10-6                         GenomeInfoDbData_1.2.10                
 [51] doParallel_1.0.17                       xtable_1.8-4                           
 [53] evaluate_0.23                           systemPipeR_2.6.3                      
 [55] BiocFileCache_2.8.0                     hms_1.1.3                              
 [57] irlba_2.3.5.1                           colorspace_2.1-0                       
 [59] filelock_1.0.3                          magrittr_2.0.3                         
 [61] later_1.3.2                             viridis_0.6.5                          
 [63] ggtree_3.8.2                            lattice_0.22-5                         
 [65] XML_3.99-0.16.1                         shadowtext_0.1.3                       
 [67] Hmisc_5.1-1                             pillar_1.9.0                           
 [69] nlme_3.1-164                            iterators_1.0.14                       
 [71] caTools_1.18.2                          compiler_4.3.1                         
 [73] mapplots_1.5.2                          stringi_1.8.3                          
 [75] plyr_1.8.9                              crayon_1.5.2                           
 [77] BiocIO_1.10.0                           truncnorm_1.0-9                        
 [79] gridGraphics_0.5-1                      emdbook_1.3.13                         
 [81] locfit_1.5-9.8                          graphlayouts_1.1.0                     
 [83] bit_4.0.5                               fastmatch_1.1-4                        
 [85] codetools_0.2-19                        TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [87] biovizBase_1.48.0                       mime_0.12                              
 [89] som_0.3-5.1                             splines_4.3.1                          
 [91] Rcpp_1.0.11                             dbplyr_2.4.0                           
 [93] sparseMatrixStats_1.12.2                HDO.db_0.99.1                          
 [95] cellranger_1.1.0                        interp_1.1-6                           
 [97] knitr_1.45                              blob_1.2.4                             
 [99] utf8_1.2.4                              AnnotationFilter_1.24.0                
[101] apeglm_1.22.1                           fs_1.6.3                               
[103] checkmate_2.3.1                         DelayedMatrixStats_1.22.6              
[105] pkgbuild_1.4.3                          Gviz_1.44.2                            
[107] ggplotify_0.1.2                         callr_3.7.3                            
[109] tzdb_0.4.0                              tweenr_2.0.2                           
[111] pkgconfig_2.0.3                         pheatmap_1.0.12                        
[113] tools_4.3.1                             cachem_1.0.8                           
[115] RSQLite_2.3.5                           viridisLite_0.4.2                      
[117] DBI_1.2.1                               numDeriv_2016.8-1.1                    
[119] fastmap_1.1.1                           rmarkdown_2.25                         
[121] scales_1.3.0                            grid_4.3.1                             
[123] coda_0.19-4.1                           VariantAnnotation_1.46.0               
[125] rpart_4.1.23                            farver_2.1.1                           
[127] tidygraph_1.3.1                         scatterpie_0.2.1                       
[129] mgcv_1.9-1                              yaml_2.3.8                             
[131] VGAM_1.1-9                              latticeExtra_0.6-30                    
[133] foreign_0.8-86                          cli_3.6.2                              
[135] lifecycle_1.0.4                         mvtnorm_1.2-4                          
[137] sessioninfo_1.2.2                       backports_1.4.1                        
[139] BiocParallel_1.34.2                     timechange_0.3.0                       
[141] gtable_0.3.4                            rjson_0.2.21                           
[143] parallel_4.3.1                          ape_5.7-1                              
[145] jsonlite_1.8.7                          bitops_1.0-7                           
[147] bit64_4.0.5                             yulab.utils_0.1.4                      
[149] vegan_2.6-4                             bdsmatrix_1.3-6                        
[151] metapod_1.8.0                           GOSemSim_2.26.1                        
[153] lazyeval_0.2.2                          shiny_1.8.0                            
[155] htmltools_0.5.6                         enrichplot_1.20.3                      
[157] GO.db_3.17.0                            rappdirs_0.3.3                         
[159] ensembldb_2.24.1                        glue_1.7.0                             
[161] RCurl_1.98-1.14                         InteractionSet_1.28.1                  
[163] treeio_1.24.3                           jpeg_0.1-10                            
[165] boot_1.3-28.1                           igraph_1.5.1                           
[167] invgamma_1.1                            R6_2.5.1                               
[169] gplots_3.1.3.1                          cluster_2.1.6                          
[171] bbmle_1.0.25.1                          pkgload_1.3.4                          
[173] Rhdf5lib_1.22.1                         stringdist_0.9.12                      
[175] aplot_0.2.2                             DelayedArray_0.26.7                    
[177] tidyselect_1.2.0                        plotrix_3.8-4                          
[179] ProtGenerics_1.32.0                     htmlTable_2.4.2                        
[181] operator.tools_1.6.3                    xml2_1.3.5                             
[183] munsell_0.5.0                           KernSmooth_2.23-22                     
[185] data.table_1.15.0                       htmlwidgets_1.6.4                      
[187] fgsea_1.26.0                            RColorBrewer_1.1-3                     
[189] hwriter_1.3.2.1                         biomaRt_2.56.1                         
[191] rlang_1.1.3                             remotes_2.4.2.1                        
[193] ShortRead_1.58.0                        formula.tools_1.7.1                    
[195] fansi_1.0.6
@charles-plessy
Owner

ce_combined <- mergeCAGEsets(ceNet, ceCAGE)

Hello, I do not understand how you managed to run the mergeCAGEsets function with only two arguments. It expects three of them, precisely to ensure that the sample names are set correctly with no duplicates. Can you retry after setting all three arguments?

Best,

Charles

@MorexV3CAGE
Author

Hi,
Thanks for the response, but I don't know which 3 arguments you mean exactly. Am I working with the wrong function or manual?
https://www.rdocumentation.org/packages/CAGEr/versions/1.14.0/topics/mergeCAGEsets

Can you please specify what the 3rd argument should be? Or is there a different function for this that I am missing?
Best,
Simon

@charles-plessy
Owner

I am sorry, I got confused with mergeSamples. I just tried mergeCAGEsets(exampleCAGEexp[,1:2],exampleCAGEexp[,3:4]) as a trivial example and it worked fine. Can you check the colData of your objects before and after merging to see if something looks strange?
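
The check suggested here could look roughly like the sketch below. It reuses the trivial `exampleCAGEexp` split from above and assumes that `colData()` and `sampleLabels()` are the right accessors for this comparison; the object names `ce_a`, `ce_b` and `ce_merged` are made up for the illustration.

```r
library(CAGEr)

# Same trivial split as in the comment above, using the example data
# shipped with CAGEr.
ce_a <- exampleCAGEexp[, 1:2]
ce_b <- exampleCAGEexp[, 3:4]
ce_merged <- mergeCAGEsets(ce_a, ce_b)

# Sample metadata before and after merging, for a visual comparison.
print(colData(ce_a))
print(colData(ce_b))
print(colData(ce_merged))

# anyDuplicated() returns 0 when every sample label is unique.
labels_merged <- sampleLabels(ce_merged)
print(labels_merged)
print(anyDuplicated(labels_merged))
```

With the real data, `ce_a` and `ce_b` would be `ceNet` and `ceCAGE`.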

@MorexV3CAGE
Author

Sorry for the late response, I fell ill.
I don't see a difference compared to a standard set, except for this part:
image

The OutOfClusters is not present in any other CAGErset that I use. Might it be due to that?

But I still don't understand how this error can arise in the consensus object.
The error occurs later, when working with the consensus set. Here is a screenshot of the consensus object and the error:
image

The problem is that the row.names look like this:
image

I also don't see any part of the object that contains N.4DAG.1, ...
But when I delete the mcols with: mcols(seqarchr_consensus_gr) <- NULL
the error is gone.
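
Since dropping all mcols makes the error disappear, one way to narrow it down might be to add the metadata columns back one at a time and see which of them breaks the conversion to a data frame. The sketch below does this on a toy GRanges (assumption: only GenomicRanges is needed; on the toy object every column converts fine, so it only illustrates the procedure). The names `bare`, `one_col` and `conversion_status` are made up for the example.

```r
library(GenomicRanges)

# Toy stand-in for the consensus clusters (assumption: the real object
# comes from consensusClustersGR()).
gr <- GRanges(
  seqnames = "chr1H",
  ranges   = IRanges(start = c(100, 500, 900), width = 20),
  strand   = "+",
  score    = Rle(c(5, 7, 9)),
  tpm      = c(1.2, 3.4, 5.6)
)

# The workaround from the comment above: ranges without any metadata.
bare <- gr
mcols(bare) <- NULL
core_df <- as.data.frame(bare)

# Re-attach one metadata column at a time and record whether the
# conversion succeeds ("ok") or which error message it raises.
conversion_status <- vapply(
  colnames(mcols(gr)),
  function(nm) {
    one_col <- bare
    mcols(one_col) <- mcols(gr)[, nm, drop = FALSE]
    tryCatch(
      {
        as.data.frame(one_col)
        "ok"
      },
      error = function(e) conditionMessage(e)
    )
  },
  character(1)
)
print(conversion_status)
```

Columns whose status is not "ok" would be the ones worth inspecting (or dropping individually instead of removing all metadata).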
