Can the correctSystematicG = T parameter be used for a BAM file generated from STAR mapping of paired-end reads #111

Open
QiliShi opened this issue Jan 22, 2024 · 3 comments

Comments

@QiliShi

QiliShi commented Jan 22, 2024

Hello,

I encountered an issue while using the correctSystematicG = T parameter with a BAM file generated from paired-end reads mapped with STAR.

R code:

bamfile <- CAGEexp(
  genomeName     = "BSgenome.Hsapiens.UCSC.hg38",
  inputFiles     = "SRR7092026Aligned.sortedByCoord.out.bam",
  inputFilesType = "bamPairedEnd",
  sampleLabels   = "example"
)
ce <- getCTSS(bamfile)

Error message:


Reading in file: ~/Intergetic_transcripts/cage_seq/alignment/SRR7092026Aligned.sortedByCoord.out.bam...
        -> Filtering out low quality reads...
Loading required namespace: BSgenome.Hsapiens.UCSC.hg38
        -> Removing the first base of the reads if 'G' and not aligned to the genome...
        -> Estimating the frequency of adding a 'G' nucleotide and correcting the systematic bias...
Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in setnames(CTSS, c("chr", "pos", "strand", sample.label)): Can't assign 4 names to a 0 column data.table
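
The last line of the error comes from data.table's setnames(), which refuses to assign four names to a table that has no columns. The snippet below is only a minimal sketch that reproduces that message outside CAGEr, assuming the data.table package is installed. The name `empty_ctss` is a hypothetical stand-in for the internal CTSS object, and the sketch does not show why that table ended up empty in the real run.

```r
library(data.table)

# Hypothetical stand-in for an internal CTSS table that ended up with no columns.
empty_ctss <- data.table()

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    setnames(empty_ctss, c("chr", "pos", "strand", "example"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If the message matches, the failure is likely a downstream symptom of an empty result in the G-correction step rather than a problem with the column names themselves.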

It works fine with correctSystematicG = F:

ce <- getCTSS(bamfile, correctSystematicG = F)

Reading in file: ~/Intergetic_transcripts/cage_seq/alignment/SRR7092026Aligned.sortedByCoord.out.bam...
        -> Filtering out low quality reads...
        -> Removing the first base of the reads if 'G' and not aligned to the genome...

R 4.3.2
attached packages:
[1] CAGEr_2.8.0 MultiAssayExperiment_1.28.0
[3] SummarizedExperiment_1.32.0 Biobase_2.62.0
[5] GenomicRanges_1.54.1 GenomeInfoDb_1.38.1
[7] IRanges_2.36.0 S4Vectors_0.40.2
[9] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
[11] matrixStats_1.2.0

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.15.0
[3] magrittr_2.0.3 GenomicFeatures_1.54.1
[5] rmarkdown_2.25 BiocIO_1.12.0
[7] zlibbioc_1.48.0 vctrs_0.6.5
[9] memoise_2.0.1 Rsamtools_2.18.0
[11] DelayedMatrixStats_1.24.0 RCurl_1.98-1.14
[13] base64enc_0.1-3 BiocBaseUtils_1.4.0
[15] htmltools_0.5.7 S4Arrays_1.2.0
[17] progress_1.2.3 curl_5.1.0
[19] SparseArray_1.2.2 Formula_1.2-5
[21] KernSmooth_2.23-22 htmlwidgets_1.6.4
[23] plyr_1.8.9 Gviz_1.46.1
[25] cachem_1.0.8 GenomicAlignments_1.38.0
[27] lifecycle_1.0.4 pkgconfig_2.0.3
[29] Matrix_1.6-5 R6_2.5.1
[31] fastmap_1.1.1 GenomeInfoDbData_1.2.11
[33] digest_0.6.34 colorspace_2.1-0
[35] AnnotationDbi_1.64.1 Hmisc_5.1-1
[37] RSQLite_2.3.4 vegan_2.6-4
[39] filelock_1.0.3 fansi_1.0.6
[41] mgcv_1.9-1 httr_1.4.7
[43] abind_1.4-5 compiler_4.3.2
[45] bit64_4.0.5 htmlTable_2.4.2
[47] backports_1.4.1 CAGEfightR_1.22.0
[49] BiocParallel_1.36.0 DBI_1.2.1
[51] biomaRt_2.58.0 MASS_7.3-60
[53] rappdirs_0.3.3 DelayedArray_0.28.0
[55] rjson_0.2.21 permute_0.9-7
[57] gtools_3.9.5 tools_4.3.2
[59] foreign_0.8-86 nnet_7.3-19
[61] glue_1.7.0 restfulr_0.0.15
[63] nlme_3.1-164 stringdist_0.9.10
[65] grid_4.3.2 checkmate_2.3.0
[67] reshape2_1.4.4 cluster_2.1.6
[69] generics_0.1.3 operator.tools_1.6.3
[71] gtable_0.3.4 BSgenome_1.70.1
[73] formula.tools_1.7.1 ensembldb_2.26.0
[75] data.table_1.14.10 hms_1.1.3
[77] xml2_1.3.6 utf8_1.2.4
[79] XVector_0.42.0 pillar_1.9.0
[81] stringr_1.5.1 splines_4.3.2
[83] dplyr_1.1.4 BiocFileCache_2.10.1
[85] lattice_0.22-5 rtracklayer_1.62.0
[87] bit_4.0.5 deldir_2.0-2
[89] biovizBase_1.50.0 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[91] tidyselect_1.2.0 Biostrings_2.70.1
[93] knitr_1.45 gridExtra_2.3
[95] ProtGenerics_1.34.0 xfun_0.41
[97] stringi_1.8.3 VGAM_1.1-9
[99] lazyeval_0.2.2 yaml_2.3.8
[101] som_0.3-5.1 evaluate_0.23
[103] codetools_0.2-19 interp_1.1-5
[105] tibble_3.2.1 cli_3.6.2
[107] rpart_4.1.23 munsell_0.5.0
[109] dichromat_2.0-0.1 Rcpp_1.0.12
[111] dbplyr_2.4.0 png_0.1-8
[113] XML_3.99-0.16 parallel_4.3.2
[115] ggplot2_3.4.4 assertthat_0.2.1
[117] blob_1.2.4 prettyunits_1.2.0
[119] latticeExtra_0.6-30 jpeg_0.1-10
[121] AnnotationFilter_1.26.0 sparseMatrixStats_1.14.0
[123] bitops_1.0-7 VariantAnnotation_1.48.1
[125] scales_1.3.0 crayon_1.5.2
[127] rlang_1.1.3 KEGGREST_1.42.0

@charles-plessy
Owner

Hello, can you share your BAM file with me?

@QiliShi
Author

QiliShi commented Jan 23, 2024

Thank you for your prompt reply. I've just sent the BAM file as an attachment by email.

@QiliShi
Author

QiliShi commented Jan 25, 2024

Hello, the missing sequencing quality information in the downloaded FASTQ file may be the root cause of this issue. Unfortunately, I cannot verify this, because I do not currently have access to my own CAGE-seq data for comparison.
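
One way to probe this hypothesis is to look at the per-base quality strings stored in the BAM file (the QUAL field, column 11 of the SAM text produced by `samtools view`). The following is a rough base-R sketch, not part of CAGEr: the helper names `phred_scores` and `summarise_quals` are hypothetical, the example strings are made up, and Phred+33 encoding is assumed.

```r
# Convert one Phred+33 quality string to integer scores.
phred_scores <- function(qual) utf8ToInt(qual) - 33L

# Summarise a character vector of quality strings.
summarise_quals <- function(quals) {
  missing <- quals == "*"   # SAM uses "*" when no quality is stored
  present <- quals[!missing]
  scores <- as.integer(unlist(lapply(present, phred_scores)))
  list(
    n_missing       = sum(missing),
    distinct_scores = sort(unique(scores)),
    mean_score      = if (length(scores) > 0) mean(scores) else NA_real_
  )
}

# Made-up example: two reads with a constant quality character, one without quality.
quals <- c("????????", "????????", "*")
res <- summarise_quals(quals)
str(res)
```

If real reads show only one distinct score, or mostly "*", the qualities are probably placeholders rather than measured values, which would be consistent with the FASTQ having been downloaded without its original quality information.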
