Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aggregateTagClusters give a different result on CAGEexp compared with formerly on CAGEset objects. #86

Open
charles-plessy opened this issue Jun 29, 2023 · 1 comment

Comments

@charles-plessy
Copy link
Owner

charles-plessy commented Jun 29, 2023

This is quite disturbing… Compare the two:

This was also mentioned in #40

@charles-plessy
Copy link
Owner Author

Using a fixed version of aggregateTagClusters (8d746fa). I can reproduce the consensus cluster coordinates computed in CAGEr 1.20 (Bioc release 3.6), if I turn off the quantile options.

> aggregateTagClusters(ce, tpmThreshold = 5, qLow = NULL, qUp = NULL, maxDist = 100) |> consensusClustersGR() |> sort()
ConsensusClusters object with 284 ranges and 3 metadata columns:
                            seqnames            ranges strand |     score    dominant_ctss tpm.dominant_ctss
                               <Rle>         <IRanges>  <Rle> | <numeric>        <GRanges>         <numeric>
  chr17:26453632-26453737:+    chr17 26453632-26453737      + |   129.734 chr17:26453701:+           37.9264
  chr17:26564508-26564611:+    chr17 26564508-26564611      + |   231.065 chr17:26564585:+           65.7220
  chr17:26595637-26595805:+    chr17 26595637-26595805      + |   905.419 chr17:26595750:+          368.6590
  chr17:26596033-26596339:+    chr17 26596033-26596339      + |   141.826 chr17:26596198:+           55.5861
  chr17:26645155-26645516:+    chr17 26645155-26645516      + |  2608.956 chr17:26645160:+          317.3201
                        ...      ...               ...    ... .       ...              ...               ...
  chr17:45534699-45534730:-    chr17 45534699-45534730      - | 182.18189 chr17:45534727:-         146.95219
  chr17:45545895-45546033:-    chr17 45545895-45546033      - | 933.76230 chr17:45545991:-         277.01814
  chr17:45554296-45554345:-    chr17 45554296-45554345      - |  37.79642 chr17:45554339:-          17.51376
  chr17:45700092-45700187:-    chr17 45700092-45700187      - |   9.21467 chr17:45700182:-           9.21467
  chr17:45901695-45901784:-    chr17 45901695-45901784      - | 789.81397 chr17:45901749:-         283.80412
  -------
  seqinfo: 26 sequences (1 circular) from danRer7 genome

At the moment, I am inclined to think that CAGEr 1.20 did not take the qLow and qUp options into account for defining coordinates (in contrary to what the documentation states), and that this explains this part of the the discrepancy. Still, the scores do not match.

From the old CAGEr documentation:

consensusCl <- consensusClusters(myCAGEset)
> head(consensusCl)
consensus.cluster chr start end strand tpm
1 1 chr17 26453631 26453737 +  185.9678
2 2 chr17 26564507 26564611 +  318.3590
3 3 chr17 26595636 26595805 + 1017.8475
4 4 chr17 26596032 26596339 +  307.5127
5 5 chr17 26645154 26645516 + 2844.4560
6 6 chr17 26651962 26652050 +   71.0783

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant