How does dorado handle 3D data #1089

Open
happier21 opened this issue Oct 18, 2024 · 4 comments
Labels
question Issue is a question

Comments

@happier21

In our experiment, we cross-linked chromosome fragments that may interact in three-dimensional space, then basecalled the reads with dorado and identified methylation. I need to split the cross-linked chromosome fragments and then align them back to the reference genome, but how do I carry the methylation information over to the split fragments when I split them? Or can dorado do the splitting together with identifying methylation?
Waiting for your reply
Thank you

@HalfPhoton
Collaborator

This question isn't a dorado issue and is probably best asked on the Nanopore community forum.

However, if I understand your question correctly:
Dorado annotates mod-basecalled reads with the ML and MM tags but doesn't provide a way to split the reads into fragments.

You'll need to use other tools for post-processing, such as samtools, which should preserve these tags when splitting.

Kind regards,
Rich
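
To illustrate what carrying the MM and ML tags over to split fragments involves, here is a minimal Python sketch. It is not part of dorado or samtools. The function names `decode_mm_ml` and `split_calls`, the example sequence, and the fragment intervals are all hypothetical. It assumes a forward-strand record and one modification code per MM entry; reverse-strand records and combined codes would need extra handling.

```python
from typing import Dict, List, Tuple

Calls = Dict[str, List[Tuple[int, int]]]


def decode_mm_ml(seq: str, mm: str, ml: List[int]) -> Calls:
    """Map each modification code to (read_position, ML_value) pairs.

    Assumes a forward-strand record and one code per MM entry,
    e.g. "C+m?,0,1,0;A+a?,1;". ML values are consumed in MM order.
    """
    calls: Calls = {}
    ml_index = 0
    for entry in filter(None, mm.split(";")):
        header, *deltas = entry.split(",")
        base, code = header[0], header[2:].rstrip("?.")
        # Read positions of every occurrence of the canonical base ("N" = any base).
        base_positions = [i for i, b in enumerate(seq) if base == "N" or b == base]
        cursor = 0
        pairs = []
        for delta in deltas:
            cursor += int(delta)  # number of candidate bases to skip
            pairs.append((base_positions[cursor], ml[ml_index]))
            cursor += 1
            ml_index += 1
        calls[code] = pairs
    return calls


def split_calls(calls: Calls, fragments: List[Tuple[int, int]]) -> List[Calls]:
    """Assign calls to half-open (start, end) read intervals.

    Positions are re-based so that 0 is the first base of each fragment.
    """
    return [
        {
            code: [(pos - start, prob) for pos, prob in pairs if start <= pos < end]
            for code, pairs in calls.items()
        }
        for start, end in fragments
    ]


# Hypothetical read: 12 bases, split into two fragments at read position 6.
seq = "ACGTCCGATCGA"
calls = decode_mm_ml(seq, "C+m?,0,1,0;A+a?,1;", [200, 30, 250, 128])
per_fragment = split_calls(calls, [(0, 6), (6, 12)])
print(per_fragment)
```

The point of the sketch is that MM skip counts are relative to the full read sequence, so a fragment only keeps correct modification calls if the calls are first resolved to read positions and then re-based to the fragment. Writing the re-based calls back as new MM and ML tags is left out here.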

@HalfPhoton HalfPhoton added the question Issue is a question label Oct 21, 2024
@biozzq

biozzq commented Oct 22, 2024

Dear @HalfPhoton

I think I have encountered a similar issue and would like to ask you for help.

The use of long-read sequencing technologies in the context of three-dimensional genomics, protein-DNA interactions, and simultaneous DNA methylation profiling has become achievable.

As you are aware, incorporating three-dimensional genomic information can lead to the presence of a significant number of chimeric reads in the sequencing results. These chimeras can typically be effectively aligned to the reference genome through split mapping, where the split positions correspond to the digest sites or added linker sequences. However, due to the inherent accuracy issues associated with ONT base calling, splitting at the read level may not always be the optimal approach. Instead, splitting based on the alignment results relative to the reference genome might yield better outcomes.

Given the above context, I would like to ask whether BAM files generated by basecalling and aligning with dorado can be used directly for quantitative methylation analysis with modkit.

Thank you for your time, and I look forward to your insights on this matter.

Best regards,

Zheng zhuqing

@HalfPhoton
Collaborator

Hi @biozzq,

> I would like to ask whether BAM files generated from the base calling and alignment process using dorado can be directly used for quantitative methylation analysis with modkit.

Yes - the output from dorado can be used for methylation analysis in modkit.

> However, due to the inherent accuracy issues associated with ONT base calling, splitting at the read level may not always be the optimal approach. Instead, splitting based on the alignment results relative to the reference genome might yield better outcomes.

We're always working on improving basecalling accuracy and performance in Dorado to meet the needs of our users. I'll raise this use case with the team to discuss how we can better support these interesting workflows, especially with regard to read splitting, if it's problematic in three-dimensional genomics.

Best regards,
Rich

@biozzq

biozzq commented Oct 23, 2024

Dear @HalfPhoton

Thank you for your prompt response. I have a few more questions. I obtained the modBAM file from dorado using the following command: dorado-0.7.3-linux-x64/bin/dorado basecaller [email protected] ./pod5_pass --modified-bases 6mA 5mC_5hmC --reference Hg38.fa | samtools view -bhS > output.bam. I have attached a subset of the alignments in the file subset.bam.zip. On examining the methylation information in it, I found that the supplementary alignment records do not have MM and ML tags; see, for example, the alignments for the read named "c59f6589-a8c9-4091-9e8e-3afca66085b5". Can modkit accurately assess the methylation information for the regions targeted by those supplementary alignments?
subset.bam.zip

Thank you for your assistance.
Best regards,
Zheng zhuqing
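
The observation above (supplementary records lacking MM and ML tags) can be checked across a whole file with a short script. The following is a minimal Python sketch that reads SAM text, for example the output of `samtools view -h output.bam`, and reports for each read which alignment records carry both tags. The function names and the example records are hypothetical, and the sketch says nothing about how modkit treats such records.

```python
from typing import Dict, Iterable, List, Tuple

SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment


def alignment_kind(flag: int) -> str:
    """Classify a SAM record by its flag bits."""
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


def mod_tag_report(sam_lines: Iterable[str]) -> Dict[str, List[Tuple[str, bool]]]:
    """Return {read_name: [(alignment_kind, has_both_MM_and_ML), ...]}."""
    report: Dict[str, List[Tuple[str, bool]]] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # Optional fields start at column 12; the tag name is the first two characters.
        tags = {field[:2] for field in fields[11:]}
        has_mods = {"MM", "ML"} <= tags
        report.setdefault(name, []).append((alignment_kind(flag), has_mods))
    return report


# Hypothetical records: one primary with tags, one supplementary without.
example = [
    "@HD\tVN:1.6",
    "\t".join(["read1", "0", "chr1", "100", "60", "6M6S", "*", "0", "0",
               "ACGTCCGATCGA", "*", "MM:Z:C+m?,0,1,0;", "ML:B:C,200,30,250"]),
    "\t".join(["read1", "2048", "chr2", "500", "60", "6H6M", "*", "0", "0",
               "GATCGA", "*", "NM:i:0"]),
]
for read_name, records in mod_tag_report(example).items():
    print(read_name, records)
```

Reads whose supplementary records report `False` are the ones affected by the question above.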
