Epigenetic Analysis Pipeline Issue: "Failed to Get Modbase Info AUX Data Not Found" #1080

Open
priyanagpal25 opened this issue Oct 13, 2024 · 2 comments
Labels
mods For issues related to modified base calling

Comments

@priyanagpal25

Issue Report

Please describe the issue:

I am working on epigenetic analysis for bacterial samples using Oxford Nanopore sequencing (FLO-MIN114), and I’m transitioning from Tombo to Dorado for modified basecalling since Tombo is now deprecated. I’m encountering an issue with processing the output files, and I’m unsure if my pipeline is set up correctly.

Sequencing Setup:

Sequencing chemistry: FLO-MIN114
Raw file format: POD5
Basecalling: Dorado (sup, m6A)
Software: MinKNOW
Alignment reference genome: FASTA

Steps Taken:

Basecalling and Demultiplexing:
On the MinKNOW interface, I first performed basecalling using sup, m6A and demultiplexed the samples. This produced both .fastq and .bam files.

File Information:
    .bam files seem to contain modified base information.
    .fastq files do not have modified base information.

Alignment:
I used the .bam files for alignment in MinKNOW, with a reference genome in FASTA format. This produced multiple .bam files and corresponding .bam.bai index files.

Merging:
I merged all .bam files using samtools merge:

samtools merge merged_output.bam *.bam

Then I indexed the merged .bam file.

Modkit Pileup:
I ran the following command to generate modified base information:

modkit pileup merged_output.bam > /modkitoutput/pileup.bed

However, I encountered the following error:

Failed to get modbase info AUX data not found

Issue:

It appears that the modified base information is not being recognized by Modkit during the pileup process. The error suggests missing auxiliary (AUX) data, which seems related to the modification calls.

Questions:

What is the correct pipeline for modified basecalling for bacterial samples using Dorado?
Is there a specific step I am missing to ensure that modified base information is included in the .bam file?
Should I adjust my approach to alignment or demultiplexing to resolve this issue?

Any help with understanding the pipeline and resolving this error would be appreciated.

@priyanagpal25
Author

for an individual bam file: samtools view -H fastq_runid_e77a6fda925a5796b8b74964b42548a9fe2be7ec_6_1.bam | grep -E "MM:|ML:"
no output observed

@HalfPhoton
Collaborator

Hi @priyanagpal25,

for an individual bam file: samtools view -H fastq_runid_e77a6fda925a5796b8b74964b42548a9fe2be7ec_6_1.bam | grep -E "MM:|ML:"

The issue here is that you have the samtools view -H flag set, so the grep is only searching the header and not the per-read tags.

samtools view --help
... 
-H, --header-only          Print SAM header only (no alignments)

Can you check again that the bam file has mods tags?

Kind regards,
Rich
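
A minimal sketch of the re-check suggested above, assuming samtools is on PATH. Dropping -H makes samtools view print the alignment records, which is where the MM and ML tags are stored. The helper name count_mod_tags and the file name reads.bam are placeholders for illustration, not part of samtools or Dorado. The helper also accepts the older Mm spelling of the tag.

```shell
# Hypothetical helper: reads SAM text on stdin and reports how many
# alignment records carry an MM (or legacy Mm) modification tag.
count_mod_tags() {
  awk -F'\t' '
    !/^@/ {                          # skip header lines, if any are present
      total++
      for (i = 12; i <= NF; i++) {   # optional tags start at column 12
        if ($i ~ /^(MM|Mm):Z:/) { tagged++; break }
      }
    }
    END { printf "%d of %d records carry an MM tag\n", tagged, total }
  '
}

# Usage (assumption: reads.bam stands in for one of your BAM files):
#   samtools view reads.bam | head -n 1000 | count_mod_tags
```

If the count is zero for the basecalled BAMs, the tags were never written. If it is non-zero there but zero after alignment or merging, they were lost in a later step.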

@HalfPhoton HalfPhoton added the mods For issues related to modified base calling label Nov 4, 2024