Error: Duplicate read ID - sga-bam2de.pl #136

Open
a-lud opened this issue Apr 6, 2017 · 2 comments

a-lud commented Apr 6, 2017

Hi,

I'm trying to build scaffolds from three mate-pair libraries with 3kb, 5kb and 8kb inserts (currently in BAM format). The person who generated these libraries followed all the steps in your example scripts up to the scaffolding stage.

When I run the sga-bam2de.pl script on the libraries with the same settings as the "Scaffolding multiple libraries" wiki page, each of the three files produces an error message like the one below; only the duplicate read ID differs.

abyss-fixmate -h KLS0691b.matepair.3kb.sorted.tmp.hist /localscratch/path/to/data/pe/KLS0691b.matepair.5kb.sorted.bam | samtools view -Sb - > KLS0691b.matepair.3kb.sorted.diffcontigs.bam
error: duplicate read ID `HWI-ST1408:124:CA3J7ANXX:4:1309:4959:8862/1'
[samopen] SAM header is present: 2455895 sequences.
[sam_read1] reference 'ID:bwa   PN:bwa  VN:0.7.13-r1126 CL:bwa mem -t 8 bwa_contigs1_index/index ../1_trimmed_AdapterRemoval/KLS0691b_5KB_GCCAAT_R1_t.fastq.gz ../1_trimmed_AdapterRemoval/KLS0691b_5KB_GCCAAT_R2_t.fastq.gz
contig-1172471  LN:289
@SQ     SN:contig-1223316       LN:242
@SQ     SN:contig-9458!' is recognized as '*'.
[main_samview] truncated file.
awk '$2 >= 3' KLS0691b.matepair.3kb.sorted.tmp.hist > KLS0691b.matepair.3kb.sorted.hist
awk: cmd. line:1: fatal: cannot open file `KLS0691b.matepair.3kb.sorted.tmp.hist' for reading (No such file or directory)
samtools sort KLS0691b.matepair.3kb.sorted.diffcontigs.bam KLS0691b.matepair.3kb.sorted.diffcontigs.sorted
DistanceEst -s 200 --mind -99 -n 5 -k 99 -j 1 -o KLS0691b.matepair.3kb.sorted.de KLS0691b.matepair.3kb.sorted.hist -l 100 KLS0691b.matepair.3kb.sorted.diffcontigs.sorted.bam
error: the histogram `KLS0691b.matepair.3kb.sorted.hist' is empty

It seems the duplicate read ID is what triggers the error, but I am unsure how to solve it. Any help or insight would be appreciated.

Cheers

brisk022 commented Jun 1, 2017

I had the same problem. I fixed it by removing all secondary or supplementary alignments from the BAM files, e.g. samtools view -b -h -F 0x800 -o filtered.bam input.bam (or -F 0x100 if you used the -M flag when aligning with BWA, which marks shorter split hits as secondary instead of supplementary).
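To make the flag filter concrete, here is a minimal Python sketch of what excluding the 0x100 (secondary) and 0x800 (supplementary) bits does to a stream of SAM text lines. The function name `keep_primary` and the combined default mask are illustrative assumptions, not part of samtools, SGA or ABySS; in practice samtools itself should do the filtering.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def keep_primary(sam_lines, mask=SECONDARY | SUPPLEMENTARY):
    """Yield header lines plus every record with none of the `mask` bits set.

    Hypothetical helper: it mimics the effect of an exclusion flag filter
    on plain-text SAM lines and is not how samtools is implemented.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @PG, ...) are passed through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        if flag & mask == 0:
            yield line
```

Passing `mask=SUPPLEMENTARY` drops only supplementary records, while the default mask drops both classes in one pass.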

I didn't dig too deeply, but I think the cause is this: when only one read of a pair has a secondary or supplementary alignment, abyss-fixmate treats that extra record as another primary alignment and reports the read ID as a duplicate.
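The suspected mechanism can be illustrated with a small Python sketch that counts how often each mate appears in a SAM stream. This only models the hypothesis above and is not abyss-fixmate's actual logic; the helper names and the `/1`, `/2` suffix scheme (mirroring the ID format in the error message) are assumptions.

```python
from collections import Counter

FIRST_IN_PAIR = 0x40   # SAM flag bit: first read of the pair
SECOND_IN_PAIR = 0x80  # SAM flag bit: second read of the pair


def mate_id(fields):
    """Return QNAME with a /1 or /2 suffix derived from the FLAG column."""
    flag = int(fields[1])
    if flag & FIRST_IN_PAIR:
        return fields[0] + "/1"
    if flag & SECOND_IN_PAIR:
        return fields[0] + "/2"
    return fields[0]


def duplicate_read_ids(sam_lines):
    """List mate IDs that occur in more than one alignment record.

    Every record is counted regardless of its secondary/supplementary
    bits, which is the behaviour the hypothesis attributes to the tool.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        counts[mate_id(line.rstrip("\n").split("\t"))] += 1
    return sorted(read_id for read_id, n in counts.items() if n > 1)
```

Running this on a BAM converted to SAM text before and after filtering would show whether the reported IDs come only from reads that carry extra alignment records.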

You can try reporting it at https://github.com/bcgsc/abyss

a-lud commented Jun 1, 2017

I came across a similar solution in this Google Groups thread. The secondary/supplementary alignments were indeed causing the problem.
