Fix align_trim to deal with long deletion containing left primer #111
Recently we had an issue with artic generating consensus sequences with a significant number of abnormal SNPs called in the ORF8 region.
Double-checking the alignment, we found an extremely long deletion of 197 nt, spanning 27984-28180 inclusive, that fully contains the SARS-CoV-2_94_LEFT primer of v4.1 (27996-28021). The top track in the Figure below presents the true alignment from minimap2 (sorted.bam).
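A quick sanity check of the coordinates quoted above (1-based, inclusive) confirms the deletion length and that the primer lies entirely inside the deletion. This means the primer's end, where trimming from the start should stop, is a reference position with no aligned read base. The helper names here are illustrative only.

```python
# Coordinates from the description: 1-based, inclusive.
deletion = (27984, 28180)
primer_94_left = (27996, 28021)  # SARS-CoV-2_94_LEFT, v4.1


def interval_length(interval):
    """Length of a 1-based inclusive interval."""
    start, end = interval
    return end - start + 1


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print(interval_length(deletion))           # 197
print(contains(deletion, primer_94_left))  # True
```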
align_trim doesn't seem to handle this situation well, resulting in an incorrect CIGAR after soft masking. The misalignment then produced the false SNPs in this region, as shown in the middle track of the Figure (_false.trimmed.rg.sorted.bam).
The fix is simple because I only tried to address this bug, without attempting to call the long deletion correctly (I also understand that a D can't appear immediately after an S in a valid CIGAR string). After trimming between the primer 94 pair (highlighted in red) from the start using the fixed version, the deletion is gone too (bottom track). Please double-check, as I'm not fully aware of other situations where this change might affect trimming in an unexpected way.
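To illustrate the CIGAR invariant involved, here is a minimal sketch of soft-masking a read from its start up to a reference position, written against plain `(op, length)` tuples rather than pysam objects. It is not the actual align_trim code or the patch in this PR; `softmask_start`, `parse_cigar` and `format_cigar` are hypothetical names. The key step is that any deletion (or N skip) left at the new start is dropped and the alignment start is moved past it, so no D follows the S.

```python
import re

QUERY_OPS = set("MIS=X")  # operations that consume query bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def parse_cigar(cigar):
    """Parse a CIGAR string such as '83M197D50M' into (op, length) tuples."""
    return [(op, int(n)) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def format_cigar(ops):
    """Format (op, length) tuples back into a CIGAR string."""
    return "".join(f"{n}{op}" for op, n in ops)


def softmask_start(cigar, ref_start, trim_to):
    """Hypothetical sketch: soft-mask query bases aligned before `trim_to`.

    `ref_start` and `trim_to` are 0-based; reference positions < trim_to
    are trimmed. Returns (new_cigar, new_ref_start).
    """
    ops = list(cigar)
    hard = []
    clipped = 0  # query bases that become the leading soft clip

    # Keep leading hard clips; absorb any existing leading soft clip.
    while ops and ops[0][0] == "H":
        hard.append(ops.pop(0))
    while ops and ops[0][0] == "S":
        clipped += ops.pop(0)[1]

    pos = ref_start
    while ops and pos < trim_to:
        op, length = ops.pop(0)
        if op in REF_OPS:
            # Consume only as much reference as needed to reach trim_to.
            step = min(length, trim_to - pos)
            pos += step
            if op in QUERY_OPS:
                clipped += step
            if step < length:
                ops.insert(0, (op, length - step))  # keep the remainder
        elif op in QUERY_OPS:
            clipped += length  # insertion (or trailing S): query only
        else:
            ops.insert(0, (op, length))  # trailing H or P: leave in place
            break

    # A D (or N) must not directly follow S: drop it and move the
    # alignment start to the end of the skipped reference span.
    while ops and ops[0][0] in ("D", "N"):
        pos += ops.pop(0)[1]

    soft = [("S", clipped)] if clipped else []
    return hard + soft + ops, pos


# Toy read whose 197 nt deletion covers reference [27983, 28180) (0-based),
# trimmed up to the end of the 94_LEFT primer (0-based exclusive 28021).
new_cigar, new_start = softmask_start(parse_cigar("83M197D50M"), 27900, 28021)
print(format_cigar(new_cigar), new_start)  # 83S50M 28180
```

Without the final loop, the toy read above would come out as `83S159D50M`, which is the kind of S-then-D CIGAR the description warns about. Dropping the deletion also matches the observation that the deletion disappears after trimming, rather than being called.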
I'm using the artic-1.3.0-dev branch, but the same issue is also present on the master branch, so you may need to apply the fix there as well if applicable. Of course, if there's a way to capture the deletion in the consensus sequence, that would be much better.
@mjsull @nickloman @BioWilko @will-rowe
Thanks,