12 merge sv candidates #14
Benchmarked CNV calling with Truvari, using ONT long reads for alignment-based calls (see #11) and Illumina short reads for SNP-based CNV predictions. SNPs were called with the DeepVariant v1.5.0 WGS model. The data are Illumina WGS 2x150 bp at 300X per individual from GIAB HG002. Recall is unchanged, but precision is slightly affected for both deletions and duplications. SV merging should improve this.
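For reference, a minimal sketch of how extra unmatched calls move precision while leaving recall untouched. It uses the standard definitions; the function name and the counts are hypothetical and are not taken from this benchmark.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from benchmark counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
# Adding 50 false positives changes precision but not recall,
# because recall depends only on TP and FN.
before = precision_recall_f1(tp=900, fp=100, fn=100)
after = precision_recall_f1(tp=900, fp=150, fn=100)

print("before:", before)
print("after: ", after)
```

Merging duplicate candidates for the same event would be expected to lower the FP count, which is the only term here that affects precision alone.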
After implementing clipped base + read depth-based merging:
After running benchmarking only in high-confidence intervals:
[NORMAL] Generating agglomerative clustering results
[HIGH-CONF] Generating agglomerative clustering results
[NORMAL] Generating DBSCAN clustering results
[HIGH-CONF] Generating DBSCAN clustering results
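The labels above refer to agglomerative and DBSCAN clustering of SV candidates. As a rough illustration of the idea, and not the implementation in this PR, here is a single-linkage agglomerative sketch with a fixed distance cutoff. The `SV` class, `merge_candidates`, the `max_dist` parameter, and the choice of a median representative are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL" or "DUP"


def merge_candidates(svs, max_dist=1000):
    """Single-linkage clustering of SVs that share chromosome and type.

    Two candidates are linked when both their start and end breakpoints
    differ by at most max_dist; each connected group becomes one cluster.
    """
    parent = list(range(len(svs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); fine for a sketch, slow genome-wide.
    for i in range(len(svs)):
        for j in range(i + 1, len(svs)):
            a, b = svs[i], svs[j]
            if (a.chrom, a.svtype) != (b.chrom, b.svtype):
                continue
            if abs(a.start - b.start) <= max_dist and abs(a.end - b.end) <= max_dist:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, sv in enumerate(svs):
        clusters[find(i)].append(sv)

    merged = []
    for members in clusters.values():
        # Representative call: (upper) median breakpoints of the cluster.
        starts = sorted(m.start for m in members)
        ends = sorted(m.end for m in members)
        first = members[0]
        merged.append(
            SV(first.chrom, starts[len(starts) // 2], ends[len(ends) // 2], first.svtype)
        )
    return sorted(merged, key=lambda s: (s.chrom, s.start, s.end, s.svtype))


candidates = [
    SV("chr1", 1000, 5000, "DEL"),
    SV("chr1", 1050, 5100, "DEL"),
    SV("chr1", 1020, 4980, "DEL"),
    SV("chr1", 1000, 5000, "DUP"),    # same span, different type: kept apart
    SV("chr2", 1000, 5000, "DEL"),    # different chromosome: kept apart
    SV("chr1", 20000, 26000, "DEL"),  # far away: kept apart
]
merged = merge_candidates(candidates, max_dist=200)
for sv in merged:
    print(sv)
```

Here the three nearby chr1 deletions collapse into one call, while candidates that differ in type, chromosome, or position stay separate.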
I have added some initial work on improving SV classification from split reads by ruling out scenarios in which a duplication or deletion would not occur. This covers the endpoints of the overlap (deletion only), the endpoints of the alignments in an overlap (duplication only), gap endpoints (duplication or deletion), and gap alignment endpoints (duplication only).
The FP count is significantly reduced; see #11 (comment).
Completed a whole-genome run after the threading bug fixes, with CNV data disabled for improved performance. Here are some metrics with 250 GB of memory and 40 threads (cores):
Although wall-clock time is ~4 hours, CPU time is much higher at ~6.5 days, so a run without heavy multithreading would take a long time.
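A quick back-of-the-envelope check of those figures, using only the rounded numbers quoted above (so the results are approximate). The single-threaded estimate assumes total CPU time stays the same when the thread count changes, which is an assumption and not something measured here.

```python
wall_clock_hours = 4.0      # "~4 hours" wall-clock time
cpu_time_hours = 6.5 * 24   # "~6.5 days" of CPU time, in hours
threads = 40

# Average number of cores kept busy over the run.
speedup = cpu_time_hours / wall_clock_hours
# Fraction of the 40 threads that were effectively used.
efficiency = speedup / threads

print(f"CPU time: {cpu_time_hours:.0f} h")
print(f"Average busy cores: {speedup:.1f}")
print(f"Parallel efficiency: {efficiency:.1%}")
print(f"Single-threaded estimate: {cpu_time_hours / 24:.1f} days")
```

With these inputs the run keeps roughly 39 of the 40 threads busy on average, which is consistent with the reported wall-clock and CPU times.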
Benchmark results using Truvari v3.5.0 with defaults, refdist=1000
Merge similar SV candidates.