MALVA with low coverage data #6

OliverPStuart · 2021-02-16T04:32:17Z

Hi there,

We'd like to try using MALVA on our own low-coverage WGS data (~1x). We've noticed that the MALVA release we're using (version 1.3.1; build h3889886_0) only genotypes sites where a sample has coverage >= 2. Is there a way to change this default behaviour so that sites with a coverage of 1 are genotyped as well? There's nothing obvious in the provided flags, but maybe it's possible to modify the original code.

mpre · 2021-02-16T14:20:48Z

Hi Oliver, as you correctly understood, MALVA filters out kmers occurring only once and considers them as errors. There's no easy way to avoid this using the version distributed through conda.

If you use the version available here on GitHub, you can edit line 107 of the MALVA bash script in the root directory and add the -ci1 flag after ${KMC_BIN}.

Please consider that MALVA relies on high coverage to call genotypes so the result you get after setting that flag to 1 might be inaccurate.
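To illustrate why a minimum-count cutoff matters at ~1x coverage, here is a minimal sketch of k-mer counting with a threshold. It is a toy model, not KMC or MALVA code: the function names `count_kmers` and `filter_kmers`, the reads, and the value of `k` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_kmers(counts, min_count):
    """Keep only k-mers seen at least min_count times.

    min_count=2 mimics a cutoff that treats singletons as errors;
    min_count=1 keeps everything, including real but rare k-mers.
    """
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Two short, hypothetical reads that overlap only on "GTAC".
reads = ["ACGTAC", "GTACGG"]
counts = count_kmers(reads, k=4)

kept_default = filter_kmers(counts, min_count=2)  # singletons dropped
kept_all = filter_kmers(counts, min_count=1)      # singletons kept

print(kept_default)
print(sorted(kept_all))
```

With a cutoff of 2, only the k-mer shared by both reads survives. At ~1x coverage most true k-mers are seen once, so a cutoff of 2 discards most of the signal, while a cutoff of 1 also lets sequencing errors through.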

OliverPStuart · 2021-02-23T00:05:16Z

Thank you. I've given this a try and it does change the behaviour somewhat (i.e. the outputs are different), but there are no genotypes in the output called from low-frequency (n=1) k-mers. Is there anything in the design of MALVA that would create a case where a genotype is not called even when a k-mer is found that corresponds to it?

I appreciate that our use case is definitely not what MALVA was designed for (coverage and organism) so I'm interested to get a better handle on how MALVA operates so we can decide if it suits our project.

ldenti · 2021-02-23T09:47:00Z

Hi Oliver,
a quick question: when you say "there are no genotypes in the output", do you mean the variants are called 0 instead of 1?

MALVA uses allele frequencies in the population and k-mer coverages to compute the likelihood of each possible genotype of a variant and then assigns the most likely one. It may be that the a priori probabilities used (i.e., by default, the frequencies of each allele in the considered population) are forcing MALVA to call a variant 0 because the coverage for the alternate allele is not high enough.
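The interaction between priors and coverage can be sketched with a small Bayesian toy model. This is not MALVA's actual implementation: the Hardy-Weinberg priors, the binomial likelihood, the `error_rate` parameter, and the function names are all assumptions made for illustration, restricted to a diploid, biallelic site.

```python
from math import comb


def genotype_posteriors(ref_count, alt_count, alt_freq, error_rate=0.01):
    """Posterior over diploid genotypes from k-mer counts and a population prior.

    Assumptions (illustrative only): Hardy-Weinberg priors derived from the
    alternate allele frequency, and a binomial model for the alt k-mer count.
    """
    total = ref_count + alt_count
    priors = {
        "0/0": (1 - alt_freq) ** 2,
        "0/1": 2 * alt_freq * (1 - alt_freq),
        "1/1": alt_freq ** 2,
    }
    # Expected fraction of alt k-mers under each genotype.
    alt_fraction = {"0/0": error_rate, "0/1": 0.5, "1/1": 1 - error_rate}

    unnormalised = {}
    for genotype, prior in priors.items():
        p = alt_fraction[genotype]
        likelihood = comb(total, alt_count) * p ** alt_count * (1 - p) ** ref_count
        unnormalised[genotype] = prior * likelihood

    norm = sum(unnormalised.values())
    return {g: v / norm for g, v in unnormalised.items()}


def call_genotype(ref_count, alt_count, alt_freq, error_rate=0.01):
    """Return the genotype with the highest posterior probability."""
    posteriors = genotype_posteriors(ref_count, alt_count, alt_freq, error_rate)
    return max(posteriors, key=posteriors.get)


# A single alt k-mer and no ref k-mers, under a rare and a common alt allele.
print(call_genotype(ref_count=0, alt_count=1, alt_freq=0.001))
print(call_genotype(ref_count=0, alt_count=1, alt_freq=0.3))
```

Under these assumptions, one alt k-mer is not enough to overcome the prior when the alternate allele is rare (frequency 0.001), so the call stays 0/0. With a common alternate allele (frequency 0.3), the same evidence yields 0/1.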

Can you please send a variant from your input VCF file that has been miscalled by MALVA?

mpre added the question Further information is requested label Feb 16, 2021
