MALVA with low coverage data #6

Open
OliverPStuart opened this issue Feb 16, 2021 · 3 comments
Labels
question Further information is requested

Comments

@OliverPStuart

Hi there,

We'd like to try using MALVA on our own low-coverage WGS data (~1x). We've noticed that the MALVA release we're using (version 1.3.1; build h3889886_0) only genotypes sites where a sample has a coverage of at least 2. Is there a way to change the default behaviour so that sites with a coverage of 1 are also genotyped? There's nothing obvious in the provided flags, but maybe it's possible to modify the original code.

@mpre
Member

mpre commented Feb 16, 2021

Hi Oliver, as you correctly understood, MALVA filters out k-mers occurring only once, treating them as errors. There's no easy way to avoid this using the version distributed through conda.

If you use the version available here on github you can edit line 107 of the MALVA bash script in the root directory and add the -ci1 flag after ${KMC_BIN}.

Please bear in mind that MALVA relies on high coverage to call genotypes, so the results you get after setting that threshold to 1 may be inaccurate.
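To make the effect of the threshold concrete, here is a minimal Python sketch of count-based k-mer filtering. It is not KMC or MALVA code: the functions count_kmers and solid_kmers and the min_count parameter are hypothetical, and unlike KMC the sketch counts k-mers exactly as they appear in the reads rather than in canonical form. A min_count of 2 stands in for KMC's default threshold, and a min_count of 1 for the -ci1 setting.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (as written, not canonical) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_count=2):
    """Keep only k-mers seen at least min_count times.

    Hypothetical stand-in for a k-mer counter's minimum-count cutoff:
    with min_count=2, singletons are discarded as likely errors.
    """
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    counts = count_kmers(["ACGTAC", "ACGTTT"], k=4)
    print(solid_kmers(counts))               # default cutoff of 2
    print(solid_kmers(counts, min_count=1))  # analogous to -ci1
```

With the reads ACGTAC and ACGTTT and k=4, only ACGT survives the default cutoff; lowering min_count to 1 keeps all five distinct k-mers, including the singletons that would otherwise be discarded. At ~1x coverage most true k-mers are singletons, which is why the threshold matters so much here.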

@mpre mpre added the question Further information is requested label Feb 16, 2021
@OliverPStuart
Author

Thank you. I've given this a try and it does change the behaviour somewhat (i.e. the outputs are different), but there are no genotypes in the output called from low-frequency (n=1) k-mers. Is there anything in the design of MALVA that would create a case where a genotype is not called even when a k-mer is found that corresponds to it?

I appreciate that our use case is definitely not what MALVA was designed for (coverage and organism) so I'm interested to get a better handle on how MALVA operates so we can decide if it suits our project.

@ldenti
Member

ldenti commented Feb 23, 2021

Hi Oliver,
a quick question:

there are no genotypes in the output

do you mean the variants are called 0 instead of 1?

MALVA uses allele frequencies in the population and k-mer coverages to compute the likelihood of each possible genotype of a variant, and then assigns the most likely one. It may be that the a priori probabilities used (i.e., by default, the frequencies of each allele in the considered population) are forcing MALVA to call a variant 0 because the coverage of the alternate allele is not high enough.
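The general idea of weighing a population prior against read evidence can be sketched as follows. This is a simplified, hypothetical model, not MALVA's actual likelihood: it assumes a diploid sample, Hardy-Weinberg priors derived from the alternate allele frequency, a binomial error model with an illustrative error_rate parameter, and plain allele counts in place of k-mer coverages. All function and parameter names are made up for illustration.

```python
import math

# Hypothetical diploid genotypes mapped to their fraction of alternate copies.
GENOTYPES = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def genotype_log_posteriors(ref_count, alt_count, alt_freq, error_rate=0.01):
    """Unnormalised log posterior of each genotype (illustrative model only).

    Prior: Hardy-Weinberg proportions from the population alt allele frequency.
    Likelihood: each observation shows the alt allele with probability q,
    where q mixes the genotype's alt fraction with a symmetric error rate.
    """
    if not 0.0 < alt_freq < 1.0:
        raise ValueError("alt_freq must be strictly between 0 and 1")
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must be strictly between 0 and 0.5")
    p = alt_freq
    priors = {"0/0": (1 - p) ** 2, "0/1": 2 * p * (1 - p), "1/1": p ** 2}
    scores = {}
    for gt, alt_fraction in GENOTYPES.items():
        q = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        # The binomial coefficient is identical for every genotype, so it is omitted.
        log_lik = alt_count * math.log(q) + ref_count * math.log(1 - q)
        scores[gt] = math.log(priors[gt]) + log_lik
    return scores


def call_genotype(ref_count, alt_count, alt_freq, error_rate=0.01):
    """Return the genotype with the highest posterior under the sketch model."""
    scores = genotype_log_posteriors(ref_count, alt_count, alt_freq, error_rate)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(call_genotype(0, 1, alt_freq=0.005))
    print(call_genotype(0, 1, alt_freq=0.3))
    print(call_genotype(0, 10, alt_freq=0.005))
```

Under these assumptions, a single alternate observation with alt_freq=0.005 is called 0/0 because the prior outweighs the evidence, the same observation with alt_freq=0.3 is called 0/1, and ten alternate observations with alt_freq=0.005 are enough to call 1/1. This mirrors the situation described above: a k-mer supporting the alternate allele can be present and still not produce a non-reference call when its coverage is low and the allele is rare in the population.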

Can you please send a variant from your input VCF file that has been miscalled by MALVA?
