-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New minimac4 release v4.1.0 produces vcf.gz which can't be read by DosageConvertor #53
Comments
I'm surprised that DosageConverter doesn't support HDS fields (which is now the default format field for Minimac v4.1), but you can get around this by adding |
Great, adding this flag does let DosageConvertor read the file now! I have another problem though, the dosage file that minimac4 produces now (with that flag -f DS) does not export the ID field, they are all set as '.' which causes problems for me later in the analysis. Is there a way to force that to be exported - it used to set the ID to chr:pos:alt:ref etc? Thanks again |
Can you elaborate on the use case for your downstream analysis. This information is redundant with what is already in the VCF records. Note: Minimac v4.1 uses the ID column from the reference panel in the imputed results. |
Yes of course.
Now v4.1.0:
Looking here you can see that minimac previously used to fill in the ID column in the vcf. This is needed for me because when I use DosageConvertor to convert the dosage file to plink format, it uses the ID column from the vcf to create the plink map file with the marker IDs, and so now all the IDs in the map file (2nd column below) are empty ('.') (DosageConvertor produces a .plink.dosage, .plink.fam and .plink.map file)
I am unable to use the plink files due to not having unique variant identifiers. |
I forked DosageCoverter and made a fix for plink conversions. Please try https://github.com/jonathonl/DosageConvertor. If this works for you, I will create a pull request in the main repo. |
Thanks for responding so quickly - I can confirm the changes you made to DosageConvertor work and I am able to create valid plink files, so creating a pull request ought to be useful to others as well. I've been looking into the results that minimac v4.1 is generating compared to the previous version though (with the same input data), and I can't understand it but the new version is producing a dosage vcf where many snp alleles are inconsistent with the reference data. The majority of alleles in the dosage file are different, such that in later stages of my analysis when I filter snps which match the reference, hardly anything matches and my results are thus very different/incorrect. I can't say for sure without spending a lot more time on this (which I don't have at the moment) but often one or both of ref and alt are flipped/complementary in the new minimac version. So at the moment I am going to have to stay with the previous minimac version unless you have any good ideas why this is happening/what I should look for? Thanks again |
I'm very skeptical that Minimac4.1 is changing the reference allele. If this is happening, it is likely happening somewhere else in your pipeline. But if you can provide an example, I'll look into it. |
Yes I agree it seems unlikely, but I've put together some files here for you to reproduce it, using the same input file and references. My pipeline runs on Centos7, so I've included 2 Dockerfiles which show how I build minimac v1.0.3 and v4.1. It should be shared here: https://drive.google.com/file/d/1Zy01T1f4DzcaQVDjq4wan80rrXRPN4Y3/view?usp=sharing Create a directory and extract the files, then build the docker images:
Run the docker container, bind mounting the current directory with the extracted files, and call minimac v1.0.3:
Exit and run the other docker container, bind mounting the current directory with the extracted files, and call minimac v4.1:
If you now exit and compare 23andme_chr13_minimac1_0_3.dose.vcf.gz and 23andme_chr13_minimac4_1.dose.vcf.gz in the current directory, you can see the differences in ref/alt. I generated the .msav files as instructed, using
If you could take a look and tell me if there is something wrong with how I have built minimac/libraries or the msav reference that would be great. |
Ok, I will check this out sometime this week. |
Can you check the ref / alt in the reference panels ? Minimac should copy
over the values from the panels provided. Maybe there are different?
…On Tue, Oct 25, 2022, 8:10 PM Jonathon LeFaive ***@***.***> wrote:
Ok, I will check this out sometime this week.
—
Reply to this email directly, view it on GitHub
<#53 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB5YQCEBZ5UHKRTIVRQUPJTWFCOIVANCNFSM6AAAAAARIWYFTE>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
The .msav reference files are not text files, so I'm unable to check whether the .m3vcf.gz reference files are converted correctly? |
Ahh Jonathon might be able to help with that.
…On Tue, Oct 25, 2022, 10:12 PM sereeena ***@***.***> wrote:
The .msav reference files are not text files, so I'm unable to check
whether the .m3vcf.gz reference files are converted correctly?
—
Reply to this email directly, view it on GitHub
<#53 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB5YQCDYABFKPWJYGWQBEVDWFC4VBANCNFSM6AAAAAARIWYFTE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Your input target genotypes have the alleles switched when comparing against all 3 reference panels (see below). I'm guessing that v4.0.3 is allowing for this (maybe intentionally) (@Santy-8128 might be able to comment) whereas v4.1.0 is expecting the alleles to exactly match what is in the reference.
|
Thanks for helping with this! So it looks like there was always something wrong with our pipeline and that minimac 1.0.3 may have taken the alleles from the references while minimac 4.1 is taking it from the eagle input file, where alleles have been flipped? I suspect it has something to do with plink, I have a bed/bim/fam where everything looks correct, then I convert to vcf with plink (which then goes on to eagle as a bcf, then eagle output goes to minimac). It is in the conversion to vcf where although I use --keep-allele-order with plink, it is still flipping them in the output vcf. So that will take time to investigate, but as this is not a minimac issue you can go ahead and close it, thanks again. |
Previously, we were able to use minimac dosage vcf files with DosageConvertor, to convert output files to a plink dosage format. However, when using the new minimac4 version to produce outputs in vcf.gz format, DosageConvertor can no longer read these files properly and reports the following error : Tag DS does not exist for . (See full trace below)
Here is the example of how I call minimac4.1 and DosageConvertor now (using DosageConvertor built from the latest codebase):
minimac4
chr18.1000g.Phase3.v5.With.Parameter.Estimates.msav
23andme_chr18_eagle.vcf.gz
-o 23andme_chr18_minimac4.dose.vcf.gz
--output-format vcf.gz
DosageConvertor --vcfDos 23andme_chr18_minimac4.dose.vcf.gz
--prefix 23andme_chr18
--type plink
--format 1
--tag DS
Previously, I called minimac like this:
minimac4
--refHaps chr18.1000g.Phase3.v5.With.Parameter.Estimates.m3vcf.gz
--haps 23andme_chr18_eagle.vcf.gz
--prefix 23andme_chr18_minimac4
And here is the full log excerpt from the DosageConvertor part:
Is there a problem with the dosage file that the new minimac version creates? I also created an issue in the DosageConvertor codebase in case it is something which needs to be changed there. Thanks for your assistance.
The text was updated successfully, but these errors were encountered: