record mismatch in temp files when -a option is active #77

Open
edg1983 opened this issue Jul 30, 2024 · 7 comments

@edg1983

edg1983 commented Jul 30, 2024

Hi,

I'm using minimac4 v4.1.3 to impute genotypes on a cohort of about 24k individuals.

Usually, I run one imputation job per chromosome. When I run the command with mostly default settings, it works fine and generates an output vcf.gz file containing the same number of variants as the reference panel file (as expected). See this example:

minimac4 -t 12 -b 500 \
	-f GT,DS,HDS,GP \
	-o chr22.imputed.vcf.gz \
	chr22.refpanel.msav chr22.target.vcf.gz

However, if I add the -a option, I get an error while the temp files are being merged at the end of the imputation process. The resulting vcf.gz file is truncated and contains fewer variants than the reference panel.

This is the command:

minimac4 -t 12 -b 500 -a \
	-f GT,DS,HDS,GP \
	-o chr22.imputed.vcf.gz \
	chr22.refpanel.msav chr22.target.vcf.gz

Here is the error from the log:

Running HMM took 114 seconds
Writing temp files took 89 seconds
Merging temp files ...
Error: record mismatch in temp files
Error: failed merging temp files

Am I doing something wrong here? Thanks!

@jonathonl
Contributor

Can you check to see if you have enough disk space in /tmp to store the chunked results? I think we would have seen an error message earlier in the logs if an error occurred writing the temp files, but that's the only good explanation I have for why this would happen.

Otherwise, is there anything special about the variant immediately after the last one written to output file? What operating system are you running this on and how did you install Minimac4?
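The two checks suggested above could be sketched roughly as follows. This is only an illustration: it assumes the temp files land under /tmp (as mentioned in the comment), that the truncated output can still be partly decompressed, and that positions in the output and the target VCF are sorted on a single chromosome. The function name `next_target_variant` is made up for this sketch and is not part of Minimac4.

```shell
# Free space on the filesystem assumed to hold the temp files.
df -h /tmp

# Print the last position written to a (possibly truncated) output VCF,
# followed by the first target record located after that position.
# Usage: next_target_variant OUTPUT_VCF_GZ TARGET_VCF_GZ
next_target_variant() {
  out_vcf=$1
  target_vcf=$2
  # A truncated .gz makes gzip complain; keep whatever decompresses.
  # NF >= 3 ensures the POS field of a cut-off last line is complete.
  last_pos=$(gzip -dc "$out_vcf" 2>/dev/null |
    awk -F'\t' '!/^#/ && NF >= 3 { p = $2 } END { print p }')
  echo "last position in output: $last_pos"
  # Show CHROM, POS, REF, ALT of the first target record past that position.
  gzip -dc "$target_vcf" |
    awk -F'\t' -v last="$last_pos" \
      '!/^#/ && $2 + 0 > last + 0 { print $1, $2, $4, $5; exit }'
}
```

With the file names from this thread, a call might look like `next_target_variant chr22.imputed.vcf.gz chr22.target.vcf.gz`.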

@edg1983
Author

edg1983 commented Jul 30, 2024

Hi,

I don't think the issue is related to storage space. I see in the log files a message like Writing temp files took 344 seconds; hence, I assume that all temp files were written correctly.

I currently use Minimac4 on our HPC cluster, which runs on CentOS 8. We grabbed the pre-compiled executable provided with the release on GitHub. It has worked fine so far in all other tests; it is just the -a option that creates issues apparently.

I'll check if I see anything strange in the last variant written to the file and the next one in the imputation ref panel.

@jonathonl
Contributor

Ok, if there is something strange, I'm guessing it will be in the next variant in your target VCF (as opposed to the reference VCF).

@jonathonl
Contributor

I'm guessing that this is happening because there is a target-only variant that has all of the genotypes missing for a batch of samples. This is a bug that I'll need to fix, though phasing software should impute such genotypes. Are you phasing your target vcf before imputing?
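One way to look for such records in the target VCF is sketched below. It is a rough check under stated assumptions: samples are grouped into consecutive batches of a given size (500 here would mirror the -b 500 in the commands above, assuming -b controls the number of samples per temp-file batch), GT is the first FORMAT subfield, and a genotype counts as missing when it contains no allele digit. It does not test whether a variant is target-only, since that would require the reference panel. The function name `all_missing_batches` is hypothetical.

```shell
# Report records in which at least one batch of consecutive samples
# has no called genotype at all.
# Usage: all_missing_batches TARGET_VCF_GZ BATCH_SIZE
all_missing_batches() {
  gzip -dc "$1" | awk -F'\t' -v b="$2" '
    /^#/ { next }                       # skip header lines
    {
      # Sample columns start at field 10 in a VCF record.
      for (start = 10; start <= NF; start += b) {
        last = start + b - 1
        if (last > NF) last = NF
        called = 0
        for (i = start; i <= last; i++) {
          split($i, f, ":")             # f[1] is the GT subfield
          if (f[1] ~ /[0-9]/) { called = 1; break }
        }
        if (!called) print $1, $2, "batch", int((start - 10) / b) + 1
      }
    }'
}
```

For example, `all_missing_batches chr22.target.vcf.gz 500` would list the chromosome, position, and batch number of every record where a whole batch is missing.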

@edg1983
Author

edg1983 commented Aug 9, 2024

Hi, I'm imputing VCF files from genotyping directly after QC without phasing them.

I'm now re-running the test with the -a option to check the last written variant and the next one in the input VCF. I'll update you here as soon as this is done.

@jonathonl
Contributor

You will get very poor imputation results if you impute unphased genotypes (or if you impute with an unphased reference panel). Both input files should be phased.
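A quick way to see whether a VCF is phased is to count genotype separators: in the VCF format, `|` marks a phased call and `/` an unphased one. The sketch below assumes GT is the first FORMAT subfield and ignores haploid calls, which have no separator. The function name `count_phasing` is made up for illustration.

```shell
# Count genotype calls by separator: "|" (phased) versus "/" (unphased).
# Usage: count_phasing VCF_GZ
count_phasing() {
  gzip -dc "$1" | awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")               # f[1] is the GT subfield
        if (index(f[1], "|")) phased++
        else if (index(f[1], "/")) unphased++
      }
    }
    END { printf "phased=%d unphased=%d\n", phased, unphased }'
}
```

Running `count_phasing chr22.target.vcf.gz` on a fully phased file should report `unphased=0`.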

@edg1983
Author

edg1983 commented Aug 23, 2024

I've tried with imputed genotypes, and I confirm this works fine with the -a option.
