How to obtain the RS ID from Minimac4 results #34

Open
xiangboyulan opened this issue Jun 6, 2020 · 17 comments

@xiangboyulan

Hi,

How can I obtain the RS IDs from Minimac4 results? Thanks a lot!

Best,

Bo

@jonathonl
Contributor

You need to add --rsid when running minimac4. The reference panel also needs to have RS IDs in the ID column.
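
One quick way to check the second condition is to tally what the ID column of the reference VCF actually contains. This is only a minimal sketch and not part of Minimac4; the function name `summarize_vcf_ids` is made up for illustration, and it assumes a tab-delimited VCF that is either plain text or gzip-compressed.

```python
import gzip
from collections import Counter


def summarize_vcf_ids(path):
    """Count records whose ID is an RS ID, missing ('.'), or something else."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta and header lines
                continue
            fields = line.rstrip("\n").split("\t", 3)
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            ids = fields[2]  # the third VCF column is ID
            if ids == ".":
                counts["missing"] += 1
            elif any(i.startswith("rs") for i in ids.split(";")):
                counts["rs"] += 1
            else:
                counts["other"] += 1
    return counts
```

If the `rs` count is zero, the panel itself carries no RS IDs, so none can appear in the imputed output.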

@xiangboyulan
Author

Hi Jonathon,
I added --rsid ON when running minimac4 and used the reference panel from your download website, but I can't find the RS IDs in the M3VCF file. Could you give me some advice? Thanks a lot!

@jonathonl
Contributor

I'm assuming you are referring to the 1000 genomes panel. This panel does not have RS IDs. You should be able to get them from the 1000 genomes VCFs on the 1000 genomes FTP site.
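
A rough sketch of how IDs from an annotated VCF could be copied onto records that only carry positional IDs. It is not a Minimac4 feature. The function names are hypothetical, and it assumes both files use the same chromosome naming and the same REF/ALT representation, since variants are matched on (CHROM, POS, REF, ALT).

```python
def build_rsid_lookup(annotated_vcf_lines):
    """Map (CHROM, POS, REF, ALT) to the ID found in an annotated VCF."""
    lookup = {}
    for line in annotated_vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t", 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, pos, vid, ref, alt = fields[:5]
        if vid != ".":
            lookup[(chrom, pos, ref, alt)] = vid
    return lookup


def fill_ids(vcf_lines, lookup):
    """Yield VCF lines with the ID column replaced where the lookup has a match."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5:
            key = (fields[0], fields[1], fields[3], fields[4])
            fields[2] = lookup.get(key, fields[2])  # keep the old ID if unmatched
        yield "\t".join(fields) + "\n"
```

Both functions take any iterable of lines, so they work on open file handles (for example from `gzip.open(path, "rt")`). Records without a match keep their existing ID.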

@xiangboyulan
Author

I used Minimac3 to convert the 1000 Genomes panel VCF to M3VCF, but I cannot find the RS IDs; the SNP IDs appear only as Chr:pos.

@jonathonl
Contributor

If the RS IDs exist in the VCF but not the M3VCF, then I would suggest using https://github.com/Santy-8128/m3vcftools to compress to M3VCF. This tool will copy over the ID column. I don't know whether the VCFs on our site include RS IDs, but the VCFs on the 1000 genomes site do.

@xiangboyulan
Author

Hi,
I used --referenceEstimates OFF, but it still works as if it were ON.

@steffenom

Hi all!
First of all, thanks for publishing your code on github!

We have a similar problem with missing IDs in the imputation output. We use minimac3 v.2.0.1 with the --rsid option to convert our panel with custom IDs from vcf to m3vcf. The resulting file still contains the IDs. We then convert the m3vcf file to msav format with minimac4 --update-m3vcf and run the imputation, but the output is missing the IDs.

We checked the msav file with the sav export command from savvy and there were no IDs in it. Any idea why the IDs get lost when converting from m3vcf to msav? We tried passing the --rsid option to minimac4, but it had no effect (and it is marked as deprecated). If I understood the previous discussion correctly, the IDs should be passed on.

@jonathonl
Contributor

@steffenom, thanks for reporting this. The earlier conversation was regarding v4.0.x. You are using v4.1.x, and this feature was missing from the new version. I just pushed a fix to the master branch. Please try the latest from the master branch to generate a new msav file.

@steffenom

Hi @jonathonl,
I tried the newest version on the master branch and it worked! The IDs showed up as expected. Thanks for the quick fix!

A minor drawback is that all variants now have an ID. The ones that don't have an ID in the reference panel now get an ID of the form CHR:POS. That is not a problem for us, but it might be unexpected for other users.

@jonathonl
Contributor

For the IDs with CHR:POS, are these variants that exist only in the target file (not in reference)? If using --all-typed-sites, IDs for such variants are carried over from the target VCF instead of the reference panel. If the variant exists in the reference panel and has a missing ID in the reference panel, then the ID for that variant should also be missing in the imputed results.
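
To make the expected behaviour concrete, here is the rule above written as a small function. It is an illustration only, not code from Minimac4; the function and parameter names are invented, `"."` stands for a missing ID, and it assumes that target-only variants are omitted from the output when --all-typed-sites is not given.

```python
def expected_output_id(in_reference, reference_id, target_id, all_typed_sites):
    """Return the ID the rule above predicts, or None if the variant is not output."""
    if in_reference:
        # Carried over from the reference panel, even when it is missing (".").
        return reference_id
    if all_typed_sites:
        # Target-only variant kept by --all-typed-sites: ID comes from the target VCF.
        return target_id
    # Assumption: target-only variants are not written without --all-typed-sites.
    return None
```

Under this rule, a CHR:POS ID in the output should only be possible if that string was already the ID in the reference panel or in the target VCF.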

@steffenom

No, for me all variants without an ID in the initial reference panel have the CHR:POS ID in the final output (without using --all-typed-sites).
I think these IDs are already created when the m3vcf file is generated from the reference panel with minimac3, and minimac4 --update-m3vcf then just carries them over.
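
For anyone who does not want these placeholders, they could be blanked in a VCF afterwards. This is a rough sketch with hypothetical function names. It assumes the placeholder is exactly CHR:POS, optionally followed by colon-separated alleles, which may not match every minimac3 version.

```python
def is_position_placeholder(chrom, pos, variant_id):
    """True if the ID is just CHROM:POS, optionally followed by ':'-separated fields."""
    prefix = f"{chrom}:{pos}"
    return variant_id == prefix or variant_id.startswith(prefix + ":")


def blank_placeholder_ids(vcf_lines):
    """Yield VCF lines with positional placeholder IDs replaced by '.'."""
    for line in vcf_lines:
        fields = line.rstrip("\n").split("\t")
        if not line.startswith("#") and len(fields) >= 3:
            if is_position_placeholder(fields[0], fields[1], fields[2]):
                fields[2] = "."  # VCF notation for a missing ID
        yield "\t".join(fields) + "\n"
```

Real IDs such as RS IDs or custom panel IDs are left untouched because they do not start with the record's own CHROM:POS.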

@jonathonl
Contributor

I see. FYI, you can generate an msav directly from a VCF, BCF, or SAV file with minimac4 --compress-reference input.vcf.gz -o compressed_output.msav. This still needs to be documented in the --help and README.
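
For panels split by chromosome, the same command can be scripted. The sketch below only uses the options quoted above. The helper names and the `dry_run` switch are assumptions, and it expects `minimac4` to be on the PATH when it actually runs.

```python
import shlex
import subprocess


def compress_reference_command(input_path, output_path):
    """Build the argv list for the --compress-reference invocation quoted above."""
    return ["minimac4", "--compress-reference", input_path, "-o", output_path]


def compress_reference(input_path, output_path, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    argv = compress_reference_command(input_path, output_path)
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return argv
```

With the default `dry_run=True` nothing is executed, so the printed commands can be reviewed first, for example in a loop over per-chromosome files.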

@steffenom

Thanks for the hint! I tried minimac4 --compress-reference and now the IDs are as expected.

Should the results of the imputation with a reference panel created with minimac4 --compress-reference be similar to results for the same panel created with minimac3 --processReference + minimac4 --update-m3vcf? Or is one preferred over the other in certain situations?

@jonathonl
Contributor

There may be a small difference with smaller reference panels (tens of thousands of samples). By default, minimac3 --processReference does parameter estimation and saves those parameters in the m3vcf. This parameter estimation will be less useful for larger panels.

@buegelbeatz

buegelbeatz commented Dec 7, 2022

minimac4 --compress-reference input.vcf.gz -o compressed_output.msav somehow kicked me out with:

minimac v4.1.0

Error: Cannot write empty block
Error: serializing final block failed

input.vcf.gz has 1052764 chromosome 20 variants (rows).
The file has 915 columns, converting to m3vcf with minimac3 works.

The code line where I'm kicked out is:

return std::cerr << "Error: Cannot write empty block\n", false;

It also failed for chromosomes 4, 14, and 15; all other chromosomes work.

@jonathonl
Contributor

> Error: Cannot write empty block
> Error: serializing final block failed

@buegelbeatz, this should be fixed with 6f9f140

@buegelbeatz

> Error: Cannot write empty block
> Error: serializing final block failed
>
> @buegelbeatz, this should be fixed with 6f9f140

Just tested it - works now! Thanks for the quick fix.
