Error "sorry, Dude. Didn't generate enough snps." #5
Comments
Hi and thank you for posting the issue! Is this using one of the slim recipes from the mapnn repo? Did you run slim without mutations, recapitate with msprime, and add mutations to the tree sequence?
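One way to answer these questions for a given .trees file is to inspect it directly. The sketch below is only an illustration and is not part of mapnn: `summarize_tree_sequence` and `diagnose` are hypothetical helper names, and the assumption that preprocessing needs at least `--num_snps` variable sites is a guess based on the error message. It relies on tskit's `num_sites`, `num_mutations`, and per-tree `num_roots`.

```python
def diagnose(num_sites, max_roots, num_snps):
    """Return a list of likely problems (hypothetical helper, not part of mapnn)."""
    problems = []
    if max_roots > 1:
        # More than one root in some tree means the genealogy has not fully coalesced.
        problems.append("not fully coalesced: recapitate before preprocessing")
    if num_sites < num_snps:
        # Assumption: preprocessing needs at least num_snps variable sites.
        problems.append(f"only {num_sites} sites but {num_snps} requested: add mutations")
    return problems


def summarize_tree_sequence(path, num_snps):
    """Load a tree sequence file and report why SNP sampling might fail."""
    import tskit  # imported lazily; requires tskit to be installed

    ts = tskit.load(path)
    max_roots = max(tree.num_roots for tree in ts.trees())
    return {
        "num_sites": ts.num_sites,
        "num_mutations": ts.num_mutations,
        "max_roots": max_roots,
        "problems": diagnose(ts.num_sites, max_roots, num_snps),
    }
```

For example, `summarize_tree_sequence("sim.trees", num_snps=100)` (with a placeholder file name) would list both problems for a tree sequence that was written without mutations after only a couple of generations.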
Thank you for your quick answer. I used the benchmark.slim recipe, which I modified only slightly to change the image width and the number of generations. What is the procedure for adding mutations if needed? I am not familiar with SLiM and msprime.
We recommend a workflow where you run relatively few generations in slim (expensive), then "finish" the simulation backwards in time with a coalescent model (this is called "recapitation"), and then add mutations to the completed tree sequence. This workflow is described in general here: https://tskit.dev/pyslim/docs/latest/tutorial.html The way I did this is using this function (mentioned briefly in the mapnn readme):
Did you use that
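The function referred to above is not reproduced in this thread. As a rough illustration of the general slim -> recapitate -> add mutations order (not the mapnn helper itself), a minimal sketch with pyslim and msprime might look like the following. The function name, file paths, ancestral Ne, rates, and seed are all placeholder assumptions and would need to match the actual simulation.

```python
def recapitate_and_mutate(in_path, out_path, ancestral_ne=1000,
                          recombination_rate=1e-8, mutation_rate=1e-8, seed=123):
    """Recapitate a SLiM tree sequence, add neutral mutations, and save it.

    All default values are illustrative assumptions, not mapnn settings.
    """
    # Imported lazily; requires tskit, pyslim, and msprime to be installed.
    import msprime
    import pyslim
    import tskit

    # 1. Load the tree sequence written by SLiM (simulated without mutations).
    ts = tskit.load(in_path)

    # 2. Finish the simulation backwards in time with the coalescent.
    ts = pyslim.recapitate(
        ts,
        ancestral_Ne=ancestral_ne,
        recombination_rate=recombination_rate,
        random_seed=seed,
    )

    # 3. Overlay neutral mutations on the completed genealogy.
    ts = msprime.sim_mutations(ts, rate=mutation_rate, random_seed=seed, keep=True)

    ts.dump(out_path)
    return ts.num_sites
```

A call such as `recapitate_and_mutate("sim.trees", "sim_recap.trees")` (placeholder file names) returns the number of variable sites, which can be compared with the value passed to `--num_snps`.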
Hi again! Thanks for your answer! I tried to run the recap function yesterday, but it has been running ever since, even though I have a small SNP dataset (100 loci); the geographic area is large, though (a large part of the Western Indian Ocean). I checked in detail, and the function gets stuck when running this line: self.Delta = self._create_incidence_matrix(). I also relaunched the SLiM program with 50 generations (without the recap), but the mapNN preprocessing function also fails: Traceback (most recent call last): Thanks in advance for your help!
On the one hand, it's normal for the slim step and the recapitation to be slow, so the slow runtime by itself isn't necessarily a bug. But I'm confused about the detailed steps in your workflow. In particular, you mentioned the SNP count in the file you are recapitating; that is different from the order I normally run things (slim -> recapitate -> add mutations). You need to recapitate to complete the simulation before running the preprocess step with mapnn.
Hi again! Here is the order I followed: vcf2genos (to get the .genos file) -> create_maps to get a training map -> cookie cutter on the training map -> slim -> preprocessing (this step could be preceded by the recap, but that is too slow for me). I am trying to recapitate the SLiM file...
Oops, sorry, I just noticed your comment about the line "self.Delta = self._create_incidence_matrix()". Will you please share (1) your recap() command, (2) the full error, and (3) your tskit and msprime versions? Or is there no error and you're saying it continues running? Can you tell what package that line is in (e.g. maybe msprime)? It might be the case that your simulation just takes a super long time to run.
Hi! I wasn't very clear in my previous message, but the problem with create_incidence_matrix concerns the function SpatialGraph, which I was trying to use without the recap step. It outputs the error message I pasted in my previous message, which originates from create_incidence_matrix. As for the recap function, there is no error message; it just gets stuck running (I stopped it after 4 days). Here is the code to launch the recap function (I have tskit v0.5.5 and msprime v1.2.0): Meanwhile, I could get the result I wanted with EEMS in about one day using one CPU (achieving satisfactory convergence), so I think computation time is not the problem...
seems very strange that a coalescent simulation would take > 4 days. how large is your chromosome? what is the associated recombination rate? |
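To see why these two numbers matter: the cost of a coalescent simulation with recombination generally grows quickly with the population-scaled recombination rate, rho = 4 * Ne * r * L. The arithmetic below is only a back-of-the-envelope sketch. It assumes that G = 1e8 in the SLiM script is the genome length in base pairs; the ancestral Ne of 10,000 and the per-base recombination rate of 1e-8 are made-up placeholder values, not figures from this thread.

```python
def scaled_recombination(ne, recomb_rate_per_bp, genome_length_bp):
    """Population-scaled recombination rate rho = 4 * Ne * r * L."""
    return 4.0 * ne * recomb_rate_per_bp * genome_length_bp


# Placeholder inputs: only the genome length is taken from the SLiM script (G = 1e8).
rho = scaled_recombination(ne=10_000, recomb_rate_per_bp=1e-8, genome_length_bp=1e8)
print(f"rho is roughly {rho:.0f}")

# Ten times the ancestral Ne gives ten times rho.
rho_large = scaled_recombination(ne=100_000, recomb_rate_per_bp=1e-8, genome_length_bp=1e8)
print(f"with 10x Ne, rho is roughly {rho_large:.0f}")
```

If the ancestral Ne passed to recapitation is very large, rho grows in proportion, which is one plausible reason a recapitation step could run for days.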
Hi! Thank you for this new program! Could you explain why I get the error "sorry, Dude. Didn't generate enough snps." during preprocessing?
The error occurs in the function sample_ts, called with the following code: geno_mat, locs = training_generator.sample_ts(trees[i], args.seed). The function runs for about 1 h and then generates this error message. Here is the command line I am using:
python mapnn.py --preprocess --gpu_index any --seed 123 --num_snps 100 --n 49 --map_width 649 --slim_width 649 --sample_grid 1 --tree_list ../ADLIFISH_eruiz/test1_mapNN_chr_t_top_100_snp/preprocessing_mapNN/sim_mapNN/chr_t_top_100_sim_123_tree_list.txt --target_list ../ADLIFISH_eruiz/test1_mapNN_chr_t_top_100_snp/preprocessing_mapNN/cookie_mapNN/chr_t_top_100_cookie_list.txt --habitat_map ../ADLIFISH_eruiz/test1_mapNN_chr_t_top_100_snp/data_mapNN/depth_map_wio_rgb_cropped.png --empirical ../ADLIFISH_eruiz/test1_mapNN_chr_t_top_100_snp/data_mapNN/Chr_t_unlinked_snp_subset_top_100_loci_dapc_fixed_mapnn --out ../ADLIFISH_eruiz/test1_mapNN_chr_t_top_100_snp/preprocessing_mapNN/preprocess_mapNN
The trees were generated with 2 SLiM iterations (as a test, since I have a big region):
defineConstant("W", 649);
defineConstant("L", 4);
defineConstant("G", 1e8);
defineConstant("FECUN", 1/L);
defineConstant("maxgens", 2);
Thank you in advance, ER