Merging tree sequences from SLiM for neutral mutations and recapitation in msprime #187
-
Hello! I would like to simulate populations with bifurcating population demography (e.g. a phylogeny) using SLiM for non-neutral dynamics and then msprime to overlay neutral mutations. For the sake of argument, let's just consider a single population that splits into two. Although we could model this demography in SLiM itself, the complexity of the model makes it a slow approach that would get even more unwieldy and complicated were the tree to have more tips. It seems a more efficient way to achieve this is to create an initial ancestry in msprime and then use that tree sequence as the starting point for two separate simulations in SLiM. Ideally we would then want to merge the two tree sequences from the separate SLiM runs before using msprime to overlay neutral mutations and recapitate. However, it does seem tricky to somehow stitch together those tree sequences. Is this a feature of pyslim or msprime, or do you have any suggestions for how it might be done? Thanks! |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 6 replies
-
Hi Elissa, It is possible to "merge" tree sequences! This merge operation you are looking for is called union in tskit (also see this related vignette in pyslim). For two tree sequences in which part of the past history is shared, union works by copying the non-shared parts of one of the ts onto the other. As you realized, the trickiest part of this operation is defining the parts that are equivalent in the two tree sequences. For that, you will have to create an array that serves as a map of node ids between the two tree sequences. Now, let's go through what you could do to simulate one ancestral population that splits into two:
Even though union works for pairs of tree sequences, you could do something similar in n-1 iterations. Hope this helps! Also don't hesitate to ask for clarifications if needed! |
Beta Was this translation helpful? Give feedback.
-
For simplicity, can we get away with skipping the initial ancestral burn-in step in SLiM? This is an alternate workflow I came up with once I found out about
I think if you can avoid the ancestral burn in it would be a bit more straightforward when dealing with a larger phylogeny and should also save some space because you don't need def merge_trees(
files:list = None, #only two files should be used, unless you want a polytomy
simlength:int = None, #length in generations
popsize:int = None, #size of each ending population
recomb: float = None, #recombination rate
mutrate:float = None, #mutation rate
):
"""
Reads in two SLiM .trees files, merges them, recapitates,
overlays neutral mutations.
"""
ids = []
species = []
#read in all the tree sequences
for i in range(0,len(files)):
ts = pyslim.load(files[i])
species.append(ts)
#merge the sequences
merged_ts = pyslim.SlimTreeSequence(
species[0].union(
species[1],
node_mapping=[tskit.NULL for i in range(species[1].num_nodes)],
add_populations=True,
)
)
#add msprime demographic event
#(because SLiM adds populations with 0 individuals there are more than two
#populations here, which is why we have the for loop)
demographic_events = []
for i in range(1, merged_ts.num_populations):
demographic_events.append(msprime.MassMigration(
time = simlength, source = i, destination = 0,
proportion = 1.0))
pop_configs = [msprime.PopulationConfiguration(initial_size=popsize)
for _ in range(merged_ts.num_populations)]
matrix = np.zeros((merged_ts.num_populations, merged_ts.num_populations))
#recapitate with demographic event
rts = merged_ts.recapitate(
population_configurations=pop_configs,
demographic_events = demographic_events,
migration_matrix= matrix,
recombination_rate=recomb,
random_seed=4,
)
#overlay neutral mutations
mts = pyslim.SlimTreeSequence(msprime.mutate(rts, rate=mutrate, keep=True)) |
Beta Was this translation helpful? Give feedback.
Hi Elissa,
It is possible to "merge" tree sequences! This merge operation you are looking for is called union in tskit (also see this related vignette in pyslim). For two tree sequences in which part of the past history is shared, union works by copying the non-shared parts of one of the ts onto the other. As you realized, the trickiest part of this operation is defining the parts that are equivalent in the two tree sequences. For that, you will have to create an array that serves as a map of node ids between the two tree sequences.
Now, let's go through what you could do to simulate one ancestral population that splits into two: