Merging tree sequences from SLiM for neutral mutations and recapitation in msprime #187

elissasoroj · 2021-07-22T20:33:03Z

Hello!

I would like to simulate populations with bifurcating population demography (e.g. a phylogeny) using SLiM for non-neutral dynamics and then msprime to overlay neutral mutations. For the sake of argument, let's just consider a single population that splits into two. Although we could model this demography in SLiM itself, the complexity of the model makes it a slow approach that would get even more unwieldy and complicated were the tree to have more tips. It seems a more efficient way to achieve this is to create an initial ancestry in msprime and then use that tree sequence as the starting point for two separate simulations in SLiM.

Ideally we would then want to merge the two tree sequences from the separate SLiM runs before using msprime to overlay neutral mutations and recapitate. However, it does seem tricky to somehow stitch together those tree sequences. Is this a feature of pyslim or msprime, or do you have any suggestions for how it might be done?

Thanks!
~Elissa

mufernando · 2021-07-28T22:21:33Z

Hi Elissa,

It is possible to "merge" tree sequences! This merge operation you are looking for is called union in tskit (also see this related vignette in pyslim). For two tree sequences in which part of the past history is shared, union works by copying the non-shared parts of one of the ts onto the other. As you realized, the trickiest part of this operation is defining the parts that are equivalent in the two tree sequences. For that, you will have to create an array that serves as a map of node ids between the two tree sequences.

Now, let's go through what you could do to simulate one ancestral population that splits into two:

1. Simulate the ancestral population and save the resulting tree sequence.
   - If this is done with msprime, note that your ancestral population will already be fully coalesced, so there would be no reason to recapitate afterwards. But you could get some odd results from simulating non-neutral dynamics in the daughter populations only. You could instead run a burn-in of the ancestral population in SLiM for 2 to 10N generations, and recapitate afterwards.
   - With SLiM, just note that you'd have to call sim.treeSeqRememberIndividuals right before you save the resulting tree sequence. Otherwise, if you start other simulations from this tree sequence, your final-generation individuals might be dropped during simplification, and it would be "impossible" to determine the shared parts of the daughter tree sequences.
2. Start the simulations of your daughter populations independently in SLiM, each one loading the ancestral population tree sequence.
3. Now you'd have two daughter population tree sequences that share part of their history. To build the node mapping necessary for the union operation, you could use the function below:

import numpy as np
import tskit

def match_nodes(other, ts, split_time):
    """
    Given SLiM tree sequences `other` and `ts`, builds a numpy array with length
    `other.num_nodes` in which the indexes represent the node id in `other` and the
    entries represent the equivalent node id in `ts`. If a node in `other` has no
    equivalent in `ts`, then the entry takes the value `tskit.NULL`. The matching
    is done by comparing the IDs assigned by SLiM which are kept in the NodeTable
    metadata. Further, this matching of SLiM IDs is done for times (going
    backward-in-time) greater than the specified `split_time`.
    """
    node_mapping = np.full(other.num_nodes, tskit.NULL)
    # SLiM IDs of every node, read from the node metadata
    sids0 = np.array([n.metadata["slim_id"] for n in ts.nodes()])
    sids1 = np.array([n.metadata["slim_id"] for n in other.nodes()])
    # nodes of `other` that are at least as old as the split
    alive_before_split1 = other.tables.nodes.time >= split_time
    # look up each SLiM ID of `other` in the sorted SLiM IDs of `ts`
    sorted_ids0 = np.argsort(sids0)
    matches = np.searchsorted(
        sids0,
        sids1,
        side='left',
        sorter=sorted_ids0)
    # keep only IDs that really occur in `ts` and predate the split
    is_1in0 = np.isin(sids1, sids0)
    both = np.logical_and(alive_before_split1, is_1in0)
    node_mapping[both] = sorted_ids0[matches[both]]
    return node_mapping
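
To make the lookup inside match_nodes easier to follow, here is a minimal sketch of the same argsort/searchsorted idiom on made-up arrays instead of real tree sequences. The SLiM IDs, node times, and split time below are invented for illustration, and NULL is hard-coded to -1, which is the value of tskit.NULL.

```python
import numpy as np

NULL = -1  # same value as tskit.NULL

# Made-up SLiM IDs: position = node id, value = SLiM ID
sids0 = np.array([10, 7, 3, 12])   # nodes of `ts`
sids1 = np.array([3, 99, 12, 7])   # nodes of `other`
times1 = np.array([5, 5, 1, 6])    # node times in `other`
split_time = 2

node_mapping = np.full(len(sids1), NULL)
alive_before_split1 = times1 >= split_time

# Position of each `other` SLiM ID within the sorted `ts` SLiM IDs
sorted_ids0 = np.argsort(sids0)
matches = np.searchsorted(sids0, sids1, side="left", sorter=sorted_ids0)

# searchsorted also returns a position for IDs that are absent from `ts`,
# so those have to be masked out explicitly
is_1in0 = np.isin(sids1, sids0)
both = np.logical_and(alive_before_split1, is_1in0)
node_mapping[both] = sorted_ids0[matches[both]]

print(node_mapping)
```

Node 1 of `other` stays unmapped because its SLiM ID does not occur in `ts`, and node 2 stays unmapped because it is younger than the split, even though its SLiM ID does occur in `ts`.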

Finally you would be able to perform the union as specified in the tskit docs.

Even though union works for pairs of tree sequences, you could do something similar in n-1 iterations.

Hope this helps! Also don't hesitate to ask for clarifications if needed!

3 replies

petrelharp Jul 29, 2021
Maintainer

It'd be great to write a pyslim method that merges tree sequences in the appropriate way with a phylogeny - if you're interested in that, or at least helping figure out what the interface should be, then let's open an issue and discuss it?

mufernando Aug 3, 2021

See PR #190 for a more detailed solution to this problem!

elissasoroj Aug 4, 2021
Author

Thank you @mufernando! Very much appreciated.

@petrelharp - yes, I am definitely interested! This would be a great asset to some of the analyses I'm hoping to do with the SLiM+msprime workflow and I think a good addition to pyslim generally.

elissasoroj · 2021-08-04T14:35:32Z

For simplicity, can we get away with skipping the initial ancestral burn-in step in SLiM? This is an alternate workflow I came up with once I found out about union:

1. Run two SLiM simulations, which will be your daughter branches, and save the output to .trees files.
2. Read the tree sequences into Python using pyslim.
3. Use union to merge the tree sequences. They don't share any history and we want the populations to stay separate, so we set add_populations=True.
4. Use MassMigration in msprime to create a demographic event that combines all the populations into one at the root, i.e. at SLiM generation 0. (MassMigration is from the old version of msprime, so this step would need to be adjusted for the new msprime.)
5. Recapitate the tree sequence with the MassMigration demographic event. This should dump all the individuals from generation 0 of each SLiM simulation into a population together and recapitate from there.
6. Overlay neutral mutations.

I think if you can avoid the ancestral burn-in, it would be a bit more straightforward when dealing with a larger phylogeny, and it should also save some space because you don't need sim.treeSeqRememberIndividuals.

import msprime
import numpy as np
import pyslim
import tskit

# Written against the 2021-era APIs used in this thread (pyslim.load,
# pyslim.SlimTreeSequence, msprime.MassMigration, msprime.mutate).
def merge_trees(
    files: list = None,  # only two files should be used, unless you want a polytomy
    simlength: int = None,  # length in generations
    popsize: int = None,  # size of each ending population
    recomb: float = None,  # recombination rate
    mutrate: float = None,  # mutation rate
    ):
    """
    Reads in two SLiM .trees files, merges them, recapitates,
    overlays neutral mutations, and returns the resulting tree sequence.
    """
    # read in all the tree sequences
    species = [pyslim.load(f) for f in files]

    # merge the sequences; no node is shared, so every entry is tskit.NULL
    merged_ts = pyslim.SlimTreeSequence(
        species[0].union(
            species[1],
            node_mapping=[tskit.NULL for _ in range(species[1].num_nodes)],
            add_populations=True,
        )
    )

    # add msprime demographic event
    # (because SLiM adds populations with 0 individuals there are more than two
    # populations here, which is why we have the for loop)
    demographic_events = []
    for i in range(1, merged_ts.num_populations):
        demographic_events.append(msprime.MassMigration(
            time=simlength, source=i, destination=0,
            proportion=1.0))

    pop_configs = [msprime.PopulationConfiguration(initial_size=popsize)
        for _ in range(merged_ts.num_populations)]

    # no ongoing migration between populations
    matrix = np.zeros((merged_ts.num_populations, merged_ts.num_populations))

    # recapitate with demographic event
    rts = merged_ts.recapitate(
        population_configurations=pop_configs,
        demographic_events=demographic_events,
        migration_matrix=matrix,
        recombination_rate=recomb,
        random_seed=4,
    )

    # overlay neutral mutations
    mts = pyslim.SlimTreeSequence(msprime.mutate(rts, rate=mutrate, keep=True))
    return mts

3 replies

petrelharp Aug 4, 2021
Maintainer

This makes sense for a two-species phylogeny, but I think with more than two species we still have to remember the individuals at the branching points?

elissasoroj Aug 4, 2021
Author

Ah yes, you're right - and the ancestral burn in is only necessary for the root so it is a much better solution to do it as @mufernando specified. I always struggle when thinking about switching from forward to backwards simulations 😅

mufernando Aug 4, 2021

One more thing to consider is that species splits are sometimes fairly recent, so you would probably want to do a burn-in with whatever SLiM model you have for 2-10N generations to get some ancestral genetic variation going.
