Skip to content

Maintenance release

Compare
Choose a tag to compare
@root-11 root-11 released this 20 May 15:04
· 72 commits to master since this release

This maintenance release delivers a new algorithm for the detection of phaselines in graphs which reduces the runtime from O(V*E) to O(V+E).

From the docs string:

Detecting phaselines is useful for determining which tasks can be performed in parallel.
Each phase in the phaselines must be completed to assure that the tasks in the next phase can be performed with complete input.

This is in contrast to Topological sort that only generates a queue of tasks, may be fine for a single processor, but has no mechanism for coordination that all inputs for a task have been completed so that multiple processors can work on them.

Here is an example of a DAG with tasks:

        u1      u4      u2      u3
        \       \       \_______\
        csg     cs3       append
        \       \           \
        op1     \           op3
        \       \           \
        op2     \           cs2
        \       \___________\
        cs1         join
        \           \
        map1        map2
        \___________\
            save

The phaselines would be:

    phaselines = {
        "u1": 0, "u4": 0, "u2": 0, "u3": 0, 
        "csg": 1, "cs3": 1, "append": 1,
        "op1": 2, "op3": 2, 
        "op2": 3, "cs2": 3,
        "cs1": 4, "join": 4,
        "map1": 5, "map2": 5,
        "save": 6,
    }  

From this example it is visible that processing the 4 'uN' (uploads) is the highest degree of concurrency. This can be determined as follows:

        d = defaultdict(int)
        for _, pl in graph.phaselines():
            d[pl] += 1
        max_processors = max(d, key=d.get)