CPU 9. The Tasking Subsystem

Phillip Allen Lane edited this page Nov 27, 2023 · 1 revision

Most of the content covered thus far has been data parallelism. Data parallelism is the primary focus of DotMP, but we do provide a (currently limited) tasking subsystem for task parallelism.

Tasks are based on the premise that we may have a bunch of different work items to do, but individually, each work item is so lightweight that spawning a new thread per work item ends up having too much overhead. Creating threads is expensive, but creating tasks is cheap. Therefore, tasking is implemented under the existing fork-join paradigm: create a parallel region, then spawn tasks from within the region.

You may be wondering why we don't just use the TPL for this. After all, it's built into C#, and it literally stands for the Task Parallel Library. Well, DotMP's tasking subsystem has a couple of nice features that are notably absent from the TPL. We'll get to these in a little bit.

Overview of Tasking Semantics

The basic control flow of tasking is as follows: when tasks are created, they are deferred for execution until a tasking point. The task graph keeps track of all created tasks and which of them are currently eligible for execution. This data structure represents a DAG (directed acyclic graph), because tasks in DotMP are permitted to include dependencies.

Each time you create a task, it returns a TaskUUID object. You, the user, are not permitted to do anything with this object except pass it to other tasks, marking those tasks as dependent on the task that created the object. Tasks can be dependent on as many other tasks as you like, which greatly increases flexibility compared to the .NET TPL.

Back to tasking points. There are currently two tasking points in DotMP, at which execution stalls until the task graph is empty (i.e., all deferred tasks have been executed). Those tasking points are:

  • Upon reaching a DotMP.Parallel.Taskwait() call.
  • Upon reaching the end of a parallel region.

The latter is pretty simple. Before a parallel region joins back into a single thread of execution, all deferred tasks are executed. The former will be explained in a little bit after we talk about how to create tasks.

Creating Tasks

Creating a task is extremely simple:

DotMP.Parallel.ParallelMaster(() =>
{
    DotMP.Parallel.Task(() =>
    {
        Console.WriteLine("Hello from a task!");
    });
});

You will notice two new constructs here, created specifically for tasking. The first is ParallelMaster, which is simply a wrapper around ParallelRegion and Master. We do this because we don't want every single thread launching tasks; usually we want to reserve task creation for the master thread (although there are exceptions). The second is the Task method, which is remarkably simple. It takes a delegate with no parameters and treats that as a task for execution.

As discussed earlier, we can spawn numerous tasks and declare dependencies between them. For example, we could do something like this:

DotMP.Parallel.ParallelMaster(() =>
{
    var t1 = DotMP.Parallel.Task(() => 
    {
        Console.WriteLine("Task 1.");
    });

    var t2 = DotMP.Parallel.Task(() =>
    {
        Console.WriteLine("Task 2.");
    }, t1);

    var t3 = DotMP.Parallel.Task(() =>
    {
        Console.WriteLine("Task 3.");
    }, t1);

    var t4 = DotMP.Parallel.Task(() =>
    {
        Console.WriteLine("Task 4.");
    }, t2, t3);
});

This creates four tasks (labeled t1, t2, t3, and t4). We mark t2 and t3 as dependent on t1, and then we mark t4 as dependent on t2 and t3. In a parallel region, the tasking system is forced to dispatch t1 first and wait for its completion before any further tasks can be dispatched. Once t1 is complete, t2 and t3 are allowed to execute simultaneously. Finally, once both t2 and t3 are completed, t4 is permitted to be dispatched.

We can see this if we run the program: there are only two possible outputs (as opposed to 24 if there were no dependencies!). These two outputs are:

Task 1.
Task 2.
Task 3.
Task 4.

and

Task 1.
Task 3.
Task 2.
Task 4.

Tada! We successfully created four nodes in a dependency DAG, and the DotMP tasking scheduler ensures that each task's dependencies are met before it is dispatched. If you want to pass dependencies out-of-order in the method call (e.g., passing them before the action), you can use the named parameter depends.
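As a sketch of what that looks like (the name of the delegate parameter, action, is an assumption here and not confirmed by this page):

DotMP.Parallel.ParallelMaster(() =>
{
    var t1 = DotMP.Parallel.Task(() =>
    {
        Console.WriteLine("Task 1.");
    });

    // Dependencies are passed first via the named parameter `depends`;
    // the delegate parameter name `action` is an assumption.
    DotMP.Parallel.Task(depends: new[] { t1 }, action: () =>
    {
        Console.WriteLine("Task 2.");
    });
});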

Tasks themselves can also spawn other tasks. For example, we can do something like this:

DotMP.Parallel.ParallelMaster(() =>
{
    DotMP.Parallel.Task(() =>
    {
        Console.WriteLine("Outer task.");

        DotMP.Parallel.Task(() =>
        {
            Console.WriteLine("Inner task.");
        });
    });
});

Here, the outer task is created and placed into the task graph. The parallel region concludes, a tasking point is hit, and the outer task is executed. It prints Outer task., then itself spawns another task. At this point, the inner task is marked eligible for execution, and is executed. We can modify the program by also printing the thread number to see that the outer task and inner task are being executed by different threads. We also know that, because the outer task prints before the inner task is created, there is a definite order to our print statements.
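As a sketch of that modification (assuming DotMP exposes the current thread number via DotMP.Parallel.GetThreadNum(), mirroring OpenMP's omp_get_thread_num()):

DotMP.Parallel.ParallelMaster(() =>
{
    DotMP.Parallel.Task(() =>
    {
        // GetThreadNum() is assumed to return the executing thread's ID.
        Console.WriteLine("Outer task on thread {0}.", DotMP.Parallel.GetThreadNum());

        DotMP.Parallel.Task(() =>
        {
            Console.WriteLine("Inner task on thread {0}.", DotMP.Parallel.GetThreadNum());
        });
    });
});

Running this a few times will typically show two different thread numbers, though the scheduler is free to run both tasks on the same thread.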

Taskwait

The Taskwait() method needs some important clarification that I've avoided until now. Taskwait() acts as a barrier if called from parallel space (i.e., not from within a task). If Taskwait() is called from within a task, it won't act as a barrier (so you won't deadlock), but you are required to pass a set of tasks to wait on, which, in most cases, should be the set of tasks spawned from within that task.

For example, the following code will work as expected:

DotMP.Parallel.ParallelMaster(() =>
{
    DotMP.Parallel.Task(() =>
    {
        Console.WriteLine("This is printed first.");

        var child1 = DotMP.Parallel.Task(() =>
        {
            Console.WriteLine("This is printed either second or third.");
        });

        var child2 = DotMP.Parallel.Task(() =>
        {
            Console.WriteLine("This is printed either second or third.");
        });

        DotMP.Parallel.Taskwait(child1, child2);

        Console.WriteLine("This is printed fourth.");
    });
});

But if the DotMP.Parallel.Taskwait(child1, child2); is replaced with DotMP.Parallel.Taskwait();, an exception is thrown!

Taskloops

The most similar thing that DotMP has to the TPL's System.Threading.Tasks.Parallel.For is DotMP.Parallel.Taskloop. Taskloops are a form of parallel-for, but unlike those covered in chapter 2 of this guide, they do not use the traditional schedulers. Taskloops work by converting a loop into a number of tasks and throwing them into the task graph. The task scheduler makes no distinction between tasks created with Task vs. Taskloop; they are treated identically. The crucial difference is that Task creates only a single task, while Taskloop creates many.

A simple SAXPY example using taskloops looks like this:

DotMP.Parallel.ParallelMaster(() =>
{
    DotMP.Parallel.Taskloop(0, M, i =>
    {
        y[i] += a * x[i];
    });
});

Taskloops support a number of optional parameters to control their execution. Two of them are used to control how many tasks are spawned (analogous to chunk_size for traditional worksharing parallel-for loops). These are:

  • num_tasks - allows you to explicitly declare the number of tasks that should be created
  • grainsize - specifies how many iterations should be created per task

You can specify either num_tasks or grainsize. If both are specified, num_tasks takes precedence over grainsize (i.e., grainsize is ignored).
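As a sketch of both parameters applied to the SAXPY loop above (num_tasks and grainsize are the names given above; the delegate parameter name, action, is an assumption):

DotMP.Parallel.ParallelMaster(() =>
{
    // Split the loop into exactly 8 tasks.
    DotMP.Parallel.Taskloop(0, M, num_tasks: 8, action: i =>
    {
        y[i] += a * x[i];
    });

    DotMP.Parallel.Taskwait();

    // Or: one task per 1024 iterations (roughly M / 1024 tasks).
    DotMP.Parallel.Taskloop(0, M, grainsize: 1024, action: i =>
    {
        y[i] += a * x[i];
    });
});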

Taskloops also support dependencies, much like regular tasks. You can use the named parameter depends, or simply pass them after the action, as in the earlier example with Task(). A taskloop can itself be used as a dependency; however, instead of returning a single TaskUUID object, Taskloop returns a TaskUUID[] array: one element per task created.

So, if you want to execute two parallel-for loops back to back, with one loop being completed before the other begins, there are two primary ways to do this. You can do:

DotMP.Parallel.ParallelRegion(() =>
{
    DotMP.Parallel.MasterTaskloop(0, M, i =>
    {
        do_first_work(i);
    });

    DotMP.Parallel.Taskwait();

    DotMP.Parallel.MasterTaskloop(0, M, i =>
    {
        do_second_work(i);
    });

    DotMP.Parallel.Taskwait();
});

This approach dispatches the first set of tasks, calls taskwait, dispatches the second set of tasks, then calls taskwait again. You may notice a new construct: MasterTaskloop. It is simply a wrapper around Master and Taskloop. There is technically also ParallelMasterTaskloop, a wrapper around ParallelRegion, Master, and Taskloop.

Or, using dependencies:

DotMP.Parallel.ParallelMaster(() =>
{
    var tl = DotMP.Parallel.Taskloop(0, M, i =>
    {
        do_first_work(i);
    });

    DotMP.Parallel.Taskloop(0, M, i =>
    {
        do_second_work(i);
    }, tl);
});

This creates all of the tasks at one time, marking the tasks created by the second taskloop as dependent on all of the tasks of the first taskloop.

Finally, there's the only_if parameter. If the bounds of a loop are flexible and can change between runs, you may encounter cases where the overhead of creating and scheduling tasks in parallel outweighs any possible performance gain, and the loop would be better run in serial. The only_if parameter handles exactly this. If only_if is set to true, tasks are created and thrown into the task graph as normal. If only_if is false, the loop bypasses the tasking subsystem entirely and is executed in serial: task dependencies and traditional tasking behaviors are ignored, and the loop runs eagerly before the Taskloop call returns.
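As a sketch (the threshold is arbitrary, and the delegate parameter name, action, is an assumption):

DotMP.Parallel.ParallelMaster(() =>
{
    // Only pay the tasking overhead when the loop is large enough to
    // amortize it; otherwise the loop runs serially, inline.
    DotMP.Parallel.Taskloop(0, M, only_if: M > 10_000, action: i =>
    {
        y[i] += a * x[i];
    });
});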