Executor 2.0: Stream assignment #5602

mzient · 2024-08-08T17:58:56Z

Category:

New feature (non-breaking change which adds functionality)

Description:

Executor 2.0 has no "stages"; streams are assigned according to a stream assignment policy.

Stream assignment policies

Executor2 assigns streams to operator nodes based on the StreamPolicy configuration parameter. There are three stream policies:

Single - a single stream is shared by all operators that need one.
PerBackend - each backend (GPU, Mixed) gets its own distinct stream.
PerOperator - operators that can execute independently are assigned distinct streams, while operators with a sequential dependency share a stream. NOT IMPLEMENTED IN THIS PR
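
As an illustration only, the two policies that remain in this PR can be sketched as a mapping from an operator's backend to a stream index. The names `StreamIndex`, `Backend` and `kNoStream` are hypothetical and not taken from the PR, and the sketch assumes that only Mixed and GPU operators need a stream.

```cpp
// Hypothetical stand-ins for the PR's types; PerOperator is omitted on purpose.
enum class StreamPolicy { Single, PerBackend };
enum class Backend { CPU, Mixed, GPU };

// Marker for "this operator does not need a stream".
constexpr int kNoStream = -1;

// Returns the stream index for an operator with the given backend.
// Assumption: CPU operators never need a stream.
constexpr int StreamIndex(StreamPolicy policy, Backend backend) {
  if (backend == Backend::CPU)
    return kNoStream;
  if (policy == StreamPolicy::Single)
    return 0;  // every operator that needs a stream shares stream 0
  // PerBackend: one stream for Mixed operators, another for GPU operators
  return backend == Backend::Mixed ? 0 : 1;
}
```

Under these assumptions, Single yields at most one stream for the whole graph and PerBackend at most two, regardless of the graph's shape.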

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

Checklist

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: N/A

JIRA TASK: DALI-4030

Signed-off-by: Michał Zientkiewicz <[email protected]>

dali-automaton · 2024-09-02T10:08:54Z

CI MESSAGE: [18059205]: BUILD STARTED

dali-automaton · 2024-09-02T10:54:21Z

CI MESSAGE: [18060037]: BUILD STARTED

stiepan · 2024-09-02T11:20:13Z

dali/pipeline/executor/executor2/stream_assignment.h

+
+      queue_.pop();
+      auto *node = sorted_nodes_[idx];
+      // This will be true for nodes which has no outputs or which doesn't contribute to any


Suggested change

// This will be true for nodes which has no outputs or which doesn't contribute to any

// This will be true for a node which has no outputs or which doesn't contribute to any

Went the other way.

stiepan · 2024-09-02T12:19:40Z

dali/pipeline/executor/executor2/stream_assignment.h

+  void Assign(ExecGraph &graph) {
+    // pre-fill the id pool with sequential numbers
+    for (int i = 0, n = graph.Nodes().size(); i < n; i++) {
+      free_stream_ids_.insert(i);


As for the minimality of the assignment, what happens for bipartite graphs? Take a 2x2 example: the left ops A and B both contribute to the right ops C and D. Assuming that the output edges are visited in topological order (i.e. C comes before D in the lists of both A and B), can we end up with the assignment A0, B1, C0, D2?

If so, for larger bipartite graphs, it seems the need for free_stream_ids can exceed the number of nodes.

A -- C
  \ /
  / \
B -- D

Fixed. The algorithm has been reworked and now the assignment is made as soon as a node is pushed to the queue. It's never re-pushed with a worse index.
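
To make the 2x2 case from this thread concrete, here is a minimal sketch of one possible "assign when first visited" greedy. It is not the algorithm from the PR: the function `AssignOnVisit`, the `producers` adjacency representation and the rule "a node's stream can be inherited by at most one consumer" are all assumptions made for illustration, and the sketch makes no claim of minimality for general graphs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical graph representation: producers[i] lists the nodes feeding node i.
// Nodes are assumed to be numbered in topological order.
std::vector<int> AssignOnVisit(const std::vector<std::vector<int>> &producers) {
  const int n = static_cast<int>(producers.size());
  std::vector<int> stream(n, -1);
  // handed_over[p] becomes true once some consumer has inherited p's stream.
  std::vector<bool> handed_over(n, false);
  int next_fresh = 0;
  for (int i = 0; i < n; i++) {
    for (int p : producers[i]) {
      assert(p < i && "nodes must be topologically sorted");
      if (!handed_over[p]) {
        // Continue on the producer's stream; the index is fixed here for good.
        stream[i] = stream[p];
        handed_over[p] = true;
        break;
      }
    }
    if (stream[i] < 0)
      stream[i] = next_fresh++;  // no inheritable stream, so open a new one
  }
  return stream;
}
```

With nodes numbered A=0, B=1, C=2, D=3 and `producers = {{}, {}, {0, 1}, {0, 1}}`, this sketch yields A0, B1, C0, D1, so the bipartite example needs only two streams. Because each stream is inherited at most once per node, the nodes sharing a stream always form a dependency chain.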

dali-automaton · 2024-09-02T13:56:00Z

CI MESSAGE: [18063682]: BUILD STARTED

dali-automaton · 2024-09-02T15:22:29Z

CI MESSAGE: [18063682]: BUILD PASSED

stiepan

There are cases when the stream_idx freed by a node is not the least available one, which leads to a non-minimal assignment and to the stream changing along a straight path.

Here are two examples:

TEST(Exec2Test, StreamAssignment_PerOperator_42) {
  ExecGraph eg;
  graph::OpGraph::Builder b;
  b.Add("a",
        SpecGPU()
        .AddOutput("a->d", "gpu")
        .AddOutput("a->e", "gpu")
        .AddOutput("a->f", "gpu")
        .AddOutput("a->c", "gpu"));
  b.Add("b",
        SpecGPU()
        .AddOutput("b->d", "gpu")
        .AddOutput("b->e", "gpu")
        .AddOutput("b->f", "gpu")
        .AddOutput("b->c", "gpu"));
  b.Add("c",
        SpecGPU()
        .AddInput("a->c", "gpu")
        .AddInput("b->c", "gpu")
        .AddOutput("c->g", "gpu"));
  b.Add("d",
        SpecGPU()
        .AddInput("a->d", "gpu")
        .AddInput("b->d", "gpu")
        .AddOutput("d->od", "gpu"));
  b.Add("e",
        SpecGPU()
        .AddInput("a->e", "gpu")
        .AddInput("b->e", "gpu")
        .AddOutput("e->oe", "gpu"));
  b.Add("f",
        SpecGPU()
        .AddInput("a->f", "gpu")
        .AddInput("b->f", "gpu")
        .AddOutput("f->of", "gpu"));
  b.Add("g",
        SpecGPU()
        .AddInput("c->g", "gpu")
        .AddOutput("g->og", "gpu"));
  b.AddOutput("d->od_gpu");
  b.AddOutput("e->oe_gpu");
  b.AddOutput("f->of_gpu");
  b.AddOutput("g->og_gpu");
  auto g = std::move(b).GetGraph(true);
  eg.Lower(g);

  StreamAssignment<StreamPolicy::PerOperator> assignment(eg);
  auto map = MakeNodeMap(eg);
  EXPECT_EQ(assignment[map["a"]], 0);
  EXPECT_EQ(assignment[map["b"]], 1);
  EXPECT_EQ(assignment[map["c"]], 4);
  EXPECT_EQ(assignment[map["d"]], 0);
  EXPECT_EQ(assignment[map["e"]], 1);
  EXPECT_EQ(assignment[map["f"]], 3);
  // it's the only child of c with stream idx 4, but gets 2
  EXPECT_EQ(assignment[map["g"]], 4);
}

TEST(Exec2Test, StreamAssignment_PerOperator_43) {
  ExecGraph eg;
  graph::OpGraph::Builder b;
  b.Add("a",
        SpecGPU()
        .AddOutput("a->c", "gpu")
        .AddOutput("a->d", "gpu")
        .AddOutput("a->e", "gpu")
        .AddOutput("a->f", "gpu"));
  b.Add("b",
        SpecGPU()
        .AddOutput("b->c", "gpu")
        .AddOutput("b->d", "gpu")
        .AddOutput("b->e", "gpu")
        .AddOutput("b->f", "gpu"));
  b.Add("c",
        SpecGPU()
        .AddInput("a->c", "gpu")
        .AddInput("b->c", "gpu")
        .AddOutput("c->oc", "gpu"));
  b.Add("d",
        SpecGPU()
        .AddInput("a->d", "gpu")
        .AddInput("b->d", "gpu")
        .AddOutput("d->od", "gpu"));
  b.Add("e",
        SpecGPU()
        .AddInput("a->e", "gpu")
        .AddInput("b->e", "gpu")
        .AddOutput("e->g", "gpu"));
  b.Add("f",
        SpecGPU()
        .AddInput("a->f", "gpu")
        .AddInput("b->f", "gpu")
        .AddOutput("f->h", "gpu"));
  b.Add("g",
        SpecGPU()
        .AddInput("e->g", "gpu")
        .AddOutput("g->og", "gpu"));
  b.Add("h",
        SpecGPU()
        .AddInput("f->h", "gpu")
        .AddOutput("h->oh", "gpu"));
  b.AddOutput("c->oc_gpu");
  b.AddOutput("d->od_gpu");
  b.AddOutput("g->og_gpu");
  b.AddOutput("h->oh_gpu");
  auto g = std::move(b).GetGraph(true);
  eg.Lower(g);

  StreamAssignment<StreamPolicy::PerOperator> assignment(eg);
  auto map = MakeNodeMap(eg);
  EXPECT_EQ(assignment[map["a"]], 0);
  EXPECT_EQ(assignment[map["b"]], 1);
  EXPECT_EQ(assignment[map["c"]], 0);
  EXPECT_EQ(assignment[map["d"]], 1);
  EXPECT_EQ(assignment[map["e"]], 3);
  EXPECT_EQ(assignment[map["f"]], 4);
  // e gets 2 instead of 3
  EXPECT_EQ(assignment[map["g"]], 3); // it's the only child of e
  EXPECT_EQ(assignment[map["h"]], 4);
}
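
The effect described in the comment of the first example (g is the only child of c, which holds stream 4, yet g gets 2) can be reproduced in isolation. The sketch below assumes that free ids are kept in an ordered set and that the smallest one is always handed out; `FreeIdPool`, `Get` and `Put` are hypothetical names, and only the pre-filling with sequential numbers is taken from the reviewed code.

```cpp
#include <cassert>
#include <set>

// Hypothetical pool of free stream ids; Get() always returns the smallest one.
class FreeIdPool {
 public:
  // Pre-fill the pool with sequential numbers 0..n-1.
  explicit FreeIdPool(int n) {
    for (int i = 0; i < n; i++)
      ids_.insert(i);
  }

  // Removes and returns the smallest free id.
  int Get() {
    assert(!ids_.empty());
    int id = *ids_.begin();
    ids_.erase(ids_.begin());
    return id;
  }

  // Returns an id to the pool.
  void Put(int id) { ids_.insert(id); }

 private:
  std::set<int> ids_;
};
```

If ids 0 to 4 are all taken, and id 2 is returned before a parent returns its id 4, the next `Get()` yields 2 rather than 4. A child that draws from the pool therefore does not continue on its parent's stream, which matches the stream change on a straight path reported above.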

dali-automaton · 2024-09-04T07:13:33Z

CI MESSAGE: [18115353]: BUILD STARTED

mzient · 2024-09-04T08:23:17Z

There are still some problems with per-operator assignment. I have some promising ideas, but I've removed per-operator assignment from this PR and will do it as a follow-up.

dali-automaton · 2024-09-04T13:39:31Z

CI MESSAGE: [18115353]: BUILD PASSED

mzient force-pushed the stream_assignment branch from 4b35a70 to 290128c August 8, 2024 18:07

dali-automaton assigned szalpal, awolant and stiepan Aug 9, 2024

mzient force-pushed the stream_assignment branch from 290128c to a28d72d August 9, 2024 16:41

szalpal removed their assignment Aug 9, 2024

mzient added 2 commits September 2, 2024 12:04

Add ExecNode stream assignment algorithms and tests.

538eb3b

Signed-off-by: Michał Zientkiewicz <[email protected]>

Add workarounds for the temporary absence of exec2.h.

6aebb84

Signed-off-by: Michał Zientkiewicz <[email protected]>

mzient force-pushed the stream_assignment branch from a28d72d to 6aebb84 September 2, 2024 10:04

Return stream ids to free pool when skipping.

01e9a1c

Signed-off-by: Michal Zientkiewicz <[email protected]>

Add tests for simple assignment policies. Improve comments.

ba0fb15

Signed-off-by: Michal Zientkiewicz <[email protected]>

stiepan reviewed Sep 2, 2024

mzient added 2 commits September 2, 2024 15:11

Minimize assignment with many-to-many graphs.

c199977

Signed-off-by: Michal Zientkiewicz <[email protected]>

Improve robustness.

771d840

Signed-off-by: Michal Zientkiewicz <[email protected]>

stiepan reviewed Sep 3, 2024

Remove PerOperator policy.

bd304b7

Signed-off-by: Michal Zientkiewicz <[email protected]>

awolant approved these changes Sep 4, 2024

stiepan approved these changes Sep 4, 2024

NVIDIA deleted a comment from dali-automaton Sep 4, 2024

mzient merged commit 22304f4 into NVIDIA:main Sep 4, 2024
6 checks passed
