Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Executor 2.0: Stream assignment #5602

Merged
merged 7 commits into from
Sep 4, 2024
Merged

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Aug 8, 2024

Category:

New feature (non-breaking change which adds functionality)

Description:

Executor 2.0 doesn't have "stages" and streams are assigned according to stream assignment policy.

Stream assignment policies

Executor2 assigns streams to operator nodes based on the StreamPolicy configuration parameter. There are three stream policies:

  1. Single - there's just one stream used by all operators which need one.
  2. PerBackend - there's a distinct stream per backend (GPU, Mixed)
  3. PerOperator - operators which can execute independently are assigned distinct streams - but operators with sequential dependency use the same stream - NOT IMPLEMENTED IN THIS PR

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-4030

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [18059205]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [18060037]: BUILD STARTED


queue_.pop();
auto *node = sorted_nodes_[idx];
// This will be true for nodes which has no outputs or which doesn't contribute to any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This will be true for nodes which has no outputs or which doesn't contribute to any
// This will be true for a node which has no outputs or which doesn't contribute to any

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went the other way.

void Assign(ExecGraph &graph) {
// pre-fill the id pool with sequential numbers
for (int i = 0, n = graph.Nodes().size(); i < n; i++) {
free_stream_ids_.insert(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the minimality of the assignment, what happens for the bipartite graphs? Take 2x2 example. The left A, B ops contribute to both right C, D ops. Assuming that the output edges are visited in the topological order (i.e. C comes before D in A and B lists), can we end up with A0, B1, C0, D2 assignment?

If so, for larger bipartite graphs, it seems the need for free_stream_ids can exceed the number of nodes.

A -- C
 \  /
 /  \
B -- D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The algorithm has been reworked and now the assignment is made as soon as a node is pushed to the queue. It's never re-pushed with a worse index.

Signed-off-by: Michal Zientkiewicz <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [18063682]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [18063682]: BUILD PASSED

Copy link
Member

@stiepan stiepan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are cases when the stream_idx freed by the node is not the least avialable, which leads to non-minial assignment and changing the stream on a stright path.

Here are two examples:

TEST(Exec2Test, StreamAssignment_PerOperator_42) {
  ExecGraph eg;
  graph::OpGraph::Builder b;
  b.Add("a",
        SpecGPU()
        .AddOutput("a->d", "gpu")
        .AddOutput("a->e", "gpu")
        .AddOutput("a->f", "gpu")
        .AddOutput("a->c", "gpu"));
  b.Add("b",
        SpecGPU()
        .AddOutput("b->d", "gpu")
        .AddOutput("b->e", "gpu")
        .AddOutput("b->f", "gpu")
        .AddOutput("b->c", "gpu"));
  b.Add("c",
        SpecGPU()
        .AddInput("a->c", "gpu")
        .AddInput("b->c", "gpu")
        .AddOutput("c->g", "gpu"));
  b.Add("d",
        SpecGPU()
        .AddInput("a->d", "gpu")
        .AddInput("b->d", "gpu")
        .AddOutput("d->od", "gpu"));
  b.Add("e",
        SpecGPU()
        .AddInput("a->e", "gpu")
        .AddInput("b->e", "gpu")
        .AddOutput("e->oe", "gpu"));
  b.Add("f",
        SpecGPU()
        .AddInput("a->f", "gpu")
        .AddInput("b->f", "gpu")
        .AddOutput("f->of", "gpu"));
  b.Add("g",
        SpecGPU()
        .AddInput("c->g", "gpu")
        .AddOutput("g->og", "gpu"));
  b.AddOutput("d->od_gpu");
  b.AddOutput("e->oe_gpu");
  b.AddOutput("f->of_gpu");
  b.AddOutput("g->og_gpu");
  auto g = std::move(b).GetGraph(true);
  eg.Lower(g);

  StreamAssignment<StreamPolicy::PerOperator> assignment(eg);
  auto map = MakeNodeMap(eg);
  EXPECT_EQ(assignment[map["a"]], 0);
  EXPECT_EQ(assignment[map["b"]], 1);
  EXPECT_EQ(assignment[map["c"]], 4);
  EXPECT_EQ(assignment[map["d"]], 0);
  EXPECT_EQ(assignment[map["e"]], 1);
  EXPECT_EQ(assignment[map["f"]], 3);
  // it's the only child of c with stream idx 4, but gets 2
  EXPECT_EQ(assignment[map["g"]], 4);
}

TEST(Exec2Test, StreamAssignment_PerOperator_43) {
  ExecGraph eg;
  graph::OpGraph::Builder b;
  b.Add("a",
        SpecGPU()
        .AddOutput("a->c", "gpu")
        .AddOutput("a->d", "gpu")
        .AddOutput("a->e", "gpu")
        .AddOutput("a->f", "gpu"));
  b.Add("b",
        SpecGPU()
        .AddOutput("b->c", "gpu")
        .AddOutput("b->d", "gpu")
        .AddOutput("b->e", "gpu")
        .AddOutput("b->f", "gpu"));
  b.Add("c",
        SpecGPU()
        .AddInput("a->c", "gpu")
        .AddInput("b->c", "gpu")
        .AddOutput("c->oc", "gpu"));
  b.Add("d",
        SpecGPU()
        .AddInput("a->d", "gpu")
        .AddInput("b->d", "gpu")
        .AddOutput("d->od", "gpu"));
  b.Add("e",
        SpecGPU()
        .AddInput("a->e", "gpu")
        .AddInput("b->e", "gpu")
        .AddOutput("e->g", "gpu"));
  b.Add("f",
        SpecGPU()
        .AddInput("a->f", "gpu")
        .AddInput("b->f", "gpu")
        .AddOutput("f->h", "gpu"));
  b.Add("g",
        SpecGPU()
        .AddInput("e->g", "gpu")
        .AddOutput("g->og", "gpu"));
  b.Add("h",
        SpecGPU()
        .AddInput("f->h", "gpu")
        .AddOutput("h->oh", "gpu"));
  b.AddOutput("c->oc_gpu");
  b.AddOutput("d->od_gpu");
  b.AddOutput("g->og_gpu");
  b.AddOutput("h->oh_gpu");
  auto g = std::move(b).GetGraph(true);
  eg.Lower(g);

  StreamAssignment<StreamPolicy::PerOperator> assignment(eg);
  auto map = MakeNodeMap(eg);
  EXPECT_EQ(assignment[map["a"]], 0);
  EXPECT_EQ(assignment[map["b"]], 1);
  EXPECT_EQ(assignment[map["c"]], 0);
  EXPECT_EQ(assignment[map["d"]], 1);
  EXPECT_EQ(assignment[map["e"]], 3);
  EXPECT_EQ(assignment[map["f"]], 4);
  // e gets 2 instead of 3
  EXPECT_EQ(assignment[map["g"]], 3); // it's the only child of e
  EXPECT_EQ(assignment[map["h"]], 4);
}

Signed-off-by: Michal Zientkiewicz <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [18115353]: BUILD STARTED

@mzient
Copy link
Contributor Author

mzient commented Sep 4, 2024

There are still some problems with per-operator assignment. I have some promising ideas, but I've removed per-operator assignment from this PR and will do it as a follow-up.

@NVIDIA NVIDIA deleted a comment from dali-automaton Sep 4, 2024
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [18115353]: BUILD PASSED

@mzient mzient merged commit 22304f4 into NVIDIA:main Sep 4, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants