Enable GPU->CPU transfers #5593
Conversation
```cpp
// Creating the graph
for (auto &name_op_spec : op_specs_) {
  string &inst_name = name_op_spec.instance_name;
  OpSpec op_spec = name_op_spec.spec;
  PrepareOpSpec(&op_spec, name_op_spec.logical_id);
  try {
    graph_builder_.Add(inst_name, op_spec);
  } catch (...) {
    PropagateError({std::current_exception(),
                    "Critical error when building pipeline:\n" + GetErrorContextMessage(op_spec),
                    "\nCurrent pipeline object is no longer valid."});
  }
}
```
This part has been moved after the outputs because when processing outputs we add MakeContiguous operators.
Force-pushed from 016ef1e to 33fca28
```diff
@@ -91,6 +94,20 @@ def gpu(self) -> DataNode:
             return transferred_node
         return DataNode(self.name, "gpu", self.source)

+    def cpu(self) -> DataNode:
```
The logic here is copied from .gpu(), see above.
If the code is identical it can be abstracted out to a function
Something like:
```python
def cpu(self):
    return self._to_backend("cpu")

def gpu(self):
    return self._to_backend("gpu")

def _to_backend(self, backend) -> DataNode:
    ...
```
?
It could be done.
Yes, this would be great! Thanks
I support the above ^^
Also, maybe `_to_backend` should not be internal; people might want to use it directly (choosing the transfer target by parameter rather than by function name).
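The refactor discussed above can be sketched in a self-contained way. This is a minimal mock of a `DataNode`-like class for illustration only (not DALI's actual implementation), with the transfer helper made public as suggested:

```python
# Illustrative sketch: a DataNode-like class where .cpu() and .gpu()
# delegate to one shared, public to_backend() method, so the target
# device can also be chosen by parameter. Names are hypothetical.
class DataNode:
    def __init__(self, name, device, source=None):
        self.name = name
        self.device = device
        self.source = source

    def to_backend(self, backend):
        # One transfer path covers both CPU->GPU and GPU->CPU.
        if backend not in ("cpu", "gpu"):
            raise ValueError(f"unknown backend: {backend}")
        if self.device == backend:
            return self  # already on the requested backend: no-op
        return DataNode(self.name, backend, self.source)

    def cpu(self):
        return self.to_backend("cpu")

    def gpu(self):
        return self.to_backend("gpu")


node = DataNode("images", "cpu")
assert node.gpu().device == "gpu"
assert node.gpu().cpu().device == "cpu"
assert node.to_backend("cpu") is node  # no-op transfer returns the same node
```

Making `to_backend` public, as suggested, lets callers pick the device programmatically instead of branching between two method names.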
Signed-off-by: Michał Zientkiewicz <[email protected]>
Force-pushed from 33fca28 to 1826e37
…CPU `shapes` taking GPU input.
CI MESSAGE: [18299035]: BUILD STARTED
CI MESSAGE: [18299035]: BUILD FAILED
CI MESSAGE: [18301349]: BUILD STARTED
CI MESSAGE: [18301349]: BUILD PASSED
```cpp
 * If `separated_execution == 0`, this value is ignored
 * @param enable_memory_stats Enable memory stats.
 */
DLL_PUBLIC void
```
I added `exec_flags` so that we can (hopefully) get away without adding even more parameters when we add more flavors of the executor.
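The idea of packing executor options into a single flags argument, so that future executor flavors add a bit rather than a parameter, can be illustrated with this hedged sketch (the flag names below are invented for illustration and are not DALI's actual `exec_flags` values):

```python
# Sketch of a flags-based API: one integer argument carries all
# executor options, so new options don't change the function signature.
from enum import IntFlag


class ExecFlags(IntFlag):
    NONE = 0
    PIPELINED = 1 << 0   # hypothetical flag names for illustration
    ASYNC = 1 << 1
    DYNAMIC = 1 << 2     # a future executor flavor would just add a bit


def create_pipeline(exec_flags=ExecFlags.NONE):
    # Decode the bits into the executor configuration.
    return {
        "pipelined": bool(exec_flags & ExecFlags.PIPELINED),
        "async": bool(exec_flags & ExecFlags.ASYNC),
        "dynamic": bool(exec_flags & ExecFlags.DYNAMIC),
    }


cfg = create_pipeline(ExecFlags.PIPELINED | ExecFlags.ASYNC)
assert cfg == {"pipelined": True, "async": True, "dynamic": False}
```

The same pattern applies to the C API: a bitmask parameter stays ABI-stable while the set of recognized flags grows.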
Non critical comments, address them if needed.
```python
pipe = pdef()
pipe.build()
for i in range(10):
    gpu, cpu = pipe.run()
```
Maybe you can add a check for the TL backend here as well; `check_batch` can handle anything.
Good point.
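The suggested backend check can be sketched with stand-in objects instead of real DALI outputs (`StubBatch` and `check_backend` are hypothetical names; the real test would assert on the device of the TensorLists returned by `pipe.run()`, e.g. via `check_batch`):

```python
# Sketch: assert that each pipeline output lives on the expected backend.
class StubBatch:
    """Stand-in for a DALI TensorList, carrying only its device."""
    def __init__(self, device):
        self.device = device


def check_backend(batch, expected_device):
    # Fail loudly if an output landed on the wrong backend.
    assert batch.device == expected_device, (
        f"expected {expected_device}, got {batch.device}")


# Simulate what `gpu, cpu = pipe.run()` would return after a .cpu() transfer.
gpu_out, cpu_out = StubBatch("gpu"), StubBatch("cpu")
check_backend(gpu_out, "gpu")
check_backend(cpu_out, "cpu")
```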
dali/test/python/test_pipeline.py (Outdated)
```python
pipe = pdef()
pipe.build()
for i in range(10):
    peek, gpu, cpu = pipe.run()
```
As above.
```cpp
 * @param enable_memory_stats Enable memory stats.
 */
DLL_PUBLIC void
daliCreatePipeline3(daliPipelineHandle *pipe_handle, const char *serialized_pipeline, int length,
```
Shouldn't you add a test to c_api_test.cc?
Will do (perhaps in a follow-up).
Please add a backend check in the Python test, other things are more questions than suggestions.
CI MESSAGE: [18333257]: BUILD STARTED
CI MESSAGE: [18333257]: BUILD PASSED
Category:
New feature
Refactoring
Description:
GPU->CPU transfer is made possible via the `.cpu()` function in `DataNode`.
Some refactoring of the Pipeline class:
The checks have been removed from the Pipeline class. The old executor still raises an error when a GPU->CPU transfer occurs.
The check for GPU arguments to CPU ops has been removed from the Python front-end.
TODO: Extend InputDevice in Schema and use it for Python-side checks.
Additional information:
Affected modules and functionalities:
Pipeline
Python front-end
Copy operator
Key points relevant for the review:
Some old tests were removed or reduced to allow for the new capability.
New tests were added which test:
- `.cpu()` for transfers (including with conditionals)
- `fn.shapes()` with CPU backend and GPU input
Checklist
Documentation
DALI team only
Requirements
REQ IDs: EXE.06
JIRA TASK: DALI-4030