Enable GPU->CPU transfers #5593
Conversation
```cpp
// Creating the graph
for (auto &name_op_spec : op_specs_) {
  string &inst_name = name_op_spec.instance_name;
  OpSpec op_spec = name_op_spec.spec;
  PrepareOpSpec(&op_spec, name_op_spec.logical_id);
  try {
    graph_builder_.Add(inst_name, op_spec);
  } catch (...) {
    PropagateError({std::current_exception(),
                    "Critical error when building pipeline:\n" + GetErrorContextMessage(op_spec),
                    "\nCurrent pipeline object is no longer valid."});
  }
}
```
This part has been moved after the outputs because when processing outputs we add MakeContiguous operators.
Force-pushed from 016ef1e to 33fca28
```diff
@@ -91,6 +94,20 @@ def gpu(self) -> DataNode:
             return transferred_node
         return DataNode(self.name, "gpu", self.source)

+    def cpu(self) -> DataNode:
```
The logic here is copied from .gpu(), see above.
If the code is identical it can be abstracted out to a function
Something like:
```python
def cpu(self):
    return self._to_backend("cpu")

def gpu(self):
    return self._to_backend("gpu")

def _to_backend(self, backend) -> DataNode:
    ...
```
?
It could be done.
Yes, this would be great! Thanks
I support the above ^^
Also, maybe `_to_backend` should not be internal; people might want to use it directly (choosing the transfer target by parameter rather than by function name).
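The refactor discussed above can be sketched in a self-contained way. This is a minimal mock of a `DataNode`-like class for illustration only (not DALI's actual implementation), with the transfer helper made public as suggested:

```python
# Illustrative sketch: a DataNode-like class where .cpu() and .gpu()
# delegate to one shared, public to_backend() method, so the target
# device can also be chosen by parameter. Names are hypothetical.
class DataNode:
    def __init__(self, name, device, source=None):
        self.name = name
        self.device = device
        self.source = source

    def to_backend(self, backend):
        # One transfer path covers both CPU->GPU and GPU->CPU.
        if backend not in ("cpu", "gpu"):
            raise ValueError(f"unknown backend: {backend}")
        if self.device == backend:
            return self  # already on the requested backend: no-op
        return DataNode(self.name, backend, self.source)

    def cpu(self):
        return self.to_backend("cpu")

    def gpu(self):
        return self.to_backend("gpu")


node = DataNode("images", "cpu")
assert node.gpu().device == "gpu"
assert node.gpu().cpu().device == "cpu"
assert node.to_backend("cpu") is node  # no-op transfer returns the same node
```

Making `to_backend` public, as suggested, lets callers pick the device programmatically instead of branching between two method names.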
Signed-off-by: Michał Zientkiewicz <[email protected]>
Force-pushed from 33fca28 to 1826e37
…CPU `shapes` taking GPU input.
CI MESSAGE: [18299035]: BUILD STARTED
CI MESSAGE: [18299035]: BUILD FAILED
CI MESSAGE: [18301349]: BUILD STARTED
CI MESSAGE: [18301349]: BUILD PASSED
```cpp
 * If `separated_execution == 0`, this value is ignored
 * @param enable_memory_stats Enable memory stats.
 */
DLL_PUBLIC void
```
I added `exec_flags` so that we can (hopefully) get away without adding even more parameters when we add more flavors of the executor.
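The idea of packing executor options into a single flags argument, so that future executor flavors add a bit rather than a parameter, can be illustrated with this hedged sketch (the flag names below are invented for illustration and are not DALI's actual `exec_flags` values):

```python
# Sketch of a flags-based API: one integer argument carries all
# executor options, so new options don't change the function signature.
from enum import IntFlag


class ExecFlags(IntFlag):
    NONE = 0
    PIPELINED = 1 << 0   # hypothetical flag names for illustration
    ASYNC = 1 << 1
    DYNAMIC = 1 << 2     # a future executor flavor would just add a bit


def create_pipeline(exec_flags=ExecFlags.NONE):
    # Decode the bits into the executor configuration.
    return {
        "pipelined": bool(exec_flags & ExecFlags.PIPELINED),
        "async": bool(exec_flags & ExecFlags.ASYNC),
        "dynamic": bool(exec_flags & ExecFlags.DYNAMIC),
    }


cfg = create_pipeline(ExecFlags.PIPELINED | ExecFlags.ASYNC)
assert cfg == {"pipelined": True, "async": True, "dynamic": False}
```

The same pattern applies to the C API: a bitmask parameter stays ABI-stable while the set of recognized flags grows.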
Non critical comments, address them if needed.
```python
pipe = pdef()
pipe.build()
for i in range(10):
    gpu, cpu = pipe.run()
```
Maybe you can add a check for the TL backend here as well; `check_batch` can handle anything.
Good point.
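The suggested backend check can be sketched with stand-in objects instead of real DALI outputs (`StubBatch` and `check_backend` are hypothetical names; the real test would assert on the device of the TensorLists returned by `pipe.run()`, e.g. via `check_batch`):

```python
# Sketch: assert that each pipeline output lives on the expected backend.
class StubBatch:
    """Stand-in for a DALI TensorList, carrying only its device."""
    def __init__(self, device):
        self.device = device


def check_backend(batch, expected_device):
    # Fail loudly if an output landed on the wrong backend.
    assert batch.device == expected_device, (
        f"expected {expected_device}, got {batch.device}")


# Simulate what `gpu, cpu = pipe.run()` would return after a .cpu() transfer.
gpu_out, cpu_out = StubBatch("gpu"), StubBatch("cpu")
check_backend(gpu_out, "gpu")
check_backend(cpu_out, "cpu")
```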
dali/test/python/test_pipeline.py (Outdated)
```python
pipe = pdef()
pipe.build()
for i in range(10):
    peek, gpu, cpu = pipe.run()
```
As above.
```cpp
 * @param enable_memory_stats Enable memory stats.
 */
DLL_PUBLIC void
daliCreatePipeline3(daliPipelineHandle *pipe_handle, const char *serialized_pipeline, int length,
```
Shouldn't you add a test to c_api_test.cc?
Will do (perhaps in a follow-up).
Please add a backend check in the Python test, other things are more questions than suggestions.
CI MESSAGE: [18333257]: BUILD STARTED
CI MESSAGE: [18333257]: BUILD PASSED
Category:
New feature
Refactoring
Description:
GPU->CPU transfer is made possible via the `.cpu()` function in `DataNode`.
Some refactoring of the Pipeline class:
The checks have been removed from the Pipeline class. The old executor still raises an error when a GPU->CPU transfer occurs.
The check for GPU arguments to CPU ops has been removed from the Python front-end.
TODO: Extend InputDevice in Schema and use it for Python-side checks.
Additional information:
Affected modules and functionalities:
Pipeline
Python front-end
Copy operator
Key points relevant for the review:
Some old tests were removed or reduced to allow for the new capability.
New tests were added which test:
- `.cpu()` for transfers (including with conditionals)
- `fn.shapes()` with CPU backend and GPU input
Checklist
Documentation
DALI team only
Requirements
REQ IDs: EXE.06
JIRA TASK: DALI-4030