Optimize TensorList resizing. #5638
Conversation
Signed-off-by: Michal Zientkiewicz <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
template <typename Backend>
void TensorList<Backend>::recreate_views() {
  // precondition: type, shape are configured
  uint8_t *sample_ptr = static_cast<uint8_t *>(contiguous_buffer_.raw_mutable_data());
  int64_t num_samples = shape().num_samples();
  auto &data_ptr = contiguous_buffer_.get_data_ptr();
Hoisting this line was perhaps the biggest saving here.
for (int64_t i = 0; i < num_samples; i++) {
  // or any other way
  auto tensor_size = shape().tensor_size(i);

-  std::shared_ptr<void> sample_alias(contiguous_buffer_.get_data_ptr(), sample_ptr);
-  tensors_[i].ShareData(sample_alias, tensor_size * type_info().size(), is_pinned(), shape()[i],
+  tensors_[i].ShareData(std::shared_ptr<void>(data_ptr, sample_ptr),
Having an intermediate variable and moving it was noticeably slower (but still noticeably faster than passing by const-ref and copying).
Category:
Refactoring (Redesign of existing code that doesn't affect functionality)
Description:
This change optimizes the performance of TensorList::Resize. The shared_ptr in ShareData is now passed by value, allowing move semantics and reducing the number of atomic operations.
Additional information:
Many of the changes were tuned experimentally. Don't hesitate to ask if you see something non-obvious or outright weird.
Affected modules and functionalities:
Buffer, Tensor, TensorList
Key points relevant for the review:
Tests:
No new functionality or functional changes - all existing tests apply
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A