
Optimize TensorList resizing. #5638

Merged 2 commits into NVIDIA:main on Sep 19, 2024
Conversation

@mzient (Contributor) commented Sep 18, 2024

Category:

Refactoring (Redesign of existing code that doesn't affect functionality)

Description:

This change optimizes the performance of TensorList::Resize:

  • simple inline functions are moved to the header
  • shared_ptr in ShareData is now passed by value, allowing move semantics and reducing the number of atomic operations
  • some code motion to improve inlining (e.g. wrapping frequent calls to DLL_PUBLIC functions into a trampoline function)

Additional information:

Many of the changes were tuned experimentally. Don't hesitate to ask if you see something not obvious or outright weird.

Affected modules and functionalities:

Buffer, Tensor, TensorList

Key points relevant for the review:

Tests:

No new functionality or functional changes - all existing tests apply

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

mzient and others added 2 commits September 18, 2024 17:02
Signed-off-by: Michal Zientkiewicz <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
@dali-automaton (Collaborator)

CI MESSAGE: [18510898]: BUILD STARTED

@JanuszL JanuszL self-assigned this Sep 18, 2024
@dali-automaton (Collaborator)

CI MESSAGE: [18510898]: BUILD PASSED

@mzient (Contributor, Author) commented on the diff:

    template <typename Backend>
    void TensorList<Backend>::recreate_views() {
      // precondition: type, shape are configured
      uint8_t *sample_ptr = static_cast<uint8_t *>(contiguous_buffer_.raw_mutable_data());
      int64_t num_samples = shape().num_samples();
      auto &data_ptr = contiguous_buffer_.get_data_ptr();

Hoisting this line (fetching data_ptr once, before the loop) was perhaps the biggest saving here.

      for (int64_t i = 0; i < num_samples; i++) {
        // or any other way
        auto tensor_size = shape().tensor_size(i);

    -   std::shared_ptr<void> sample_alias(contiguous_buffer_.get_data_ptr(), sample_ptr);
    -   tensors_[i].ShareData(sample_alias, tensor_size * type_info().size(), is_pinned(), shape()[i],
    +   tensors_[i].ShareData(std::shared_ptr<void>(data_ptr, sample_ptr),

@mzient (Contributor, Author) commented: Having an intermediate variable and moving it was noticeably slower (but still noticeably faster than passing by const-ref and copying).

@mzient mzient merged commit f34a227 into NVIDIA:main Sep 19, 2024
6 checks passed
4 participants