GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type #41373

Merged
merged 22 commits into from
Jun 10, 2024

felipecrv
Contributor

@felipecrv felipecrv commented Apr 25, 2024

Rationale for this change

I want to instantiate this primitive operation in other scenarios (e.g. the optimized version of Take that handles chunked arrays) and extend the sub-classes of GatherCRTP with different member functions that re-use the WriteValue function generically (any fixed-width type and even bit-wide booleans).

When taking these improvements to Filter I will also re-use the "gather" concept and parameterize it by bitmaps/boolean-arrays instead of selection vectors (indices) like take does. So gather is not a "renaming of take" but rather a generalization of what take and filter do in Arrow, with different representations of what should be gathered from the values array.

What changes are included in this PR?

  • Introduce the Gather class helper to delegate fixed-width memory gathering: both static and dynamically sized (size known at compile time or size known at runtime)
  • Specialized Take implementation for values/indices without nulls
  • Fold the Boolean, Primitives, and Fixed-Width Binary implementation of Take into a single one
  • Skip validity bitmap allocation when inputs (values and indices) have no nulls
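
As a rough illustration of the static-versus-dynamic width idea in the list above, here is a minimal sketch. It is not the code from this PR: `GatherSketch`, its constructor parameters, and the convention that a template width of 0 means "width supplied at runtime" are all hypothetical names and choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, not Arrow's Gather class.
// kStaticWidth > 0: the value width is a compile-time constant, so the
//                   memcpy size is known to the compiler.
// kStaticWidth == 0: the value width is only known at runtime.
template <std::size_t kStaticWidth>
class GatherSketch {
 public:
  GatherSketch(const std::uint8_t* src, std::size_t src_length, std::uint8_t* out,
               std::size_t runtime_width = kStaticWidth)
      : src_(src), src_length_(src_length), out_(out), runtime_width_(runtime_width) {
    assert(width() > 0);
  }

  // Width in bytes of one value.
  std::size_t width() const {
    if constexpr (kStaticWidth > 0) {
      return kStaticWidth;
    } else {
      return runtime_width_;
    }
  }

  // Copy the value at `index` in the source to slot `position` in the output.
  void WriteValue(std::size_t position, std::size_t index) {
    assert(index < src_length_);
    std::memcpy(out_ + position * width(), src_ + index * width(), width());
  }

  // Gather one value per index (the "take" shape of the operation).
  template <typename IndexType>
  void Execute(const IndexType* indices, std::size_t num_indices) {
    for (std::size_t i = 0; i < num_indices; ++i) {
      WriteValue(i, static_cast<std::size_t>(indices[i]));
    }
  }

 private:
  const std::uint8_t* src_;
  std::size_t src_length_;
  std::uint8_t* out_;
  std::size_t runtime_width_;
};
```

With this shape, `GatherSketch<sizeof(int32_t)>` and `GatherSketch<0>` share the same `WriteValue` and `Execute` code, which is the kind of reuse the description is after. Bit-wide booleans would need a separate `WriteValue` and are left out of the sketch.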

Are these changes tested?

  • Existing tests
  • New test assertions that check that Take guarantees null values are zeroed out
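
The "null values are zeroed out" guarantee mentioned above can be pictured with the following sketch. It assumes, for simplicity, one `bool` per slot instead of Arrow's bit-packed validity bitmaps, and `TakeZeroingNullsSketch` is a hypothetical helper, not a function from the Arrow code base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: take int64 values by index, producing a validity
// vector and leaving every null output slot as zero.
inline std::vector<bool> TakeZeroingNullsSketch(const std::vector<std::int64_t>& values,
                                                const std::vector<bool>& values_valid,
                                                const std::vector<std::int32_t>& indices,
                                                const std::vector<bool>& indices_valid,
                                                std::vector<std::int64_t>* out) {
  assert(values.size() == values_valid.size());
  assert(indices.size() == indices_valid.size());
  out->assign(indices.size(), 0);  // zero-fill first, so nulls stay zeroed
  std::vector<bool> out_valid(indices.size(), false);
  for (std::size_t i = 0; i < indices.size(); ++i) {
    if (!indices_valid[i]) continue;  // null index -> null output
    const auto index = static_cast<std::size_t>(indices[i]);
    assert(index < values.size());
    if (!values_valid[index]) continue;  // null value -> null output
    (*out)[i] = values[index];
    out_valid[i] = true;
  }
  return out_valid;
}
```

A test in the spirit of the new assertions would then check both the validity result and that each null slot in `out` holds zero.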

@felipecrv felipecrv requested a review from pitrou May 3, 2024 15:06
Member

@pitrou pitrou left a comment

I'm skeptical that you want to reuse this for Filter, unless you add Gather methods for batch selection. For Filter performance, it is essential to write out ranges of selected values at a time, not one value at a time. I don't know if that's what you have in mind.

Other than that, some assorted comments.

@felipecrv
Contributor Author

I'm skeptical that you want to reuse this for Filter, unless you add Gather methods for batch selection. For Filter performance, it is essential to write out ranges of selected values at a time, not one value at a time. I don't know if that's what you have in mind.

I want to expand the set of WriteValue implementations to support writing multiple values. Then add a version of Gather::Execute*() that takes boolean arrays (masks) instead of indices (selection vectors).
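
To make the "ranges of selected values at a time" point concrete, here is a small sketch of a mask-driven gather that issues one copy per run of selected values. It is an illustration under simplifying assumptions, not the planned `Gather::Execute*()` overload: the mask is a `std::vector<bool>` instead of a bit-packed bitmap, nulls are ignored, and `FilterRunsSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: copy every value whose mask slot is true, using a
// single memcpy per contiguous run of selected values.
// Returns the number of values written to `out`.
inline std::size_t FilterRunsSketch(const std::uint8_t* values, std::size_t width,
                                    const std::vector<bool>& mask, std::uint8_t* out) {
  assert(width > 0);
  const std::size_t n = mask.size();
  std::size_t out_pos = 0;
  std::size_t i = 0;
  while (i < n) {
    if (!mask[i]) {
      ++i;
      continue;
    }
    // Found the start of a run; extend it as far as the mask stays true.
    const std::size_t run_start = i;
    while (i < n && mask[i]) ++i;
    const std::size_t run_length = i - run_start;
    std::memcpy(out + out_pos * width, values + run_start * width, run_length * width);
    out_pos += run_length;
  }
  return out_pos;
}
```

For a mostly-true mask this does a few large copies instead of many `width`-byte ones, which is the behaviour the review comment asks for.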

@felipecrv felipecrv requested a review from pitrou May 6, 2024 20:43
@felipecrv felipecrv force-pushed the gather_fixed branch 3 times, most recently from 3dd6c56 to 6d769d3 Compare May 15, 2024 01:36
@pitrou
Member

pitrou commented Jun 5, 2024

Proof that this PR reduces binary size:

Nice! Can you also post numbers obtained with the size utility for completeness?

@felipecrv
Contributor Author

felipecrv commented Jun 6, 2024

Proof that this PR reduces binary size:

Nice! Can you also post numbers obtained with the size utility for completeness?

Looking at both vector_selection_take_internal.cc.o and vector_selection_internal.cc.o I'm net-adding 1.45 KBytes.

bloaty -d symbols -C full -n 0 \
HEAD-vector_selection_take_internal.cc.o HEAD-vector_selection_internal.cc.o -- \
MAIN-vector_selection_take_internal.cc.o MAIN-vector_selection_internal.cc.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +32.8Ki  [NEW] +32.7Ki    arrow::compute::internal::FixedWidthTakeExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*)
  [NEW]    +204  [NEW]    +168    GCC_except_table272
  [NEW]    +189  [NEW]    +104    arrow::Status arrow::Status::NotImplemented<char const (&) [38], arrow::DataType const&>(char const (&&&) [38], arrow::DataType const&&&)
  +183%    +168  +300%    +168    GCC_except_table170
   +43%    +164   +47%    +164    GCC_except_table20
  [NEW]    +156  [NEW]    +120    GCC_except_table302
  [NEW]    +146  [NEW]     +76    GCC_except_table99
   +40%    +109   +71%    +144    GCC_except_table24
   +72%    +143   +66%    +108    GCC_except_table13
  [NEW]    +126  [NEW]     +56    GCC_except_table60
  [NEW]    +108  [NEW]     +72    GCC_except_table138
  +169%    +108  +257%     +72    GCC_except_table172
  [NEW]    +108  [NEW]     +72    GCC_except_table225
  [NEW]    +107  [NEW]     +72    GCC_except_table64
  +195%    +107  +360%     +72    GCC_except_table87
  [NEW]     +96  [NEW]     +60    GCC_except_table111
  +160%     +96  +250%     +60    GCC_except_table151
   +25%     +95   +18%     +60    GCC_except_table17
  [NEW]     +95  [NEW]     +60    GCC_except_table72
  [NEW]     +95  [NEW]     +60    GCC_except_table81
  +116%     +88  +220%     +88    GCC_except_table192
  [NEW]     +84  [NEW]     +48    GCC_except_table188
  [NEW]     +76  [NEW]     +40    GCC_except_table107
   +96%     +68  +189%     +68    GCC_except_table10
  [NEW]     +68  [NEW]     +32    GCC_except_table102
  [NEW]     +68  [NEW]     +32    GCC_except_table114
  [NEW]     +68  [NEW]     +32    GCC_except_table144
  [NEW]     +68  [NEW]     +32    GCC_except_table197
  [NEW]     +68  [NEW]     +32    GCC_except_table223
  [NEW]     +67  [NEW]     +32    GCC_except_table43
  [NEW]     +67  [NEW]     +32    GCC_except_table53
  [NEW]     +67  [NEW]     +32    GCC_except_table90
  [NEW]     +64  [NEW]     +28    GCC_except_table164
  [NEW]     +64  [NEW]     +28    GCC_except_table199
  [NEW]     +64  [NEW]     +28    GCC_except_table207
  [NEW]     +64  [NEW]     +28    GCC_except_table213
  [NEW]     +63  [NEW]     +28    GCC_except_table28
  [NEW]     +60  [NEW]     +24    GCC_except_table128
  [NEW]     +60  [NEW]     +24    GCC_except_table155
  [NEW]     +60  [NEW]     +24    GCC_except_table298
  [NEW]     +59  [NEW]     +24    GCC_except_table36
  [NEW]     +56  [NEW]     +20    GCC_except_table182
  [NEW]     +56  [NEW]     +20    GCC_except_table190
  [NEW]     +56  [NEW]     +20    GCC_except_table209
  [NEW]     +52  [NEW]     +16    GCC_except_table133
  [NEW]     +52  [NEW]     +16    GCC_except_table134
  [NEW]     +52  [NEW]     +16    GCC_except_table162
  [NEW]     +52  [NEW]     +16    GCC_except_table168
  [NEW]     +52  [NEW]     +16    GCC_except_table179
  [NEW]     +52  [NEW]     +16    GCC_except_table194
  +100%     +52  +100%     +16    GCC_except_table202
  [NEW]     +52  [NEW]     +16    GCC_except_table218
  [NEW]     +52  [NEW]     +16    GCC_except_table221
  [NEW]     +52  [NEW]     +16    GCC_except_table277
  [NEW]     +52  [NEW]     +16    GCC_except_table279
  [NEW]     +51  [NEW]     +16    GCC_except_table38
  [NEW]     +50  [NEW]     +16    GCC_except_table4
   +92%     +48  +300%     +48    GCC_except_table132
  [NEW]     +47  [NEW]     +12    GCC_except_table54
  [NEW]     +42  [NEW]     +16    lCPI223_1
  [NEW]     +40  [NEW]     +16    lJTI2_8
  [NEW]     +40  [NEW]     +16    lJTI2_9
   +25%     +32   +53%     +32    GCC_except_table84
  +0.4%     +32  +0.4%     +32    ltmp9
   +30%     +31  -5.9%      -4    GCC_except_table85
  [NEW]     +31  [NEW]      +5    lJTI170_0
  [NEW]     +30  [NEW]      +4    lJTI170_1
   +44%     +28  +100%     +28    GCC_except_table171
  -6.4%      -7   +70%     +28    GCC_except_table35
   +32%     +24   +60%     +24    GCC_except_table177
   +46%     +24  +150%     +24    GCC_except_table178
   +38%     +20  +125%     +20    GCC_except_table217
   +26%     +15  +100%     +16    GCC_except_table100
   +19%     +12   +43%     +12    GCC_except_table206
   +19%     +12   +43%     +12    GCC_except_table212
   +14%      +8   +40%      +8    GCC_except_table181
  +5.6%      +4   +11%      +4    GCC_except_table216
  +2.9%      +2  +7.4%      +2    typeinfo name for arrow::StructArray
  +1.2%      +2  +2.6%      +2    typeinfo name for std::__1::__shared_ptr_emplace<arrow::ChunkedArray, std::__1::allocator<arrow::ChunkedArray> >
  -6.8%      -4 -16.7%      -4    GCC_except_table83
 -13.3%      -8 -33.3%      -8    GCC_except_table149
 -17.9%     -12 -37.5%     -12    GCC_except_table41
 -20.7%     -12 -50.0%     -12    GCC_except_table8
 -23.9%     -16 -50.0%     -16    GCC_except_table52
 -26.3%     -20 -50.0%     -20    GCC_except_table215
  [DEL]     -30  [DEL]      -4    lJTI169_1
 -27.3%     -30 -46.9%     -30    lJTI2_0
  [DEL]     -31  [DEL]      -5    lJTI169_0
  +2.9%      +3 -47.1%     -32    GCC_except_table34
  [DEL]     -41  [DEL]     -16    lJTI25_0
  [DEL]     -42  [DEL]     -16    lCPI222_1
 -40.7%     -44 -61.1%     -44    GCC_except_table173
  [DEL]     -46  [DEL]     -12    GCC_except_table7
  [DEL]     -47  [DEL]     -12    GCC_except_table55
  -0.6%     -48  -0.6%     -48    arrow::compute::internal::FSLTakeExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*)
  [DEL]     -51  [DEL]     -16    GCC_except_table37
  [DEL]     -51  [DEL]     -16    GCC_except_table51
  [DEL]     -51  [DEL]     -16    GCC_except_table61
  [DEL]     -52  [DEL]     -16    GCC_except_table135
  [DEL]     -52  [DEL]     -16    GCC_except_table148
  [DEL]     -52  [DEL]     -16    GCC_except_table161
  [DEL]     -52  [DEL]     -16    GCC_except_table203
  [DEL]     -52  [DEL]     -16    GCC_except_table285
  [DEL]     -52  [DEL]     -16    GCC_except_table287
  [DEL]     -52  [DEL]     -16    GCC_except_table288
  [DEL]     -55  [DEL]     -20    GCC_except_table33
  [DEL]     -55  [DEL]     -20    GCC_except_table40
  [DEL]     -56  [DEL]     -20    GCC_except_table208
  [DEL]     -56  [DEL]     -20    GCC_except_table214
 -48.3%     -58 -46.2%     -24    GCC_except_table2
  [DEL]     -60  [DEL]     -24    GCC_except_table127
  [DEL]     -60  [DEL]     -24    GCC_except_table306
  [DEL]     -63  [DEL]     -28    GCC_except_table29
  [DEL]     -64  [DEL]     -28    GCC_except_table163
  [DEL]     -64  [DEL]     -28    GCC_except_table180
  [DEL]     -68  [DEL]     -32    GCC_except_table113
  [DEL]     -68  [DEL]     -32    GCC_except_table145
 -22.1%     -72 -28.1%     -72    GCC_except_table11
  [DEL]     -75  [DEL]     -40    GCC_except_table59
  [DEL]     -76  [DEL]     -40    GCC_except_table108
  [DEL]     -76  [DEL]     -40    GCC_except_table205
  [DEL]     -76  [DEL]     -40    GCC_except_table211
 -61.3%     -92 -79.3%     -92    GCC_except_table9
 -20.2%     -95 -15.0%     -60    GCC_except_table16
  [DEL]     -95  [DEL]     -60    GCC_except_table71
  [DEL]     -95  [DEL]     -60    GCC_except_table82
  [DEL]     -95  [DEL]     -60    GCC_except_table98
  [DEL]     -96  [DEL]     -60    GCC_except_table110
 -61.5%     -96 -71.4%     -60    GCC_except_table152
 -64.3%     -99 -76.2%     -64    GCC_except_table86
  [DEL]    -100  [DEL]     -64    GCC_except_table131
  [DEL]    -100  [DEL]     -64    GCC_except_table174
  [DEL]    -103  [DEL]     -24    typeinfo for arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl
  [DEL]    -107  [DEL]     -72    GCC_except_table65
  [DEL]    -108  [DEL]     -72    GCC_except_table139
  [DEL]    -108  [DEL]     -72    GCC_except_table226
  [DEL]    -113  [DEL]     -28    arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl::Finish()
  [DEL]    -120  [DEL]     -48    GCC_except_table222
  [DEL]    -127  [DEL]     -48    vtable for arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl
  [DEL]    -132  [DEL]     -60    GCC_except_table198
 -24.1%    -132 -25.8%    -132    GCC_except_table21
  [DEL]    -133  [DEL]      -8    arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>::Init()
  [DEL]    -134  [DEL]     -64    GCC_except_table89
  [DEL]    -136  [DEL]     -64    GCC_except_table101
 -26.2%    -136 -28.1%    -136    GCC_except_table18
  [DEL]    -137  [DEL]     -16    typeinfo for arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>
  [DEL]    -137  [DEL]     -58    typeinfo name for arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl
  [DEL]    -140  [DEL]     -68    GCC_except_table189
 -64.8%    -140 -72.2%    -104    GCC_except_table193
  [DEL]    -149  [DEL]     -96    arrow::FixedSizeBinaryArray::~FixedSizeBinaryArray()
 -74.5%    -152 -90.5%    -152    GCC_except_table280
  [DEL]    -156  [DEL]    -120    GCC_except_table310
  [DEL]    -169  [DEL]     -48    vtable for arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>
 -50.1%    -208 -54.7%    -208    GCC_except_table22
  [DEL]    -221  [DEL]    -100    typeinfo name for arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>
  [DEL]    -252  [DEL]      -8    arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>::~Selection()
  -2.1%    -256  -2.1%    -256    ltmp5
  [DEL]    -312  [DEL]    -240    GCC_except_table169
 -82.5%    -316 -90.8%    -316    GCC_except_table25

  -8.9%    -352  -9.2%    -352    arrow::compute::internal::PopulateTakeKernels(std::__1::vector<arrow::compute::internal::SelectionKernelData, std::__1::allocator<arrow::compute::internal::SelectionKernelData> >*)
  [DEL]    -464  [DEL]    -304    arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl::~FSBSelectionImpl()
  -0.8%    -677  [ = ]       0    [Unmapped]
  [DEL] -5.83Ki  [DEL] -5.72Ki    arrow::compute::internal::FSBTakeExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*)
  [DEL] -23.9Ki  [DEL] -23.8Ki    arrow::compute::internal::PrimitiveTakeExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*)
  -0.1%    -504  +0.5% +1.45Ki    TOTAL

UPDATE: Last 3 commits lead to:

  -0.1%    -528  +0.5% +1.21Ki    TOTAL

  Impact on origin/main:

    Before this commit
     -0.1%    -504  +0.5% +1.45Ki    TOTAL

    After this commit
     -0.1%    -392  +0.5% +1.27Ki    TOTAL
In a follow-up PR I'm adding Status return type out of necessity, so
this change is not only for binary size reasons.

    -0.1%    -528  +0.5% +1.21Ki    TOTAL
@felipecrv felipecrv requested a review from pitrou June 6, 2024 17:09
@@ -117,6 +125,7 @@
#define ARROW_PREFETCH(addr)
#define ARROW_RESTRICT
#define ARROW_COMPILER_ASSUME(expr)
#define ARROW_COMPILER_UNREACHABLE
Member

Should it use https://en.cppreference.com/w/cpp/utility/unreachable with an adequate feature test?

Contributor Author

I will undo this for now and merge. I added this for very minor binary size gains. I will tackle this in the next PR.
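
For reference, the feature-test approach suggested in this thread could look roughly like the sketch below. `SKETCH_UNREACHABLE` is a hypothetical macro name used only here (it is not `ARROW_COMPILER_UNREACHABLE`), and the compiler-specific fallbacks are one possible choice, not what Arrow ended up doing.

```cpp
#include <cstdlib>
#include <utility>

// <version> only exists on newer standard libraries, so probe for it first.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#if defined(__cpp_lib_unreachable) && __cpp_lib_unreachable >= 202202L
// C++23 library support is available.
#  define SKETCH_UNREACHABLE() std::unreachable()
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNREACHABLE() __builtin_unreachable()
#elif defined(_MSC_VER)
#  define SKETCH_UNREACHABLE() __assume(0)
#else
// Conservative fallback: terminate instead of invoking undefined behaviour.
#  define SKETCH_UNREACHABLE() std::abort()
#endif

// Example use: tell the compiler that the switch covers every case.
enum class WidthSketch { k1, k2, k4 };

inline int ByteWidthSketch(WidthSketch w) {
  switch (w) {
    case WidthSketch::k1:
      return 1;
    case WidthSketch::k2:
      return 2;
    case WidthSketch::k4:
      return 4;
  }
  SKETCH_UNREACHABLE();
}
```

Because reaching the macro is undefined behaviour on most branches of the `#if` chain, it should only mark code paths that are provably dead.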

@pitrou
Member

pitrou commented Jun 10, 2024

I've run the Take micro-benchmarks locally with this (AMD Zen 2, gcc 12.3.0). The changes are a bit all over the place and show that compilers are generally capricious and difficult to steer towards "optimal" code :-)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (86)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                 benchmark           baseline          contender  change %                                                                                                                                                                                                                                        counters
      TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1/9 423.000M items/sec   6.628G items/sec  1466.938     {'family_index': 4, 'per_family_instance_index': 7, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 565, 'byte_width': 9.0, 'null_percent': 100.0}
    TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/1/9 299.749M items/sec 731.380M items/sec   143.998  {'family_index': 16, 'per_family_instance_index': 7, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 396, 'byte_width': 9.0, 'null_percent': 100.0}
          TakeChunkedChunkedFSBMonotonicIndices/524288/1/9 137.090M items/sec 222.341M items/sec    62.185        {'family_index': 17, 'per_family_instance_index': 7, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 198, 'byte_width': 9.0, 'null_percent': 100.0}
            TakeFixedSizeBinaryMonotonicIndices/524288/1/9 184.516M items/sec 297.998M items/sec    61.503           {'family_index': 5, 'per_family_instance_index': 7, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 249, 'byte_width': 9.0, 'null_percent': 100.0}
          TakeChunkedChunkedFSBMonotonicIndices/524288/0/9 192.469M items/sec 302.407M items/sec    57.120          {'family_index': 17, 'per_family_instance_index': 9, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 253, 'byte_width': 9.0, 'null_percent': 0.0}
        TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1/9 189.605M items/sec 297.548M items/sec    56.931       {'family_index': 3, 'per_family_instance_index': 7, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 255, 'byte_width': 9.0, 'null_percent': 100.0}
            TakeFixedSizeBinaryMonotonicIndices/524288/0/9 259.447M items/sec 406.608M items/sec    56.721             {'family_index': 5, 'per_family_instance_index': 9, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 348, 'byte_width': 9.0, 'null_percent': 0.0}
      TakeFixedSizeBinaryRandomIndicesWithNulls/524288/0/9 229.434M items/sec 327.106M items/sec    42.571       {'family_index': 4, 'per_family_instance_index': 9, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 291, 'byte_width': 9.0, 'null_percent': 0.0}
        TakeFixedSizeBinaryRandomIndicesNoNulls/524288/0/9 230.187M items/sec 321.062M items/sec    39.479         {'family_index': 3, 'per_family_instance_index': 9, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 305, 'byte_width': 9.0, 'null_percent': 0.0}
      TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/0/9 178.603M items/sec 243.824M items/sec    36.517      {'family_index': 15, 'per_family_instance_index': 9, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 242, 'byte_width': 9.0, 'null_percent': 0.0}
      TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/1/9 157.029M items/sec 213.477M items/sec    35.948    {'family_index': 15, 'per_family_instance_index': 7, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 208, 'byte_width': 9.0, 'null_percent': 100.0}
    TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/0/9 181.385M items/sec 240.142M items/sec    32.393    {'family_index': 16, 'per_family_instance_index': 9, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 246, 'byte_width': 9.0, 'null_percent': 0.0}
     TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1000/9 138.613M items/sec 178.185M items/sec    28.548      {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 188, 'byte_width': 9.0, 'null_percent': 0.1}
       TakeFixedSizeBinaryRandomIndicesNoNulls/524288/10/9 129.101M items/sec 160.774M items/sec    24.533       {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 173, 'byte_width': 9.0, 'null_percent': 10.0}
         TakeFixedSizeBinaryMonotonicIndices/524288/1000/9 185.774M items/sec 228.147M items/sec    22.809          {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 250, 'byte_width': 9.0, 'null_percent': 0.1}
   TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1000/9 136.620M items/sec 167.200M items/sec    22.383    {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 184, 'byte_width': 9.0, 'null_percent': 0.1}
           TakeFixedSizeBinaryMonotonicIndices/524288/10/9 162.159M items/sec 196.435M items/sec    21.138           {'family_index': 5, 'per_family_instance_index': 3, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 221, 'byte_width': 9.0, 'null_percent': 10.0}
     TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/10/9 114.506M items/sec 137.125M items/sec    19.753    {'family_index': 15, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 153, 'byte_width': 9.0, 'null_percent': 10.0}
 TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/1000/9 119.955M items/sec 142.302M items/sec    18.629 {'family_index': 16, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 134, 'byte_width': 9.0, 'null_percent': 0.1}
       TakeChunkedChunkedFSBMonotonicIndices/524288/1000/9 158.880M items/sec 188.351M items/sec    18.549       {'family_index': 17, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 212, 'byte_width': 9.0, 'null_percent': 0.1}
   TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/1000/9 123.316M items/sec 145.043M items/sec    17.619   {'family_index': 15, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 165, 'byte_width': 9.0, 'null_percent': 0.1}
         TakeChunkedChunkedFSBMonotonicIndices/524288/10/9 140.400M items/sec 165.035M items/sec    17.547        {'family_index': 17, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 175, 'byte_width': 9.0, 'null_percent': 10.0}
     TakeChunkedChunkedStringRandomIndicesNoNulls/524288/0  24.680M items/sec  28.793M items/sec    16.662                         {'family_index': 18, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedStringRandomIndicesNoNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 38, 'null_percent': 0.0}
            TakeFixedSizeBinaryMonotonicIndices/524288/2/9 126.148M items/sec 146.438M items/sec    16.084            {'family_index': 5, 'per_family_instance_index': 5, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 164, 'byte_width': 9.0, 'null_percent': 50.0}
                     TakeFSLInt64MonotonicIndices/524288/0 787.109M items/sec 913.254M items/sec    16.026                                        {'family_index': 8, 'per_family_instance_index': 4, 'run_name': 'TakeFSLInt64MonotonicIndices/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1052, 'null_percent': 0.0}
        TakeFixedSizeBinaryRandomIndicesNoNulls/524288/2/9  98.420M items/sec 113.492M items/sec    15.314        {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 133, 'byte_width': 9.0, 'null_percent': 50.0}
          TakeChunkedChunkedInt64MonotonicIndices/524288/0 471.212M items/sec 537.976M items/sec    14.169                             {'family_index': 14, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedInt64MonotonicIndices/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 645, 'null_percent': 0.0}
          TakeChunkedChunkedFSBMonotonicIndices/524288/0/8 476.152M items/sec 543.321M items/sec    14.107          {'family_index': 17, 'per_family_instance_index': 8, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 633, 'byte_width': 8.0, 'null_percent': 0.0}
          TakeChunkedChunkedFSBMonotonicIndices/524288/2/9 112.936M items/sec 128.835M items/sec    14.078         {'family_index': 17, 'per_family_instance_index': 5, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 151, 'byte_width': 9.0, 'null_percent': 50.0}
                        TakeInt64MonotonicIndices/524288/0 815.474M items/sec 914.640M items/sec    12.161                                           {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'TakeInt64MonotonicIndices/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1073, 'null_percent': 0.0}
            TakeFixedSizeBinaryMonotonicIndices/524288/0/8 819.620M items/sec 916.155M items/sec    11.778            {'family_index': 5, 'per_family_instance_index': 8, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1088, 'byte_width': 8.0, 'null_percent': 0.0}
         TakeChunkedFlatInt64RandomIndicesNoNulls/524288/0 483.752M items/sec 529.220M items/sec     9.399                            {'family_index': 21, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedFlatInt64RandomIndicesNoNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 720, 'null_percent': 0.0}
                       TakeStringMonotonicIndices/524288/0 785.800M items/sec 851.423M items/sec     8.351                                         {'family_index': 11, 'per_family_instance_index': 4, 'run_name': 'TakeStringMonotonicIndices/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1056, 'null_percent': 0.0}
               TakeFSLInt64RandomIndicesWithNulls/524288/0 695.436M items/sec 750.561M items/sec     7.927                                   {'family_index': 7, 'per_family_instance_index': 4, 'run_name': 'TakeFSLInt64RandomIndicesWithNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 900, 'null_percent': 0.0}
                     TakeFSLInt64MonotonicIndices/524288/1  66.532M items/sec  71.388M items/sec     7.299                                        {'family_index': 8, 'per_family_instance_index': 3, 'run_name': 'TakeFSLInt64MonotonicIndices/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 89, 'null_percent': 100.0}
       TakeChunkedFlatInt64RandomIndicesWithNulls/524288/1   1.559G items/sec   1.668G items/sec     7.012                       {'family_index': 22, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedFlatInt64RandomIndicesWithNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2086, 'null_percent': 100.0}
      TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/2/9  90.181M items/sec  96.449M items/sec     6.950     {'family_index': 15, 'per_family_instance_index': 5, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 122, 'byte_width': 9.0, 'null_percent': 50.0}
                 TakeFSLInt64RandomIndicesNoNulls/524288/1  66.747M items/sec  71.100M items/sec     6.520                                    {'family_index': 6, 'per_family_instance_index': 3, 'run_name': 'TakeFSLInt64RandomIndicesNoNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 93, 'null_percent': 100.0}
             TakeChunkedFlatInt64MonotonicIndices/524288/0 610.547M items/sec 649.274M items/sec     6.343                                {'family_index': 23, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedFlatInt64MonotonicIndices/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 828, 'null_percent': 0.0}
      TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/0 407.444M items/sec 432.748M items/sec     6.210                         {'family_index': 12, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 539, 'null_percent': 0.0}
      TakeFixedSizeBinaryRandomIndicesWithNulls/524288/2/9  74.928M items/sec  78.752M items/sec     5.104       {'family_index': 4, 'per_family_instance_index': 5, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 97, 'byte_width': 9.0, 'null_percent': 50.0}
                 TakeFSLInt64RandomIndicesNoNulls/524288/0 719.162M items/sec 754.224M items/sec     4.875                                     {'family_index': 6, 'per_family_instance_index': 4, 'run_name': 'TakeFSLInt64RandomIndicesNoNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 941, 'null_percent': 0.0}
               TakeFSLInt64RandomIndicesWithNulls/524288/2  38.928M items/sec  40.809M items/sec     4.832                                   {'family_index': 7, 'per_family_instance_index': 2, 'run_name': 'TakeFSLInt64RandomIndicesWithNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 53, 'null_percent': 50.0}
  TakeChunkedChunkedStringRandomIndicesWithNulls/524288/10  30.625M items/sec  31.965M items/sec     4.375                     {'family_index': 19, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedStringRandomIndicesWithNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 41, 'null_percent': 10.0}
               TakeFSLInt64RandomIndicesWithNulls/524288/1  84.040M items/sec  87.072M items/sec     3.608                                 {'family_index': 7, 'per_family_instance_index': 3, 'run_name': 'TakeFSLInt64RandomIndicesWithNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 111, 'null_percent': 100.0}
      TakeFixedSizeBinaryRandomIndicesWithNulls/524288/0/8 720.917M items/sec 746.809M items/sec     3.592       {'family_index': 4, 'per_family_instance_index': 8, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 974, 'byte_width': 8.0, 'null_percent': 0.0}
    TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/2/9  69.333M items/sec  71.629M items/sec     3.311    {'family_index': 16, 'per_family_instance_index': 5, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 94, 'byte_width': 9.0, 'null_percent': 50.0}
     TakeFixedSizeBinaryRandomIndicesWithNulls/524288/10/9 109.212M items/sec 112.669M items/sec     3.165     {'family_index': 4, 'per_family_instance_index': 3, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 147, 'byte_width': 9.0, 'null_percent': 10.0}
                TakeStringRandomIndicesNoNulls/524288/1000  22.153M items/sec  22.833M items/sec     3.070                                     {'family_index': 9, 'per_family_instance_index': 0, 'run_name': 'TakeStringRandomIndicesNoNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 30, 'null_percent': 0.1}
                 TakeStringRandomIndicesWithNulls/524288/1   4.032G items/sec   4.156G items/sec     3.069                                 {'family_index': 10, 'per_family_instance_index': 3, 'run_name': 'TakeStringRandomIndicesWithNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5358, 'null_percent': 100.0}
                   TakeStringRandomIndicesNoNulls/524288/2  61.033M items/sec  62.890M items/sec     3.044                                       {'family_index': 9, 'per_family_instance_index': 2, 'run_name': 'TakeStringRandomIndicesNoNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 80, 'null_percent': 50.0}
   TakeChunkedChunkedStringRandomIndicesWithNulls/524288/2  56.312M items/sec  57.906M items/sec     2.831                      {'family_index': 19, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedStringRandomIndicesWithNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 75, 'null_percent': 50.0}
   TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/10/9  98.562M items/sec 101.317M items/sec     2.795  {'family_index': 16, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 131, 'byte_width': 9.0, 'null_percent': 10.0}
  TakeChunkedChunkedStringRandomIndicesNoNulls/524288/1000  28.857M items/sec  29.634M items/sec     2.693                      {'family_index': 18, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedStringRandomIndicesNoNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 37, 'null_percent': 0.1}
         TakeChunkedChunkedStringMonotonicIndices/524288/0  47.779M items/sec  48.996M items/sec     2.545                             {'family_index': 20, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedStringMonotonicIndices/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 64, 'null_percent': 0.0}
                       TakeStringMonotonicIndices/524288/1  67.132M items/sec  68.571M items/sec     2.144                                         {'family_index': 11, 'per_family_instance_index': 3, 'run_name': 'TakeStringMonotonicIndices/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 90, 'null_percent': 100.0}
         TakeChunkedChunkedStringMonotonicIndices/524288/1 220.296M items/sec 223.992M items/sec     1.678                          {'family_index': 20, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedStringMonotonicIndices/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 295, 'null_percent': 100.0}
       TakeChunkedFlatInt64RandomIndicesWithNulls/524288/0 516.783M items/sec 524.475M items/sec     1.489                          {'family_index': 22, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedFlatInt64RandomIndicesWithNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 674, 'null_percent': 0.0}
                  TakeInt64RandomIndicesWithNulls/524288/0 708.727M items/sec 719.105M items/sec     1.464                                      {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'TakeInt64RandomIndicesWithNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 945, 'null_percent': 0.0}
        TakeFixedSizeBinaryRandomIndicesNoNulls/524288/0/8 728.342M items/sec 738.750M items/sec     1.429         {'family_index': 3, 'per_family_instance_index': 8, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 969, 'byte_width': 8.0, 'null_percent': 0.0}
   TakeChunkedChunkedStringRandomIndicesWithNulls/524288/0  28.581M items/sec  28.988M items/sec     1.423                       {'family_index': 19, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedStringRandomIndicesWithNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 38, 'null_percent': 0.0}
   TakeChunkedChunkedStringRandomIndicesWithNulls/524288/1   1.167G items/sec   1.184G items/sec     1.393                   {'family_index': 19, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedStringRandomIndicesWithNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1556, 'null_percent': 100.0}
      TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1/8   7.216G items/sec   7.307G items/sec     1.262    {'family_index': 4, 'per_family_instance_index': 6, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9536, 'byte_width': 8.0, 'null_percent': 100.0}
    TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/0 435.696M items/sec 441.192M items/sec     1.261                       {'family_index': 13, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 585, 'null_percent': 0.0}
                     TakeFSLInt64MonotonicIndices/524288/2  58.808M items/sec  59.212M items/sec     0.688                                         {'family_index': 8, 'per_family_instance_index': 2, 'run_name': 'TakeFSLInt64MonotonicIndices/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 79, 'null_percent': 50.0}
                   TakeStringRandomIndicesNoNulls/524288/1 250.252M items/sec 250.508M items/sec     0.102                                     {'family_index': 9, 'per_family_instance_index': 3, 'run_name': 'TakeStringRandomIndicesNoNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 332, 'null_percent': 100.0}
TakeChunkedChunkedStringRandomIndicesWithNulls/524288/1000  28.414M items/sec  28.361M items/sec    -0.186                    {'family_index': 19, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedStringRandomIndicesWithNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 38, 'null_percent': 0.1}
                TakeStringRandomIndicesWithNulls/524288/10  33.878M items/sec  33.738M items/sec    -0.413                                   {'family_index': 10, 'per_family_instance_index': 1, 'run_name': 'TakeStringRandomIndicesWithNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 45, 'null_percent': 10.0}
     TakeChunkedChunkedStringRandomIndicesNoNulls/524288/2  56.097M items/sec  55.857M items/sec    -0.427                        {'family_index': 18, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedStringRandomIndicesNoNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 75, 'null_percent': 50.0}
     TakeChunkedChunkedStringRandomIndicesNoNulls/524288/1 225.625M items/sec 224.634M items/sec    -0.439                      {'family_index': 18, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedStringRandomIndicesNoNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 304, 'null_percent': 100.0}
      TakeChunkedChunkedStringMonotonicIndices/524288/1000  50.359M items/sec  49.834M items/sec    -1.043                          {'family_index': 20, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedStringMonotonicIndices/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 66, 'null_percent': 0.1}
                  TakeInt64RandomIndicesWithNulls/524288/1   7.198G items/sec   7.112G items/sec    -1.191                                   {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'TakeInt64RandomIndicesWithNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9591, 'null_percent': 100.0}
         TakeChunkedChunkedStringMonotonicIndices/524288/2  86.073M items/sec  84.996M items/sec    -1.252                           {'family_index': 20, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedStringMonotonicIndices/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 114, 'null_percent': 50.0}
      TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/0/8 430.819M items/sec 423.785M items/sec    -1.633      {'family_index': 15, 'per_family_instance_index': 8, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 580, 'byte_width': 8.0, 'null_percent': 0.0}
        TakeChunkedChunkedStringMonotonicIndices/524288/10  52.095M items/sec  51.106M items/sec    -1.899                           {'family_index': 20, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedStringMonotonicIndices/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 69, 'null_percent': 10.0}
                  TakeStringRandomIndicesNoNulls/524288/10  23.805M items/sec  23.350M items/sec    -1.909                                      {'family_index': 9, 'per_family_instance_index': 1, 'run_name': 'TakeStringRandomIndicesNoNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 31, 'null_percent': 10.0}
                    TakeInt64RandomIndicesNoNulls/524288/0 718.273M items/sec 700.958M items/sec    -2.411                                        {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'TakeInt64RandomIndicesNoNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 966, 'null_percent': 0.0}
    TakeChunkedChunkedStringRandomIndicesNoNulls/524288/10  33.176M items/sec  32.265M items/sec    -2.748                       {'family_index': 18, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedStringRandomIndicesNoNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 44, 'null_percent': 10.0}
                 TakeStringRandomIndicesWithNulls/524288/2  62.261M items/sec  60.517M items/sec    -2.802                                    {'family_index': 10, 'per_family_instance_index': 2, 'run_name': 'TakeStringRandomIndicesWithNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 82, 'null_percent': 50.0}
                   TakeStringRandomIndicesNoNulls/524288/0  20.935M items/sec  20.275M items/sec    -3.155                                        {'family_index': 9, 'per_family_instance_index': 4, 'run_name': 'TakeStringRandomIndicesNoNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 28, 'null_percent': 0.0}
                    TakeFSLInt64MonotonicIndices/524288/10 100.329M items/sec  97.103M items/sec    -3.215                                       {'family_index': 8, 'per_family_instance_index': 1, 'run_name': 'TakeFSLInt64MonotonicIndices/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 135, 'null_percent': 10.0}
              TakeStringRandomIndicesWithNulls/524288/1000  22.013M items/sec  21.300M items/sec    -3.238                                  {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'TakeStringRandomIndicesWithNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 30, 'null_percent': 0.1}
                  TakeFSLInt64MonotonicIndices/524288/1000 138.319M items/sec 133.809M items/sec    -3.261                                      {'family_index': 8, 'per_family_instance_index': 0, 'run_name': 'TakeFSLInt64MonotonicIndices/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 184, 'null_percent': 0.1}
    TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/1/8 973.877M items/sec 941.054M items/sec    -3.370 {'family_index': 16, 'per_family_instance_index': 6, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1287, 'byte_width': 8.0, 'null_percent': 100.0}
                 TakeStringRandomIndicesWithNulls/524288/0  20.697M items/sec  19.930M items/sec    -3.707                                     {'family_index': 10, 'per_family_instance_index': 4, 'run_name': 'TakeStringRandomIndicesWithNulls/524288/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 28, 'null_percent': 0.0}
              TakeFSLInt64RandomIndicesWithNulls/524288/10  68.597M items/sec  65.478M items/sec    -4.547                                  {'family_index': 7, 'per_family_instance_index': 1, 'run_name': 'TakeFSLInt64RandomIndicesWithNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 92, 'null_percent': 10.0}

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (64)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                benchmark           baseline          contender  change %                                                                                                                                                                                                                                        counters
   TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/1   1.001G items/sec 950.629M items/sec    -5.006                    {'family_index': 13, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1320, 'null_percent': 100.0}
           TakeFSLInt64RandomIndicesWithNulls/524288/1000 110.053M items/sec 103.738M items/sec    -5.738                                {'family_index': 7, 'per_family_instance_index': 0, 'run_name': 'TakeFSLInt64RandomIndicesWithNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 147, 'null_percent': 0.1}
                TakeFSLInt64RandomIndicesNoNulls/524288/2  51.364M items/sec  48.349M items/sec    -5.869                                     {'family_index': 6, 'per_family_instance_index': 2, 'run_name': 'TakeFSLInt64RandomIndicesNoNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 68, 'null_percent': 50.0}
                      TakeStringMonotonicIndices/524288/2  59.357M items/sec  55.607M items/sec    -6.317                                          {'family_index': 11, 'per_family_instance_index': 2, 'run_name': 'TakeStringMonotonicIndices/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 79, 'null_percent': 50.0}
   TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/0/8 432.962M items/sec 405.540M items/sec    -6.334    {'family_index': 16, 'per_family_instance_index': 8, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 579, 'byte_width': 8.0, 'null_percent': 0.0}
                   TakeStringMonotonicIndices/524288/1000 138.629M items/sec 128.420M items/sec    -7.364                                       {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'TakeStringMonotonicIndices/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 184, 'null_percent': 0.1}
                     TakeStringMonotonicIndices/524288/10 100.905M items/sec  92.883M items/sec    -7.950                                        {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'TakeStringMonotonicIndices/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 135, 'null_percent': 10.0}
                       TakeInt64MonotonicIndices/524288/2 221.337M items/sec 202.318M items/sec    -8.593                                           {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'TakeInt64MonotonicIndices/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 305, 'null_percent': 50.0}
        TakeChunkedChunkedInt64MonotonicIndices/524288/10 255.436M items/sec 230.562M items/sec    -9.738                           {'family_index': 14, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedInt64MonotonicIndices/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 344, 'null_percent': 10.0}
         TakeChunkedChunkedInt64MonotonicIndices/524288/2 195.065M items/sec 174.561M items/sec   -10.511                            {'family_index': 14, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedInt64MonotonicIndices/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 260, 'null_percent': 50.0}
         TakeChunkedChunkedFSBMonotonicIndices/524288/2/8 195.399M items/sec 174.380M items/sec   -10.757         {'family_index': 17, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 261, 'byte_width': 8.0, 'null_percent': 50.0}
        TakeChunkedChunkedFSBMonotonicIndices/524288/10/8 259.057M items/sec 230.236M items/sec   -11.125        {'family_index': 17, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 345, 'byte_width': 8.0, 'null_percent': 10.0}
               TakeFSLInt64RandomIndicesNoNulls/524288/10  91.682M items/sec  81.282M items/sec   -11.344                                   {'family_index': 6, 'per_family_instance_index': 1, 'run_name': 'TakeFSLInt64RandomIndicesNoNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 124, 'null_percent': 10.0}
            TakeChunkedFlatInt64MonotonicIndices/524288/2 206.386M items/sec 182.888M items/sec   -11.385                               {'family_index': 23, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedFlatInt64MonotonicIndices/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 274, 'null_percent': 50.0}
           TakeFixedSizeBinaryMonotonicIndices/524288/2/8 230.747M items/sec 203.991M items/sec   -11.595            {'family_index': 5, 'per_family_instance_index': 4, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 304, 'byte_width': 8.0, 'null_percent': 50.0}
      TakeChunkedChunkedInt64MonotonicIndices/524288/1000 309.856M items/sec 273.410M items/sec   -11.762                          {'family_index': 14, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedInt64MonotonicIndices/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 421, 'null_percent': 0.1}
           TakeChunkedFlatInt64MonotonicIndices/524288/10 282.721M items/sec 247.747M items/sec   -12.370                              {'family_index': 23, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedFlatInt64MonotonicIndices/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 388, 'null_percent': 10.0}
      TakeChunkedChunkedFSBMonotonicIndices/524288/1000/8 314.628M items/sec 274.680M items/sec   -12.697       {'family_index': 17, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 417, 'byte_width': 8.0, 'null_percent': 0.1}
                      TakeInt64MonotonicIndices/524288/10 325.126M items/sec 283.762M items/sec   -12.723                                          {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'TakeInt64MonotonicIndices/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 435, 'null_percent': 10.0}
             TakeFSLInt64RandomIndicesNoNulls/524288/1000 121.619M items/sec 105.817M items/sec   -12.993                                  {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'TakeFSLInt64RandomIndicesNoNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 162, 'null_percent': 0.1}
         TakeChunkedFlatInt64MonotonicIndices/524288/1000 346.368M items/sec 301.325M items/sec   -13.004                             {'family_index': 23, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedFlatInt64MonotonicIndices/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 476, 'null_percent': 0.1}
      TakeChunkedFlatInt64RandomIndicesWithNulls/524288/2 103.172M items/sec  89.469M items/sec   -13.282                         {'family_index': 22, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedFlatInt64RandomIndicesWithNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 137, 'null_percent': 50.0}
         TakeChunkedChunkedFSBMonotonicIndices/524288/1/8 430.312M items/sec 371.092M items/sec   -13.762        {'family_index': 17, 'per_family_instance_index': 6, 'run_name': 'TakeChunkedChunkedFSBMonotonicIndices/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 576, 'byte_width': 8.0, 'null_percent': 100.0}
                    TakeInt64MonotonicIndices/524288/1000 412.287M items/sec 350.592M items/sec   -14.964                                         {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'TakeInt64MonotonicIndices/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 553, 'null_percent': 0.1}
         TakeChunkedChunkedInt64MonotonicIndices/524288/1 436.243M items/sec 369.718M items/sec   -15.250                           {'family_index': 14, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedInt64MonotonicIndices/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 582, 'null_percent': 100.0}
          TakeFixedSizeBinaryMonotonicIndices/524288/10/8 333.154M items/sec 282.199M items/sec   -15.295           {'family_index': 5, 'per_family_instance_index': 2, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 444, 'byte_width': 8.0, 'null_percent': 10.0}
     TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/2 149.951M items/sec 127.005M items/sec   -15.302                        {'family_index': 12, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 195, 'null_percent': 50.0}
       TakeFixedSizeBinaryRandomIndicesNoNulls/524288/2/8 178.236M items/sec 150.734M items/sec   -15.430        {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 239, 'byte_width': 8.0, 'null_percent': 50.0}
   TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/2/8 104.347M items/sec  87.711M items/sec   -15.943   {'family_index': 16, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 139, 'byte_width': 8.0, 'null_percent': 50.0}
                 TakeInt64RandomIndicesWithNulls/524288/2 110.744M items/sec  93.020M items/sec   -16.005                                     {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'TakeInt64RandomIndicesWithNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 149, 'null_percent': 50.0}
        TakeFixedSizeBinaryMonotonicIndices/524288/1000/8 416.393M items/sec 349.053M items/sec   -16.172          {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 569, 'byte_width': 8.0, 'null_percent': 0.1}
                   TakeInt64RandomIndicesNoNulls/524288/2 175.322M items/sec 146.187M items/sec   -16.618                                       {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'TakeInt64RandomIndicesNoNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 234, 'null_percent': 50.0}
     TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/1 409.102M items/sec 340.247M items/sec   -16.831                       {'family_index': 12, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 544, 'null_percent': 100.0}
     TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/2/8 154.752M items/sec 126.380M items/sec   -18.334     {'family_index': 15, 'per_family_instance_index': 4, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 205, 'byte_width': 8.0, 'null_percent': 50.0}
     TakeFixedSizeBinaryRandomIndicesWithNulls/524288/2/8 115.753M items/sec  94.453M items/sec   -18.401      {'family_index': 4, 'per_family_instance_index': 4, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 153, 'byte_width': 8.0, 'null_percent': 50.0}
        TakeChunkedFlatInt64RandomIndicesNoNulls/524288/2 170.047M items/sec 137.985M items/sec   -18.855                           {'family_index': 21, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedFlatInt64RandomIndicesNoNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 227, 'null_percent': 50.0}
   TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/2 104.949M items/sec  84.181M items/sec   -19.789                      {'family_index': 13, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 140, 'null_percent': 50.0}
    TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/10/8 215.087M items/sec 172.509M items/sec   -19.796    {'family_index': 15, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 294, 'byte_width': 8.0, 'null_percent': 10.0}
  TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/10 152.248M items/sec 121.846M items/sec   -19.969                     {'family_index': 13, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 205, 'null_percent': 10.0}
     TakeChunkedFlatInt64RandomIndicesWithNulls/524288/10 163.847M items/sec 131.028M items/sec   -20.030                        {'family_index': 22, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedFlatInt64RandomIndicesWithNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 220, 'null_percent': 10.0}
  TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/10/8 160.033M items/sec 127.702M items/sec   -20.202  {'family_index': 16, 'per_family_instance_index': 2, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 213, 'byte_width': 8.0, 'null_percent': 10.0}
            TakeChunkedFlatInt64MonotonicIndices/524288/1 534.901M items/sec 425.783M items/sec   -20.400                              {'family_index': 23, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedFlatInt64MonotonicIndices/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 708, 'null_percent': 100.0}
      TakeFixedSizeBinaryRandomIndicesNoNulls/524288/10/8 265.693M items/sec 210.311M items/sec   -20.844       {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 359, 'byte_width': 8.0, 'null_percent': 10.0}
                       TakeInt64MonotonicIndices/524288/1 716.931M items/sec 565.379M items/sec   -21.139                                          {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'TakeInt64MonotonicIndices/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 938, 'null_percent': 100.0}
  TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/1000/8 242.890M items/sec 190.956M items/sec   -21.381   {'family_index': 15, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 322, 'byte_width': 8.0, 'null_percent': 0.1}
TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/1000 222.384M items/sec 173.899M items/sec   -21.802                    {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesWithNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 295, 'null_percent': 0.1}
   TakeChunkedFlatInt64RandomIndicesWithNulls/524288/1000 242.960M items/sec 189.830M items/sec   -21.868                       {'family_index': 22, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedFlatInt64RandomIndicesWithNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 323, 'null_percent': 0.1}
        TakeChunkedFlatInt64RandomIndicesNoNulls/524288/1 519.779M items/sec 404.965M items/sec   -22.089                          {'family_index': 21, 'per_family_instance_index': 3, 'run_name': 'TakeChunkedFlatInt64RandomIndicesNoNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 717, 'null_percent': 100.0}
                TakeInt64RandomIndicesWithNulls/524288/10 180.171M items/sec 139.484M items/sec   -22.582                                    {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'TakeInt64RandomIndicesWithNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 240, 'null_percent': 10.0}
                  TakeInt64RandomIndicesNoNulls/524288/10 263.139M items/sec 202.899M items/sec   -22.893                                      {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'TakeInt64RandomIndicesNoNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 351, 'null_percent': 10.0}
TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/1000/8 229.423M items/sec 176.622M items/sec   -23.014 {'family_index': 16, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesWithNulls/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 307, 'byte_width': 8.0, 'null_percent': 0.1}
     TakeChunkedFlatInt64RandomIndicesNoNulls/524288/1000 273.918M items/sec 210.633M items/sec   -23.104                         {'family_index': 21, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedFlatInt64RandomIndicesNoNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 368, 'null_percent': 0.1}
  TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/1000 231.446M items/sec 176.407M items/sec   -23.781                      {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 308, 'null_percent': 0.1}
           TakeFixedSizeBinaryMonotonicIndices/524288/1/8 750.779M items/sec 571.303M items/sec   -23.905           {'family_index': 5, 'per_family_instance_index': 6, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 994, 'byte_width': 8.0, 'null_percent': 100.0}
    TakeFixedSizeBinaryRandomIndicesWithNulls/524288/10/8 187.491M items/sec 142.620M items/sec   -23.933     {'family_index': 4, 'per_family_instance_index': 2, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 247, 'byte_width': 8.0, 'null_percent': 10.0}
    TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1000/8 304.735M items/sec 230.960M items/sec   -24.210      {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 409, 'byte_width': 8.0, 'null_percent': 0.1}
                   TakeInt64RandomIndicesNoNulls/524288/1 687.284M items/sec 519.230M items/sec   -24.452                                      {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'TakeInt64RandomIndicesNoNulls/524288/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 905, 'null_percent': 100.0}
              TakeInt64RandomIndicesWithNulls/524288/1000 283.026M items/sec 212.189M items/sec   -25.028                                   {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'TakeInt64RandomIndicesWithNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 376, 'null_percent': 0.1}
       TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1/8 711.122M items/sec 531.042M items/sec   -25.323       {'family_index': 3, 'per_family_instance_index': 6, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 942, 'byte_width': 8.0, 'null_percent': 100.0}
                TakeInt64RandomIndicesNoNulls/524288/1000 298.449M items/sec 222.481M items/sec   -25.454                                     {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'TakeInt64RandomIndicesNoNulls/524288/1000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 401, 'null_percent': 0.1}
     TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/1/8 425.031M items/sec 315.623M items/sec   -25.741    {'family_index': 15, 'per_family_instance_index': 6, 'run_name': 'TakeChunkedChunkedFSBRandomIndicesNoNulls/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 566, 'byte_width': 8.0, 'null_percent': 100.0}
       TakeChunkedFlatInt64RandomIndicesNoNulls/524288/10 247.351M items/sec 180.840M items/sec   -26.890                          {'family_index': 21, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedFlatInt64RandomIndicesNoNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 327, 'null_percent': 10.0}
    TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/10 209.893M items/sec 153.392M items/sec   -26.919                       {'family_index': 12, 'per_family_instance_index': 1, 'run_name': 'TakeChunkedChunkedInt64RandomIndicesNoNulls/524288/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 282, 'null_percent': 10.0}
  TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1000/8 293.743M items/sec 204.054M items/sec   -30.533    {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 391, 'byte_width': 8.0, 'null_percent': 0.1}
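
For readers checking the numbers: the third numeric column appears to be the relative change between the two throughput columns, in percent. The sketch below assumes the first throughput column is the baseline and the second is the contender (the table header is not visible in this part of the thread). `PercentChange` is a hypothetical helper written for illustration; it is not part of Arrow or its benchmark tooling.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (illustration only, not an Arrow API).
// Returns the relative change from `baseline` to `contender` in percent.
// A negative result means the contender processed fewer items/sec.
double PercentChange(double baseline, double contender) {
  assert(baseline != 0.0);  // relative change is undefined for a zero baseline
  return (contender - baseline) / baseline * 100.0;
}
```

Under that assumption, the last row above (293.743M vs 204.054M items/sec) gives roughly -30.53, which agrees with the -30.533 shown in the table.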

@felipecrv
Contributor Author

felipecrv commented Jun 10, 2024

I've run the Take micro-benchmarks locally with this (AMD Zen 2, gcc 12.3.0). The changes are a bit all over the place and show that compilers are generally capricious and difficult to steer towards "optimal" code :-)

Do you want me to copy and paste the code so that gcc can inline modular code correctly?

EDIT: The things that the PR improves are much faster. The ChunkedChunked benchmarks swing anywhere from -30% to +30% because they are effectively benchmarks of concatenation, which has high variance due to the number of huge memory operations it performs. I'm improving ChunkedChunked (TakeCC) in my follow-up PR.

@pitrou
Member

pitrou commented Jun 10, 2024

> Do you want me to copy and paste the code so that gcc can inline modular code correctly?

No, it's not important. I was just posting the results for information.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 10, 2024
@felipecrv felipecrv merged commit 4f89097 into apache:main Jun 10, 2024
37 of 38 checks passed
@felipecrv felipecrv removed the awaiting change review Awaiting change review label Jun 10, 2024
@felipecrv felipecrv deleted the gather_fixed branch June 10, 2024 15:33

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 4f89097.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 20 possible false positives for unstable benchmarks that are known to sometimes produce them.
