Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++] Speed up set_intersection() by fast-forwarding over ranges of non-matching elements with one-sided binary search. #75230

Merged
merged 62 commits into from
Jul 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
62 commits
Select commit Hold shift + click to select a range
b65415f
[libc++][test] Add lower_bound complexity validation tests prior to i…
ichaer Oct 17, 2023
f6bcf27
[libc++] Introduce one-sided binary search for lower_bound on non-ran…
ichaer Oct 17, 2023
36bb63e
[libc++][test] Add set_intersection complexity validation tests prior…
ichaer Oct 17, 2023
c23272c
[libc++] Introduce use of __lower_bound_onesided to improve average c…
ichaer Oct 17, 2023
0b57ea0
Fix `constexpr` annotations.
ichaer Jan 2, 2024
08af548
Remove std::exchange dependency from std::set_intersection so it work…
ichaer Jan 2, 2024
7aa3927
Review feedback: don't use one-sided lower bound in lower_bound() its…
ichaer Jan 2, 2024
c44c2a2
Create new benchmark for set_intersection().
ichaer Jan 2, 2024
46cc95f
Formatting fixups.
ichaer Jan 5, 2024
450f5ce
General improvements to benchmark, including simplifying and slimming…
ichaer Jan 8, 2024
d0c5f2b
Huh, I wonder how I got `git clang-format` to miss those changes =/
ichaer Jan 15, 2024
faa3115
Oops, bad mistake while porting into libc++! `__lower_bound_onesided(…
ichaer Jan 23, 2024
995d04b
Oops, bad tracking of displacement on `stride_counting_iterator`
ichaer Jan 23, 2024
d568d49
Add more counters to the set_intersection benchmark, guard them behin…
ichaer Jan 23, 2024
6ba7061
Merge branch 'main' into onesided_lower_bound
ichaer Jan 29, 2024
76c33ca
Merge branch 'main' into onesided_lower_bound
ichaer Jan 29, 2024
bb872e0
Revert "Oops, bad tracking of displacement on `stride_counting_iterat…
ichaer Feb 1, 2024
a1cd8ff
* Fix C++03 compatibility issues.
ichaer Feb 1, 2024
24d1d5b
Remove non-ascii characters, CI doesn't like them.
ichaer Feb 1, 2024
f17fa58
Merge remote-tracking branch 'llvm/main' into onesided_lower_bound
ichaer Feb 1, 2024
4b73773
Oops, missed an #include
ichaer Feb 2, 2024
65bd9b7
Merge remote-tracking branch 'llvm/main' into onesided_lower_bound
ichaer Feb 5, 2024
d0facc5
set_intersection.h: remove `static constexpr`, it breaks constexprnes…
ichaer Feb 5, 2024
a12aa37
Fix constexpr shenanigans with gcc and `stride_counting_iterator`
ichaer Feb 5, 2024
69dba78
Restrict number of constexpr steps so `ranges_set_intersection.pass.c…
ichaer Feb 6, 2024
fe1fe8c
Fix constexpr annotation and make internal methods private in _IterOps
ichaer Feb 12, 2024
bb2c758
Allow for assertions in comparison count when in hardened mode for co…
ichaer Feb 12, 2024
c6b895c
Revert lower_bound.pass.cpp changes, will move into a new PR.
ichaer Feb 13, 2024
31321b9
Oops, forgot to revert this one too.
ichaer Feb 13, 2024
6c88549
Merge remote-tracking branch 'llvm/main' into onesided_lower_bound
ichaer Apr 23, 2024
3805e95
s/_LIBCPP_NODISCARD_EXT/_LIBCPP_NODISCARD/ after merging #87094
ichaer Apr 23, 2024
090df86
Address feedback about qualifying abort(), added comment to clarify c…
ichaer Apr 23, 2024
cb92d3c
Address comment about broken comment for `getVectorOfRandom()`: move …
ichaer Apr 23, 2024
f4a6f36
Oops, forgot to format =/. Still working on the remaining feedback, b…
ichaer Apr 23, 2024
3f9cfec
Address comment about making the benchmark's `struct MoveInto` into a…
ichaer Apr 23, 2024
1afb99d
Address comment about using `common.h`'s `Quantities` constant in the…
ichaer Apr 23, 2024
613e64a
Address feedback to improve assertion in _IterOps::__advance()
ichaer Apr 23, 2024
4588447
Rename new sentinel-based `_IterOps::advance()` to `_IterOps::__advan…
ichaer Apr 23, 2024
2af9a6f
Address feedback about using `iterator_traits<_Iter>::difference_type…
ichaer Apr 23, 2024
4f05ded
git clang-format on the last batch of changes
ichaer Apr 23, 2024
161d81c
Address review comments about lower_bound.h
ichaer Apr 24, 2024
3c9f800
Address review comments about set_intersection.h: unnecessary namespa…
ichaer Apr 24, 2024
4aa4a82
Address review comment about replacing `struct __set_intersector` wit…
ichaer Apr 24, 2024
8307b2d
Make `__add_output_unless()` a freestanding function, `__set_intersec…
ichaer Apr 26, 2024
be6c5c8
Address comment about using ` std::forward<_Compare>()` for consisten…
ichaer Apr 27, 2024
62a6010
Address review feedback: remove benchmark counters.
ichaer Apr 27, 2024
e2af5cc
clang-format of the last 2 changes
ichaer Apr 27, 2024
89201ea
Oops, leftover template type name!
ichaer Apr 28, 2024
5f6e7fe
Remove unnecessary PauseTiming()/ResumeTiming() in the benchmark data…
ichaer Apr 29, 2024
109e5a4
Apply suggestions from code review
ichaer May 24, 2024
cc95b51
clang-format fixups to inline code suggestions
ichaer May 24, 2024
91e4e51
Move new `_IterOps<_ClassicAlgPolicy>::__advance_to()` overloads next…
ichaer May 24, 2024
c977bb7
Refactor operation counts out of `stride_counting_iterator`.
ichaer May 24, 2024
b4fad5b
's/stride_counting_iterator/operation_counting_iterator/'
ichaer May 27, 2024
87f12c2
change complexity test interface, 's/testSetIntersectionAndReturnOpCo…
ichaer May 24, 2024
505c004
Move last complexity test to the new file, and add a matching one for…
ichaer May 27, 2024
95b118a
Add yet another complexity test, this one validating the standard gua…
ichaer May 27, 2024
b1bfa0f
Take into account additional comparison performed in assertion in har…
ichaer May 28, 2024
f501bdc
Add release note. It reads a bit awkward, maybe I'll come up with som…
ichaer May 28, 2024
c5df570
* s/__(prev_)?advanced/__$1_may_be_equal/g
ichaer Jul 16, 2024
6189e95
Oops
ichaer Jul 16, 2024
6eacf2f
Merge remote-tracking branch 'llvm/main' into onesided_lower_bound
ichaer Jul 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions libcxx/benchmarks/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,7 @@ set(BENCHMARK_TESTS
algorithms/ranges_sort.bench.cpp
algorithms/ranges_sort_heap.bench.cpp
algorithms/ranges_stable_sort.bench.cpp
algorithms/set_intersection.bench.cpp
algorithms/sort.bench.cpp
algorithms/sort_heap.bench.cpp
algorithms/stable_sort.bench.cpp
Expand Down
184 changes: 184 additions & 0 deletions libcxx/benchmarks/algorithms/set_intersection.bench.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,184 @@
//===----------------------------------------------------------------------===//
ldionne marked this conversation as resolved.
Show resolved Hide resolved
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <set>
#include <vector>

#include "common.h"
#include "test_iterators.h"

namespace {

// types of containers we'll want to test, covering interesting iterator types
struct VectorContainer {
template <typename... Args>
using type = std::vector<Args...>;

static constexpr const char* Name = "Vector";
};

struct SetContainer {
template <typename... Args>
using type = std::set<Args...>;

static constexpr const char* Name = "Set";
};

using AllContainerTypes = std::tuple<VectorContainer, SetContainer>;

// set_intersection performance may depend on where matching values lie
enum class OverlapPosition {
None,
Front,
// performance-wise, matches at the back are identical to ones at the front
Interlaced,
};

struct AllOverlapPositions : EnumValuesAsTuple<AllOverlapPositions, OverlapPosition, 3> {
static constexpr const char* Names[] = {"None", "Front", "Interlaced"};
};

// forward_iterator wrapping which, for each increment, moves the underlying iterator forward Stride elements
template <typename Wrapped>
struct StridedFwdIt {
Wrapped base_;
unsigned stride_;

using iterator_category = std::forward_iterator_tag;
using difference_type = typename Wrapped::difference_type;
using value_type = typename Wrapped::value_type;
using pointer = typename Wrapped::pointer;
using reference = typename Wrapped::reference;

StridedFwdIt(Wrapped base, unsigned stride) : base_(base), stride_(stride) { assert(stride_ != 0); }

StridedFwdIt operator++() {
for (unsigned i = 0; i < stride_; ++i)
++base_;
return *this;
}
StridedFwdIt operator++(int) {
auto tmp = *this;
++*this;
return tmp;
}
value_type& operator*() { return *base_; }
const value_type& operator*() const { return *base_; }
value_type& operator->() { return *base_; }
const value_type& operator->() const { return *base_; }
bool operator==(const StridedFwdIt& o) const { return base_ == o.base_; }
bool operator!=(const StridedFwdIt& o) const { return !operator==(o); }
};
template <typename Wrapped>
StridedFwdIt(Wrapped, unsigned) -> StridedFwdIt<Wrapped>;

template <typename T>
std::vector<T> getVectorOfRandom(size_t N) {
std::vector<T> v;
fillValues(v, N, Order::Random);
sortValues(v, Order::Random);
return std::vector<T>(v);
}

// Realistically, data won't all be nicely contiguous in a container,
// we'll go through some effort to ensure that it's shuffled through memory
// this is especially important for containers with non-contiguous element
// storage, but it will affect even a std::vector, because when you copy a
// std::vector<std::string> the underlying data storage position for the char
// arrays of the copy are likely to have high locality
template <class Container>
std::pair<Container, Container> genCacheUnfriendlyData(size_t size1, size_t size2, OverlapPosition pos) {
using ValueType = typename Container::value_type;
auto move_into = [](auto first, auto last) {
Container out;
std::move(first, last, std::inserter(out, out.begin()));
return out;
};
const auto src_size = pos == OverlapPosition::None ? size1 + size2 : std::max(size1, size2);
std::vector<ValueType> src = getVectorOfRandom<ValueType>(src_size);

if (pos == OverlapPosition::None) {
std::sort(src.begin(), src.end());
return std::make_pair(move_into(src.begin(), src.begin() + size1), move_into(src.begin() + size1, src.end()));
}

// All other overlap types will have to copy some part of the data, but if
// we copy after sorting it will likely have high locality, so we sort
// each copy separately
auto copy = src;
std::sort(src.begin(), src.end());
std::sort(copy.begin(), copy.end());

switch (pos) {
case OverlapPosition::None:
// we like -Wswitch :)
break;

case OverlapPosition::Front:
return std::make_pair(move_into(src.begin(), src.begin() + size1), move_into(copy.begin(), copy.begin() + size2));

case OverlapPosition::Interlaced:
const auto stride1 = size1 < size2 ? size2 / size1 : 1;
const auto stride2 = size2 < size1 ? size1 / size2 : 1;
return std::make_pair(move_into(StridedFwdIt(src.begin(), stride1), StridedFwdIt(src.end(), stride1)),
move_into(StridedFwdIt(copy.begin(), stride2), StridedFwdIt(copy.end(), stride2)));
}
std::abort(); // would be std::unreachable() if it could
return std::pair<Container, Container>();
}

template <class ValueType, class Container, class Overlap>
struct SetIntersection {
using ContainerType = typename Container::template type<Value<ValueType>>;
size_t size1_;
size_t size2_;

SetIntersection(size_t size1, size_t size2) : size1_(size1), size2_(size2) {}

bool skip() const noexcept {
// let's save some time and skip simmetrical runs
return size1_ < size2_;
}

void run(benchmark::State& state) const {
auto input = genCacheUnfriendlyData<ContainerType>(size1_, size2_, Overlap());
std::vector<Value<ValueType>> out(std::min(size1_, size2_));

const auto BATCH_SIZE = std::max(size_t{512}, (2 * TestSetElements) / (size1_ + size2_));
for (const auto& _ : state) {
while (state.KeepRunningBatch(BATCH_SIZE)) {
for (unsigned i = 0; i < BATCH_SIZE; ++i) {
const auto& [c1, c2] = input;
auto res = std::set_intersection(c1.begin(), c1.end(), c2.begin(), c2.end(), out.begin());
benchmark::DoNotOptimize(res);
}
}
}
}

std::string name() const {
return std::string("SetIntersection") + Overlap::name() + '_' + Container::Name + ValueType::name() + '_' +
std::to_string(size1_) + '_' + std::to_string(size2_);
}
};

} // namespace

int main(int argc, char** argv) { /**/
benchmark::Initialize(&argc, argv);
if (benchmark::ReportUnrecognizedArguments(argc, argv))
return 1;

makeCartesianProductBenchmark<SetIntersection, AllValueTypes, AllContainerTypes, AllOverlapPositions>(
Quantities, Quantities);
benchmark::RunSpecifiedBenchmarks();
return 0;
}
ichaer marked this conversation as resolved.
Show resolved Hide resolved
4 changes: 4 additions & 0 deletions libcxx/docs/ReleaseNotes/19.rst
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,10 @@ Improvements and New Features
- The ``std::ranges::minmax`` algorithm has been optimized for integral types, resulting in a performance increase of
up to 100x.

- The ``std::set_intersection`` and ``std::ranges::set_intersection`` algorithms have been optimized to fast-forward over
contiguous ranges of non-matching values, reducing the number of comparisons from linear to
logarithmic growth with the number of elements in best-case scenarios.

- The ``_LIBCPP_ENABLE_CXX26_REMOVED_STRSTREAM`` macro has been added to make the declarations in ``<strstream>`` available.

- The ``_LIBCPP_ENABLE_CXX26_REMOVED_WSTRING_CONVERT`` macro has been added to make the declarations in ``<locale>``
Expand Down
54 changes: 54 additions & 0 deletions libcxx/include/__algorithm/iterator_operations.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@

#include <__algorithm/iter_swap.h>
#include <__algorithm/ranges_iterator_concept.h>
#include <__assert>
#include <__config>
#include <__iterator/advance.h>
#include <__iterator/distance.h>
Expand Down Expand Up @@ -160,6 +161,59 @@ struct _IterOps<_ClassicAlgPolicy> {
_LIBCPP_HIDE_FROM_ABI static _LIBCPP_CONSTEXPR_SINCE_CXX14 void __advance_to(_Iter& __first, _Iter __last) {
__first = __last;
}

// advance with sentinel, a la std::ranges::advance
template <class _Iter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 static __difference_type<_Iter>
__advance_to(_Iter& __iter, __difference_type<_Iter> __count, const _Iter& __sentinel) {
return _IterOps::__advance_to(__iter, __count, __sentinel, typename iterator_traits<_Iter>::iterator_category());
}

private:
// advance with sentinel, a la std::ranges::advance -- InputIterator specialization
template <class _InputIter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 static __difference_type<_InputIter> __advance_to(
_InputIter& __iter, __difference_type<_InputIter> __count, const _InputIter& __sentinel, input_iterator_tag) {
__difference_type<_InputIter> __dist = 0;
for (; __dist < __count && __iter != __sentinel; ++__dist)
++__iter;
return __count - __dist;
}

// advance with sentinel, a la std::ranges::advance -- BidirectionalIterator specialization
template <class _BiDirIter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 static __difference_type<_BiDirIter>
__advance_to(_BiDirIter& __iter,
__difference_type<_BiDirIter> __count,
const _BiDirIter& __sentinel,
bidirectional_iterator_tag) {
__difference_type<_BiDirIter> __dist = 0;
if (__count >= 0)
for (; __dist < __count && __iter != __sentinel; ++__dist)
++__iter;
else
for (__count = -__count; __dist < __count && __iter != __sentinel; ++__dist)
--__iter;
return __count - __dist;
}

// advance with sentinel, a la std::ranges::advance -- RandomIterator specialization
template <class _RandIter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 static __difference_type<_RandIter>
__advance_to(_RandIter& __iter,
__difference_type<_RandIter> __count,
const _RandIter& __sentinel,
random_access_iterator_tag) {
auto __dist = _IterOps::distance(__iter, __sentinel);
_LIBCPP_ASSERT_VALID_INPUT_RANGE(
__count == 0 || (__dist < 0) == (__count < 0), "__sentinel must precede __iter when __count < 0");
if (__count < 0)
__dist = __dist > __count ? __dist : __count;
else
__dist = __dist < __count ? __dist : __count;
__iter += __dist;
return __count - __dist;
}
};

_LIBCPP_END_NAMESPACE_STD
Expand Down
54 changes: 49 additions & 5 deletions libcxx/include/__algorithm/lower_bound.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,13 @@

_LIBCPP_BEGIN_NAMESPACE_STD

template <class _AlgPolicy, class _Iter, class _Sent, class _Type, class _Proj, class _Comp>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter
__lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj) {
auto __len = _IterOps<_AlgPolicy>::distance(__first, __last);

template <class _AlgPolicy, class _Iter, class _Type, class _Proj, class _Comp>
_LIBCPP_NODISCARD _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter __lower_bound_bisecting(
_Iter __first,
const _Type& __value,
typename iterator_traits<_Iter>::difference_type __len,
_Comp& __comp,
_Proj& __proj) {
while (__len != 0) {
auto __l2 = std::__half_positive(__len);
_Iter __m = __first;
Expand All @@ -46,6 +48,48 @@ __lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp& __comp,
return __first;
}

// One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general
// advantage of being \Omega(1) rather than the classic algorithm's \Omega(log(n)), with the downside of executing at
// most 2*log(n) comparisons vs the classic algorithm's exact log(n). There are two scenarios in which it really shines:
// the first one is when operating over non-random-access iterators, because the classic algorithm requires knowing the
// container's size upfront, which adds \Omega(n) iterator increments to the complexity. The second one is when you're
// traversing the container in order, trying to fast-forward to the next value: in that case, the classic algorithm
// would yield \Omega(n*log(n)) comparisons and, for non-random-access iterators, \Omega(n^2) iterator increments,
// whereas the one-sided version will yield O(n) operations on both counts, with a \Omega(log(n)) bound on the number of
// comparisons.
template <class _AlgPolicy, class _ForwardIterator, class _Sent, class _Type, class _Proj, class _Comp>
_LIBCPP_NODISCARD _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _ForwardIterator
__lower_bound_onesided(_ForwardIterator __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj) {
// step = 0, ensuring we can always short-circuit when distance is 1 later on
if (__first == __last || !std::__invoke(__comp, std::__invoke(__proj, *__first), __value))
return __first;

using _Distance = typename iterator_traits<_ForwardIterator>::difference_type;
for (_Distance __step = 1; __first != __last; __step <<= 1) {
auto __it = __first;
auto __dist = __step - _IterOps<_AlgPolicy>::__advance_to(__it, __step, __last);
// once we reach the last range where needle can be we must start
// looking inwards, bisecting that range
if (__it == __last || !std::__invoke(__comp, std::__invoke(__proj, *__it), __value)) {
// we've already checked the previous value and it was less, we can save
// one comparison by skipping bisection
if (__dist == 1)
return __it;
return std::__lower_bound_bisecting<_AlgPolicy>(__first, __value, __dist, __comp, __proj);
}
// range not found, move forward!
__first = __it;
}
return __first;
}

template <class _AlgPolicy, class _ForwardIterator, class _Sent, class _Type, class _Proj, class _Comp>
_LIBCPP_NODISCARD inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _ForwardIterator
__lower_bound(_ForwardIterator __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be possible to do something like (pseudo):

template <class _Iterator, class _Sent, class _Tp, class _Compare, class _Projection>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iterator
__lower_bound(_Iterator __first, _Sent __sent, const _Tp& __value, _Compare __comp, _Projection __proj) {
  if constexpr (_Iterator is not random access &&
                _Tp is a fundamental type &&
                _Compare desugars to a builtin operation &&
                __is_identity<_Projection>) {
    use-one-sided-lower-bound;
  } else {
    use-classic-lower-bound;
  }
}

@philnik777 Do you have thoughts on this? I wonder if we should consider that it's OK (under the as-if rule) to change the complexity of an algorithm with respect to a certain operation if that change cannot be observed other than by e.g. timing the algorithm. At first glance, I would say this is a very pedantic and not especially useful way of constraining ourselves, but I'm curious to hear what you think about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can certainly improve the complexity here (http://eel.is/c++draft/structure#specifications-7). I do wonder whether this is also the case for the "exactly N times" specifications, but I don't think it's in the interest of anybody to force implementations to add overhead for conformance. Making the complexity worse (by more than a constant factor) would be another story. We already break the complexity requirements by a constant factor if it's not observable, e.g. in ranges::minmax with integers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also my thinking.

@ichaer After this patch has landed, you could consider a patch like the above if you want -- I think we'd take it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome =D, thanks!

const auto __dist = _IterOps<_AlgPolicy>::distance(__first, __last);
return std::__lower_bound_bisecting<_AlgPolicy>(__first, __value, __dist, __comp, __proj);
}

template <class _ForwardIterator, class _Tp, class _Compare>
_LIBCPP_NODISCARD inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _ForwardIterator
lower_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value, _Compare __comp) {
Expand Down
Loading
Loading