-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++] Speed up set_intersection() by fast-forwarding over ranges of non-matching elements with one-sided binary search. #75230
Conversation
…ntroducing one-sided binary search for non-random iterators.
…dom iterators. One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general advantage of being Ω(1) rather than the classic algorithm's Ω(log(n)), with the downside of executing at most 2*log(n) comparisons vs the classic algorithm's exact log(n). There are two scenarios in which it really shines: the first one is when operating over non-random iterators, because the classic algorithm requires knowing the container's size upfront, which adds Ω(n) iterator increments to the complexity. The second one is when you're traversing the container in order, trying to fast-forward to the next value: in that case, the classic algorithm would yield Ω(n*log(n)) comparisons and, for non-random iterators, Ω(n^2) iterator increments, whereas the one-sided version will yield O(n) operations on both counts, with a Ω(log(n)) bound on the number of comparisons.
… to introducing use of one-sided binary search to fast-forward over ranges of elements.
…omplexity of set_intersection.
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-libcxx Author: None (ichaer) ChangesOne-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general advantage of being Ω(1) rather than the classic algorithm's Ω(log(n)), with the downside of executing at most 2log(n) comparisons vs the classic algorithm's exact log(n). There are two scenarios in which it really shines: the first one is when operating over non-random iterators, because the classic algorithm requires knowing the container's size upfront, which adds Ω(n) iterator increments to the complexity. The second one is when you're traversing the container in order, trying to fast-forward to the next value: in that case, the classic algorithm would yield Ω(nlog(n)) comparisons and, for non-random iterators, Ω(n^2) iterator increments, whereas the one-sided version will yield O(n) operations on both counts, with a Ω(log(n)) bound on the number of comparisons. One use case that can often fit both scenarios is a std::set_intersection() on std::set instances. This change is split into 4 separate commits. First we add tests validating the algorithmic complexity of std::lower_bound(), then we introduce the use of one-sided binary search for non-random iterators, then we add complexity validation tests for std::set_intersection(), and, finally, we introduce explicit use of the one-sided binary search version of std::lower_bound() for the instances of std::set_intersection() where there is no question about it improving time complexity. Patch is 34.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75230.diff 7 Files Affected:
diff --git a/libcxx/include/__algorithm/iterator_operations.h b/libcxx/include/__algorithm/iterator_operations.h
index e6176da4f5606..d73573747087e 100644
--- a/libcxx/include/__algorithm/iterator_operations.h
+++ b/libcxx/include/__algorithm/iterator_operations.h
@@ -87,6 +87,53 @@ struct _IterOps<_ClassicAlgPolicy> {
std::advance(__iter, __count);
}
+ // advance with sentinel, a la std::ranges::advance
+ // it's unclear whether _Iter has a difference_type and whether that's signed, so we play it safe:
+ // use the incoming type for returning and steer clear of negative overflows
+ template <class _Iter, class _Distance>
+ _LIBCPP_HIDE_FROM_ABI constexpr static _Distance advance(_Iter& __iter, _Distance __count, const _Iter& __sentinel) {
+ return _IterOps::__advance(__iter, __count, __sentinel, typename iterator_traits<_Iter>::iterator_category());
+ }
+
+ // advance with sentinel, a la std::ranges::advance -- InputIterator specialization
+ template <class _InputIter, class _Distance>
+ _LIBCPP_HIDE_FROM_ABI constexpr static _Distance
+ __advance(_InputIter& __iter, _Distance __count, const _InputIter& __sentinel, input_iterator_tag) {
+ _Distance __dist{};
+ for (; __dist < __count && __iter != __sentinel; ++__dist)
+ ++__iter;
+ return __count - __dist;
+ }
+
+ // advance with sentinel, a la std::ranges::advance -- BidirectionalIterator specialization
+ template <class _BiDirIter, class _Distance>
+ _LIBCPP_HIDE_FROM_ABI constexpr static _Distance
+ __advance(_BiDirIter& __iter, _Distance __count, const _BiDirIter& __sentinel, bidirectional_iterator_tag) {
+ _Distance __dist{};
+ if (__count >= 0)
+ for (; __dist < __count && __iter != __sentinel; ++__dist)
+ ++__iter;
+ else
+ for (__count = -__count; __dist < __count && __iter != __sentinel; ++__dist)
+ --__iter;
+ return __count - __dist;
+ }
+
+ // advance with sentinel, a la std::ranges::advance -- RandomIterator specialization
+ template <class _RandIter, class _Distance>
+ _LIBCPP_HIDE_FROM_ABI constexpr static _Distance
+ __advance(_RandIter& __iter, _Distance __count, const _RandIter& __sentinel, random_access_iterator_tag) {
+ auto __dist = _IterOps::distance(__iter, __sentinel);
+ _LIBCPP_ASSERT_UNCATEGORIZED(
+ __count == 0 || (__dist < 0) == (__count < 0), "__sentinel must precede __iter when __count<0");
+ if (__count < 0)
+ __dist = __dist > __count ? __dist : __count;
+ else
+ __dist = __dist < __count ? __dist : __count;
+ __iter += __dist;
+ return __count - __dist;
+ }
+
// distance
template <class _Iter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14
diff --git a/libcxx/include/__algorithm/lower_bound.h b/libcxx/include/__algorithm/lower_bound.h
index 91c3bdaafd0cf..b432829667fa9 100644
--- a/libcxx/include/__algorithm/lower_bound.h
+++ b/libcxx/include/__algorithm/lower_bound.h
@@ -27,11 +27,13 @@
_LIBCPP_BEGIN_NAMESPACE_STD
-template <class _AlgPolicy, class _Iter, class _Sent, class _Type, class _Proj, class _Comp>
-_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
-_Iter __lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj) {
- auto __len = _IterOps<_AlgPolicy>::distance(__first, __last);
-
+template <class _AlgPolicy, class _Iter, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter __lower_bound_bisecting(
+ _Iter __first,
+ const _Type& __value,
+ typename iterator_traits<_Iter>::difference_type __len,
+ _Comp& __comp,
+ _Proj& __proj) {
while (__len != 0) {
auto __l2 = std::__half_positive(__len);
_Iter __m = __first;
@@ -46,13 +48,68 @@ _Iter __lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp& __
return __first;
}
+// One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general
+// advantage of being Ω(1) rather than the classic algorithm's Ω(log(n)), with the downside of executing at most
+// 2*(log(n)-1) comparisons vs the classic algorithm's exact log(n). There are two scenarios in which it really shines:
+// the first one is when operating over non-random iterators, because the classic algorithm requires knowing the
+// container's size upfront, which adds Ω(n) iterator increments to the complexity. The second one is when you're
+// traversing the container in order, trying to fast-forward to the next value: in that case, the classic algorithm
+// would yield Ω(n*log(n)) comparisons and, for non-random iterators, Ω(n^2) iterator increments, whereas the one-sided
+// version will yield O(n) operations on both counts, with a Ω(log(n)) bound on the number of comparisons.
+template <class _AlgPolicy, class _Iter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter
+__lower_bound_onesided(_Iter __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj) {
+ // static_assert(std::is_base_of<std::forward_iterator_tag, typename _IterOps<_AlgPolicy>::template
+ // __iterator_category<_Iter>>::value,
+ // "lower_bound() is a multipass algorithm and requires forward iterator or better");
+
+ using _Distance = typename iterator_traits<_Iter>::difference_type;
+ for (_Distance __step = 1; __first != __last; __step <<= 1) {
+ auto __it = __first;
+ auto __dist = __step - _IterOps<_AlgPolicy>::advance(__it, __step, __last);
+ // once we reach the last range where needle can be we must start
+ // looking inwards, bisecting that range
+ if (__it == __last || !std::__invoke(__comp, std::__invoke(__proj, *__it), __value)) {
+ return std::__lower_bound_bisecting<_AlgPolicy>(__first, __value, __dist, __comp, __proj);
+ }
+ // range not found, move forward!
+ __first = std::move(__it);
+ }
+ return __first;
+}
+
+template <class _AlgPolicy, class _InputIter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _InputIter __lower_bound(
+ _InputIter __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj, std::input_iterator_tag) {
+ return std::__lower_bound_onesided<_AlgPolicy>(__first, __last, __value, __comp, __proj);
+}
+
+template <class _AlgPolicy, class _RandIter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _RandIter __lower_bound(
+ _RandIter __first,
+ _Sent __last,
+ const _Type& __value,
+ _Comp& __comp,
+ _Proj& __proj,
+ std::random_access_iterator_tag) {
+ const auto __dist = _IterOps<_AlgPolicy>::distance(__first, __last);
+ return std::__lower_bound_bisecting<_AlgPolicy>(__first, __value, __dist, __comp, __proj);
+}
+
+template <class _AlgPolicy, class _Iter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter
+__lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp&& __comp, _Proj&& __proj) {
+ return std::__lower_bound<_AlgPolicy>(
+ __first, __last, __value, __comp, __proj, typename _IterOps<_AlgPolicy>::template __iterator_category<_Iter>());
+}
+
template <class _ForwardIterator, class _Tp, class _Compare>
_LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
_ForwardIterator lower_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value, _Compare __comp) {
static_assert(__is_callable<_Compare, decltype(*__first), const _Tp&>::value,
"The comparator has to be callable");
auto __proj = std::__identity();
- return std::__lower_bound<_ClassicAlgPolicy>(__first, __last, __value, __comp, __proj);
+ return std::__lower_bound<_ClassicAlgPolicy>(__first, __last, __value, std::move(__comp), std::move(__proj));
}
template <class _ForwardIterator, class _Tp>
diff --git a/libcxx/include/__algorithm/set_intersection.h b/libcxx/include/__algorithm/set_intersection.h
index f2603fe1365ac..556738022f485 100644
--- a/libcxx/include/__algorithm/set_intersection.h
+++ b/libcxx/include/__algorithm/set_intersection.h
@@ -12,9 +12,13 @@
#include <__algorithm/comp.h>
#include <__algorithm/comp_ref_type.h>
#include <__algorithm/iterator_operations.h>
+#include <__algorithm/lower_bound.h>
#include <__config>
+#include <__functional/identity.h>
#include <__iterator/iterator_traits.h>
#include <__iterator/next.h>
+#include <__type_traits/is_same.h>
+#include <__utility/exchange.h>
#include <__utility/move.h>
#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@@ -36,9 +40,122 @@ struct __set_intersection_result {
};
template <class _AlgPolicy, class _Compare, class _InIter1, class _Sent1, class _InIter2, class _Sent2, class _OutIter>
-_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InIter1, _InIter2, _OutIter>
-__set_intersection(
- _InIter1 __first1, _Sent1 __last1, _InIter2 __first2, _Sent2 __last2, _OutIter __result, _Compare&& __comp) {
+struct _LIBCPP_NODISCARD_EXT __set_intersector {
+ _InIter1& __first1_;
+ const _Sent1& __last1_;
+ _InIter2& __first2_;
+ const _Sent2& __last2_;
+ _OutIter& __result_;
+ _Compare& __comp_;
+ static constexpr auto __proj_ = std::__identity();
+ bool __prev_advanced_ = true;
+
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersector(
+ _InIter1& __first1, _Sent1& __last1, _InIter2& __first2, _Sent2& __last2, _OutIter& __result, _Compare& __comp)
+ : __first1_(__first1),
+ __last1_(__last1),
+ __first2_(__first2),
+ __last2_(__last2),
+ __result_(__result),
+ __comp_(__comp) {}
+
+ _LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI
+ _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InIter1, _InIter2, _OutIter>
+ operator()() && {
+ while (__first2_ != __last2_) {
+ __advance1_and_maybe_add_result();
+ if (__first1_ == __last1_)
+ break;
+ __advance2_and_maybe_add_result();
+ }
+ return __set_intersection_result<_InIter1, _InIter2, _OutIter>(
+ _IterOps<_AlgPolicy>::next(std::move(__first1_), std::move(__last1_)),
+ _IterOps<_AlgPolicy>::next(std::move(__first2_), std::move(__last2_)),
+ std::move(__result_));
+ }
+
+private:
+ // advance __iter to the first element in the range where !__comp_(__iter, __value)
+ // add result if this is the second consecutive call without advancing
+ // this method only works if you alternate calls between __advance1_and_maybe_add_result() and
+ // __advance2_and_maybe_add_result()
+ template <class _Iter, class _Sent, class _Value>
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
+ __advance_and_maybe_add_result(_Iter& __iter, const _Sent& __sentinel, const _Value& __value) {
+ // use one-sided lower bound for improved algorithmic complexity bounds
+ const auto __tmp =
+ std::exchange(__iter, std::__lower_bound_onesided<_AlgPolicy>(__iter, __sentinel, __value, __comp_, __proj_));
+ __add_output_unless(__tmp != __iter);
+ }
+
+ // advance __first1_ to the first element in the range where !__comp_(*__first1_, *__first2_)
+ // add result if neither __first1_ nor __first2_ advanced in the last attempt (meaning they are equal)
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __advance1_and_maybe_add_result() {
+ __advance_and_maybe_add_result(__first1_, __last1_, *__first2_);
+ }
+
+ // advance __first2_ to the first element in the range where !__comp_(*__first2_, *__first1_)
+ // add result if neither __first1_ nor __first2_ advanced in the last attempt (meaning they are equal)
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __advance2_and_maybe_add_result() {
+ __advance_and_maybe_add_result(__first2_, __last2_, *__first1_);
+ }
+
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __add_output_unless(bool __advanced) {
+ if (__advanced | __prev_advanced_) {
+ __prev_advanced_ = __advanced;
+ } else {
+ *__result_ = *__first1_;
+ ++__result_;
+ ++__first1_;
+ ++__first2_;
+ __prev_advanced_ = true;
+ }
+ }
+};
+
+// with forward iterators we can use binary search to skip over entries
+template <class _AlgPolicy,
+ class _Compare,
+ class _InForwardIter1,
+ class _Sent1,
+ class _InForwardIter2,
+ class _Sent2,
+ class _OutIter>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI
+ _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InForwardIter1, _InForwardIter2, _OutIter>
+ __set_intersection(
+ _InForwardIter1 __first1,
+ _Sent1 __last1,
+ _InForwardIter2 __first2,
+ _Sent2 __last2,
+ _OutIter __result,
+ _Compare&& __comp,
+ std::forward_iterator_tag,
+ std::forward_iterator_tag) {
+ std::__set_intersector<_AlgPolicy, _Compare, _InForwardIter1, _Sent1, _InForwardIter2, _Sent2, _OutIter>
+ __intersector(__first1, __last1, __first2, __last2, __result, __comp);
+ return std::move(__intersector)();
+}
+
+// input iterators are not suitable for multipass algorithms, so we stick to the classic single-pass version
+template <class _AlgPolicy,
+ class _Compare,
+ class _InInputIter1,
+ class _Sent1,
+ class _InInputIter2,
+ class _Sent2,
+ class _OutIter>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI
+ _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InInputIter1, _InInputIter2, _OutIter>
+ __set_intersection(
+ _InInputIter1 __first1,
+ _Sent1 __last1,
+ _InInputIter2 __first2,
+ _Sent2 __last2,
+ _OutIter __result,
+ _Compare&& __comp,
+ std::input_iterator_tag,
+ std::input_iterator_tag) {
while (__first1 != __last1 && __first2 != __last2) {
if (__comp(*__first1, *__first2))
++__first1;
@@ -52,12 +169,41 @@ __set_intersection(
}
}
- return __set_intersection_result<_InIter1, _InIter2, _OutIter>(
+ return std::__set_intersection_result<_InInputIter1, _InInputIter2, _OutIter>(
_IterOps<_AlgPolicy>::next(std::move(__first1), std::move(__last1)),
_IterOps<_AlgPolicy>::next(std::move(__first2), std::move(__last2)),
std::move(__result));
}
+template <class _AlgPolicy, class _Iter>
+class __set_intersection_iter_category {
+ template <class _It>
+ using __cat = typename std::_IterOps<_AlgPolicy>::template __iterator_category<_It>;
+ template <class _It>
+ static auto test(__cat<_It>*) -> __cat<_It>;
+ template <class>
+ static std::input_iterator_tag test(...);
+
+public:
+ using __type = decltype(test<_Iter>(nullptr));
+};
+
+template <class _AlgPolicy, class _Compare, class _InIter1, class _Sent1, class _InIter2, class _Sent2, class _OutIter>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI
+ _LIBCPP_CONSTEXPR_SINCE_CXX20 __set_intersection_result<_InIter1, _InIter2, _OutIter>
+ __set_intersection(
+ _InIter1 __first1, _Sent1 __last1, _InIter2 __first2, _Sent2 __last2, _OutIter __result, _Compare&& __comp) {
+ return std::__set_intersection<_AlgPolicy>(
+ std::move(__first1),
+ std::move(__last1),
+ std::move(__first2),
+ std::move(__last2),
+ std::move(__result),
+ std::forward<_Compare>(__comp),
+ typename std::__set_intersection_iter_category<_AlgPolicy, _InIter1>::__type(),
+ typename std::__set_intersection_iter_category<_AlgPolicy, _InIter2>::__type());
+}
+
template <class _InputIterator1, class _InputIterator2, class _OutputIterator, class _Compare>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _OutputIterator set_intersection(
_InputIterator1 __first1,
diff --git a/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp b/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp
index a2d8ab632303c..5c11962d13777 100644
--- a/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp
@@ -39,11 +39,20 @@ template <class Iter, class T>
void
test(Iter first, Iter last, const T& value)
{
- Iter i = std::lower_bound(first, last, value);
- for (Iter j = first; j != i; ++j)
- assert(*j < value);
- for (Iter j = i; j != last; ++j)
- assert(!(*j < value));
+ std::size_t strides{};
+ std::size_t displacement{};
+ stride_counting_iterator f(first, &strides, &displacement);
+ stride_counting_iterator l(last, &strides, &displacement);
+
+ auto i = std::lower_bound(f, l, value);
+ for (auto j = f; j != i; ++j)
+ assert(*j < value);
+ for (auto j = i; j != l; ++j)
+ assert(!(*j < value));
+
+ auto len = std::distance(first, last);
+ assert(strides <= 2.5 * len + 1);
+ assert(displacement <= 2.5 * len + 1);
}
template <class Iter>
diff --git a/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp b/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp
index b9133028d9ade..05fd43eada461 100644
--- a/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound_comp.pass.cpp
@@ -17,6 +17,7 @@
#include <vector>
#include <cassert>
#include <cstddef>
+#include <cmath>
#include "test_macros.h"
#include "test_iterators.h"
@@ -38,11 +39,28 @@ template <class Iter, class T>
void
test(Iter first, Iter last, const T& value)
{
- Iter i = std::lower_bound(first, last, value, std::greater<int>());
- for (Iter j = first; j != i; ++j)
- assert(std::greater<int>()(*j, value));
- for (Iter j = i; j != last; ++j)
- assert(!std::greater<int>()(*j, value));
+ std::size_t strides{};
+ std::size_t displacement{};
+ stride_counting_iterator f(first, &strides, &displacement);
+ stride_counting_iterator l(last, &strides, &displacement);
+
+ std::size_t comparisons{};
+ auto cmp = [&comparisons](int rhs, int lhs) {
+ ++comparisons;
+ return std::greater<int>()(rhs, lhs);
+ };
+
+ auto i = std::lower_bound(f, l, value, cmp);
+
+ for (auto j = f; j != i; ++j)
+ assert(std::greater<int>()(*j, value));
+ for (auto j = i; j != l; ++j)
+ assert(!std::greater<int>()(*j, value));
+
+ auto len = std::distance(first, last);
+ assert(strides <= 2.5 * len + 1);
+ assert(displacement <= 2.5 * len + 1);
+ assert(comparisons <= 2 * ceil(log(len + 1) + 2));
}
template <class Iter>
diff --git a/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp b/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
index 0ee89e0131a07..30cedd19038d7 100644
--- a/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
@@ -28,6 +28,9 @@
#include <algorithm>
#include <array>
#include <concepts>
+#include <cstddef>
+#include <iterator>
+#include <type_traits>
#include "almost_satisfies_types.h"
#include "MoveOnly.h"
@@ -93,14 +96,17 @@ static_assert(!HasSetIntersectionRange<UncheckedRange<MoveOnly*>, UncheckedRange
using std::ranges::set_intersection_result;
+// TODO: std::ranges::set_intersection calls std::ranges::copy
+// std::ranges::copy(contiguous_iterator<int*>, sentinel_wrapper<contiguous_iterator<int*>>, contiguous_iterator<int*>) doesn't seem to work.
+// It seems that std::ranges::copy calls std::copy, which unwraps contiguous_iterator<int*> into int*,
+// and then it failed because there is no == between int* and sentinel_wrapper<contiguous_iterator<int*>>
+template <typename Iter>
+using SentinelWorkaround = std::conditional_t<std::contiguous_iterator<Iter>, Iter, sentinel_wrapper<Iter>>;
+
template <class In1, class In2, class Out, std::size_t N1, std::size_t N2, std::size_t N3>
constexpr void testSetIntersectionImpl(std::array<int, N1> in1, std::array<int, N2> in2, std::array<int, N3> expected) {
- ...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Thanks for the patch. If I understand it right this might violate the complexity requirements in http://eel.is/c++draft/lower.bound#4. Is that correct? |
Hi Mark, thanks for your reply! Strictly speaking, yes, that is true: the number of comparisons may violate the requirement of there being at most log(last - first) + O(1) projections and comparisons. However, that is only being done for non-random access iterators, where we are also executing 2*(last-first) iterator mutations, which are not part of the explicit algorithm guarantees, and are the reason why std::set and std::map have their own lower_bound -- std::lower_bound is really hard to use on non-random iterators because of this additional cost. Would this be a blocker for accepting the contribution? |
Yes. We have to stay standards-conforming - libc++ is an implementation of the C++ standard after all, so even if the trade-off seems like a good idea we can't make the call in this case. If this improves the performance for trivial calls like an integer and |
OK, that's fair. How about I just stop exposing one-sided lower bound through std::lower_bound, and just use it for std::set_intersection, where all complexity guarantees are preserved, would that be OK? |
If all the complexity guarantees are met, that would be OK. We always like to see some benchmarks to have some data that the optimization actually improves performance. |
SGTM! I'll amend my submission, thanks! |
Great. As @philnik777 said we need to be careful about observable behaviour. I love to see performance improvements :-) |
Cool, thanks :). Just for context, this code is actually being used in production out there. I work for Splunk, and we've implemented this outside of the standard library to speed up a component that does a number of set intersections, overlaying containers on top of each other (it's more complex, but that's the relevant part in a nutshell). The huge majority of the sets involved contain a tiny subset of what the target contains, so this makes a huge difference, and we thought we should contribute upstream. Obviously I won't ask you to take me on my word :). Code is truth, I'll spend a bit of time understanding existing benchmarks in the LLVM repo to include something relevant in the pull request. |
I don't think you're wrong that this improves performance, but being able to point to hard data is always great. With algorithms it's almost always really easy to write micro benchmarks, so there we really like to have them. |
Sounds good to me! I'll be out on holiday for a couple of weeks and should be able to update this PR in early January. Thanks for your feedback so far, I appreciate it :). |
…elf since that violates the complexity guarantees from the standard.
… it down for faster runs, and including comparison counter.
libcxx/test/std/algorithms/alg.sorting/alg.binary.search/lower.bound/lower_bound.pass.cpp
Outdated
Show resolved
Hide resolved
…)` must start with `__step==0`, otherwise we can't match the complexity of linear search when continually matching (like a std::set_intersection() of matching containers will).
.../algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
Outdated
Show resolved
Hide resolved
.../algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
Outdated
Show resolved
Hide resolved
.../algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
Outdated
Show resolved
Hide resolved
Also let's add a release note for this in |
@ldionne's latest inline review suggestions. Co-authored-by: Louis Dionne <[email protected]>
… to its pre-existing sibling and remove leftover comments.
…unts/counted_set_intersection/', and split complexity tests into `set_intersection_complexity.pass.cpp`.
… `std::set_intersection()`
…rantees for a single match over an input of 1 to 20 elements.
…dening mode inside `testComplexityBasic()` as well.
…ething better after some more thinking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically LGTM but I have a few small comments left.
_InForwardIter1 __first1_next = | ||
std::__lower_bound_onesided<_AlgPolicy>(__first1, __last1, *__first2, __comp, __proj); | ||
std::swap(__first1_next, __first1); | ||
std::__set_intersection_add_output_unless(__first1 != __first1_next, __first1, __first2, __result, __prev_advanced); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the algorithm would be easier to read if we inlined this function at its two call sites. Why do we need to keep track of whether we previously advanced the iterators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, maybe I should have documented that! It's a (hopefully :)) clever optimisation to preserve the number of comparisons. In set_intersection()
we compute a==b
by means of !(a<b || b<a)
, right? We can avoid further comparison operator calls by checking whether we've advanced any side. If, at any time, we have two subsequent attempts to advance iterators failing, then we know elements are equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Maybe simply calling it an optimisation is unfair. This is what allows us to preserve the complexity guarantees from the current standard.)
std::forward_iterator_tag, | ||
std::forward_iterator_tag) { | ||
_LIBCPP_CONSTEXPR std::__identity __proj; | ||
bool __prev_advanced = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels off that we start with __prev_advanced = true
. I'm not saying the algorithm's wrong, but is it possible that another name for this variable is trying to emerge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. Would you be happier with something like __maybe_equals
? I'd have to invert the logic, but maybe that would help make it more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might help, yes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I've renamed the variables and added a comment at the point of call which I'm hoping captures what I said here :). Let me know what you think, and whether you'd still prefer for the code inside __set_intersection_add_output_if_equal()
to be inlined. My personal feeling about it is that the function wraps a fiddly idea, and that it's easier for me to unpack the algorithm when I give that idea a name, but I defer to you.
* s/__set_intersection_add_output_unless/__set_intersection_add_output_if_equal/ * Add comments to explain the logic for equality comparison
* llvm/main: (8718 commits) [LLVM] Add `llvm.experimental.vector.compress` intrinsic (llvm#92289) [Clang] [C23] Implement N2653: u8 strings are char8_t[] (llvm#97208) [SLP][REVEC] Make SLP support revectorization (-slp-revec) and add simple test. (llvm#98269) [Flang] Exclude the reference to TIME_UTC for AIX. (llvm#99069) [clang][Sema] Improve `Sema::CheckCXXDefaultArguments` (llvm#97338) adjust the Xtensa backend after change f270a4d [lldb][Bazel]: Second attempt to adapt for a751f65 Revert "[lldb][Bazel]: Adapt BUILD.bazel file for a751f65" [lldb][Bazel]: Adapt BUILD.bazel file for a751f65 [emacs] Fix autoloading for llvm-mir-mode (llvm#98984) [SLP]Improve minbitwidth analysis for trun'ed gather nodes. [Clang][Concepts] Avoid substituting into constraints for invalid TemplateDecls (llvm#75697) [LV][NFC]Introduce isScalableVectorizationAllowed() to refactor getMaxLegalScalableVF(). Revert "[AArch64] Remove superfluous sxtw in peephole opt (llvm#96293)" [AArch64][GISel] Add test cases for folding shifts into load/store addressing modes (NFC) [LV] Process dead interleave pointer ops in reverse order. [AMDGPU] Remove SIWholeQuadMode pass early exit (llvm#98450) [InstrRef][NFC] Avoid un-necessary DenseMap queries (llvm#99048) [gn build] Port e94e72a [LV][AArch64] Prefer Fixed over Scalable if cost-model is equal (Neoverse V2) (llvm#95819) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ichaer I personally find the helper unhelpful in understanding the algorithm, but I'm OK with merging as-is. Will merge once CI is green.
Thanks for the patch and for all the iterations you did on this.
Thank you, @ldionne :). |
…of non-matching elements with one-sided binary search. (llvm#75230) One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general advantage of being constant time in the best case, with the downside of executing at most 2*log(N) comparisons vs classic binary search's exact log(N). There are two scenarios in which it really shines: the first one is when operating over non-random-access iterators, because the classic algorithm requires knowing the container's size upfront, which adds N iterator increments to the complexity. The second one is when traversing the container in order, trying to fast-forward to the next value: in that case the classic algorithm requires at least O(N*log(N)) comparisons and, for non-random-access iterators, O(N^2) iterator increments, whereas the one-sided version will yield O(N) operations on both counts, with a best-case of O(log(N)) comparisons which is very common in practice.
…of non-matching elements with one-sided binary search. (#75230) One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general advantage of being constant time in the best case, with the downside of executing at most 2*log(N) comparisons vs classic binary search's exact log(N). There are two scenarios in which it really shines: the first one is when operating over non-random-access iterators, because the classic algorithm requires knowing the container's size upfront, which adds N iterator increments to the complexity. The second one is when traversing the container in order, trying to fast-forward to the next value: in that case the classic algorithm requires at least O(N*log(N)) comparisons and, for non-random-access iterators, O(N^2) iterator increments, whereas the one-sided version will yield O(N) operations on both counts, with a best-case of O(log(N)) comparisons which is very common in practice.
Thanks for the nice optimization! However, I'd like to point out one real-world complication it brings: it breaks some invalid usages of I wonder, if it would be possible to add runtime checks in one of the hardening modes to assist with finding this sort of an issue? For example, a check for the correct ordering of the bounds could be made on each step of the binary search. This should at least retain the complexity. A more comprehensive check (ordering of all elements in both input sequences) could be done in An example of a problematic use: https://gcc.godbolt.org/z/Y5dxsxzbe. |
@alexfh, I did think of that when proposing the change, but I couldn't think of a good way to help users. Your idea of adding validation to the hardening modes sounds good to me, I've got my hands full right now, but I'll open a PR for that as soon as I can. The case you shared, with Edit: I was going to say that I think that the cheap check is too weak to be valuable, but I'm really torn on this... We do have a full linear scan in |
I didn't think too much about tradeoffs here, but it seems like a full linear scan in As for a cheaper check for |
@alexfh Can this result in any security-concerning bugs like out-of-bounds reads? If not I'd just go for a |
No out-of-bounds reads, we still respect the end sentinel. What happens is that we now skip over potential matches, and if the sequence is unsorted we may not go back to see what we skipped over. |
The use of one-sided binary search introduced by a066217 changes behaviour on invalid, unsorted input (see llvm#75230 (comment)). Add input validation on `_LIBCPP_HARDENING_MODE_DEBUG` to help users. * Change interface of `__is_sorted_until()` so that it accepts a sentinel that's of a different type than the beginning iterator, and to ensure it won't try to copy the comparison function object. * Add one assertion for each input range confirming that they are sorted. * Stop validating complexity of `set_intersection()` in debug mode, it's hopeless and also not meaningful: there are no complexity guarantees in debug mode, we're happy to trade performance for diagnosability. * Fix bugs in `ranges_robust_against_differing_projections.pass`: we were using an input range as output for `std::ranges::partial_sort_copy()`, and using projections which return the opposite value means that algorithms requiring a sorted range can't use ranges sorted with ascending values if the comparator is `std::ranges::less`. Added `const` where appropriate to make sure we weren't using inputs as outputs in other places.
@alexfh, @philnik777: I've opened #101508 with what we discussed. |
One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general advantage of being constant time in the best case, with the downside of executing at most 2*log(N) comparisons vs classic binary search's exact log(N). There are two scenarios in which it really shines: the first one is when operating over non-random-access iterators, because the classic algorithm requires knowing the container's size upfront, which adds N iterator increments to the complexity. The second one is when traversing the container in order, trying to fast-forward to the next value: in that case the classic algorithm requires at least O(N*log(N)) comparisons and, for non-random-access iterators, O(N^2) iterator increments, whereas the one-sided version will yield O(N) operations on both counts, with a best-case of O(log(N)) comparisons which is very common in practice.
Benchmark results for libcxx/benchmarks/set_intersection.libcxx.out on the following system:
Run on (16 X 4800.01 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 1280 KiB (x8)
L3 Unified 24576 KiB (x1)
Comparison between main branch at b926f75 and this branch at be6c5c8, with a fixed seed of 0 hardcoded in
getRandomEngine()
for consistent and reproducible results -- the full output has 1.3k lines, so I'm only pasting here the lines with best and worst CPU time differences, plus the summary: