Adding Separate OpenMP Offloading Backend to libcxx/include/__algorithm/pstl_backends
#66968
base: main
Conversation
@llvm/pr-subscribers-github-workflow @llvm/pr-subscribers-libcxx

Changes

To address some of the good feedback from the recent request for comments, I have added a separate OpenMP offloading backend in libcxx/include/__algorithm/pstl_backends. To begin with, I have only added std::fill and std::for_each. I know there are still a lot of things missing, such as LIT tests, but I think we should try to agree on the project structure first.

Full diff: https://github.com/llvm/llvm-project/pull/66968.diff

9 Files Affected:
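A minimal usage sketch of what this enables (assuming libc++ is configured with LIBCXX_ENABLE_GPU_OFFLOAD=ON so the omp_offload backend is selected): with this patch, std::execution::par_unseq is the policy that is routed to the GPU backend, so ordinary PSTL calls like the ones below are the ones affected. The snippet is illustrative and not part of the patch.

#include <algorithm>
#include <execution>
#include <vector>

int main() {
  std::vector<double> v(1 << 20);
  // Dispatches to __pstl_fill(__gpu_backend_tag, ...) with this patch applied.
  std::fill(std::execution::par_unseq, v.begin(), v.end(), 1.0);
  // Dispatches to __pstl_for_each(__gpu_backend_tag, ...) with this patch applied.
  std::for_each(std::execution::par_unseq, v.begin(), v.end(), [](double& x) { x = 2.0 * x + 1.0; });
  return 0;
}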
diff --git a/libcxx/CMakeLists.txt b/libcxx/CMakeLists.txt
index bb2898b799bcef9..43d2a448de79584 100644
--- a/libcxx/CMakeLists.txt
+++ b/libcxx/CMakeLists.txt
@@ -290,6 +290,8 @@ option(LIBCXX_HAS_WIN32_THREAD_API "Ignore auto-detection and force use of win32
option(LIBCXX_HAS_EXTERNAL_THREAD_API
"Build libc++ with an externalized threading API.
This option may only be set to ON when LIBCXX_ENABLE_THREADS=ON." OFF)
+option(LIBCXX_ENABLE_GPU_OFFLOAD
+ "Build libc++ with support for GPU offload" OFF)
if (LIBCXX_ENABLE_THREADS)
set(LIBCXX_PSTL_CPU_BACKEND "std_thread" CACHE STRING "Which PSTL CPU backend to use")
@@ -297,6 +299,14 @@ else()
set(LIBCXX_PSTL_CPU_BACKEND "serial" CACHE STRING "Which PSTL CPU backend to use")
endif()
+if (NOT DEFINED LIBCXX_PSTL_GPU_BACKEND)
+ if (${LIBCXX_ENABLE_GPU_OFFLOAD})
+ set(LIBCXX_PSTL_GPU_BACKEND "omp_offload" CACHE STRING "Which PSTL GPU backend to use")
+ else()
+ set(LIBCXX_PSTL_GPU_BACKEND "none" CACHE STRING "Which PSTL GPU backend to use")
+ endif()
+endif()
+
# Misc options ----------------------------------------------------------------
# FIXME: Turn -pedantic back ON. It is currently off because it warns
# about #include_next which is used everywhere.
@@ -809,6 +819,10 @@ else()
Valid backends are: serial, std_thread and libdispatch")
endif()
+if (LIBCXX_PSTL_GPU_BACKEND STREQUAL "omp_offload")
+ config_define(1 _LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD)
+endif()
+
if (LIBCXX_ABI_DEFINES)
set(abi_defines)
foreach (abi_define ${LIBCXX_ABI_DEFINES})
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index 2ec755236dbaee2..a3d72df61a86dde 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -85,6 +85,11 @@ set(files
__algorithm/pstl_backends/cpu_backends/thread.h
__algorithm/pstl_backends/cpu_backends/transform.h
__algorithm/pstl_backends/cpu_backends/transform_reduce.h
+ __algorithm/pstl_backends/gpu_backend.h
+ __algorithm/pstl_backends/gpu_backends/backend.h
+ __algorithm/pstl_backends/gpu_backends/fill.h
+ __algorithm/pstl_backends/gpu_backends/for_each.h
+ __algorithm/pstl_backends/gpu_backends/omp_offload.h
__algorithm/pstl_copy.h
__algorithm/pstl_count.h
__algorithm/pstl_fill.h
diff --git a/libcxx/include/__algorithm/pstl_backend.h b/libcxx/include/__algorithm/pstl_backend.h
index 93372f019031b63..47f5191b48517ba 100644
--- a/libcxx/include/__algorithm/pstl_backend.h
+++ b/libcxx/include/__algorithm/pstl_backend.h
@@ -10,6 +10,7 @@
#define _LIBCPP___ALGORITHM_PSTL_BACKEND_H
#include <__algorithm/pstl_backends/cpu_backend.h>
+#include <__algorithm/pstl_backends/gpu_backend.h>
#include <__config>
#include <execution>
@@ -179,10 +180,17 @@ struct __select_backend<std::execution::parallel_policy> {
using type = __cpu_backend_tag;
};
+# if defined(_LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD)
+template <>
+struct __select_backend<std::execution::parallel_unsequenced_policy> {
+ using type = __gpu_backend_tag;
+};
+# else
template <>
struct __select_backend<std::execution::parallel_unsequenced_policy> {
using type = __cpu_backend_tag;
};
+# endif
# else
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backend.h b/libcxx/include/__algorithm/pstl_backends/gpu_backend.h
new file mode 100644
index 000000000000000..7237036156a1bf3
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backend.h
@@ -0,0 +1,21 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_H
+
+#include <__config>
+
+#include <__algorithm/pstl_backends/gpu_backends/backend.h>
+
+#if defined(_LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD)
+# include <__algorithm/pstl_backends/gpu_backends/fill.h>
+# include <__algorithm/pstl_backends/gpu_backends/for_each.h>
+#endif
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_H
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backends/backend.h b/libcxx/include/__algorithm/pstl_backends/gpu_backends/backend.h
new file mode 100644
index 000000000000000..a8b400afbb94d9d
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backends/backend.h
@@ -0,0 +1,33 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_BACKEND_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_BACKEND_H
+
+#include <__config>
+#include <cstddef>
+
+#if defined(_LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD)
+# include <__algorithm/pstl_backends/gpu_backends/omp_offload.h>
+#endif
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+# pragma GCC system_header
+#endif
+
+#if _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+struct __gpu_backend_tag {};
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // _LIBCPP_STD_VER >= 17
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKEND_BACKEND_H
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backends/fill.h b/libcxx/include/__algorithm/pstl_backends/gpu_backends/fill.h
new file mode 100644
index 000000000000000..32926da87e2a083
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backends/fill.h
@@ -0,0 +1,59 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_FILL_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_FILL_H
+
+#include <__algorithm/fill.h>
+#include <__algorithm/pstl_backends/cpu_backends/backend.h>
+#include <__algorithm/pstl_backends/gpu_backends/backend.h>
+#include <__config>
+#include <__iterator/concepts.h>
+#include <__type_traits/is_execution_policy.h>
+#include <__utility/terminate_on_exception.h>
+#include <stdio.h>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+# pragma GCC system_header
+#endif
+
+#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+template <class _ExecutionPolicy, class _ForwardIterator, class _Tp>
+_LIBCPP_HIDE_FROM_ABI void
+__pstl_fill(__gpu_backend_tag, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
+  // It is only safe to execute fill on the GPU if the execution policy is
+ // parallel unsequenced, as it is the only execution policy prohibiting throwing
+ // exceptions and allowing SIMD instructions
+ if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
+ __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+ std::__par_backend::__parallel_for_simd_val_1(__first, __last - __first, __value);
+ }
+  // Else if the execution policy is parallel, we execute fill on the CPU instead
+ else if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
+ __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+ std::__terminate_on_exception([&] {
+ __par_backend::__parallel_for(
+ __first, __last, [&__value](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+ std::__pstl_fill<__remove_parallel_policy_t<_ExecutionPolicy>>(
+ __cpu_backend_tag{}, __brick_first, __brick_last, __value);
+ });
+ });
+    // Else we execute fill in serial
+ } else {
+ std::fill(__first, __last, __value);
+ }
+}
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_FILL_H
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backends/for_each.h b/libcxx/include/__algorithm/pstl_backends/gpu_backends/for_each.h
new file mode 100644
index 000000000000000..14de2af8e4a15c6
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backends/for_each.h
@@ -0,0 +1,59 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_FOR_EACH_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_FOR_EACH_H
+
+#include <__algorithm/for_each.h>
+#include <__algorithm/pstl_backends/cpu_backends/backend.h>
+#include <__algorithm/pstl_backends/gpu_backends/backend.h>
+#include <__config>
+#include <__iterator/concepts.h>
+#include <__type_traits/is_execution_policy.h>
+#include <__utility/terminate_on_exception.h>
+#include <stdio.h>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+# pragma GCC system_header
+#endif
+
+#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+template <class _ExecutionPolicy, class _ForwardIterator, class _Functor>
+_LIBCPP_HIDE_FROM_ABI void
+__pstl_for_each(__gpu_backend_tag, _ForwardIterator __first, _ForwardIterator __last, _Functor __func) {
+  // It is only safe to execute for_each on the GPU if the execution policy is
+ // parallel unsequenced, as it is the only execution policy prohibiting throwing
+ // exceptions and allowing SIMD instructions
+ if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&
+ __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+ std::__par_backend::__parallel_for_simd_1(__first, __last - __first, __func);
+ }
+  // Else if the execution policy is parallel, we execute for_each on the CPU instead
+ else if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> &&
+ __has_random_access_iterator_category_or_concept<_ForwardIterator>::value) {
+ std::__terminate_on_exception([&] {
+ std::__par_backend::__parallel_for(
+ __first, __last, [__func](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {
+ std::__pstl_for_each<__remove_parallel_policy_t<_ExecutionPolicy>>(
+ __cpu_backend_tag{}, __brick_first, __brick_last, __func);
+ });
+ });
+ // Else we execute for_each in serial
+ } else {
+ std::for_each(__first, __last, __func);
+ }
+}
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_FOR_EACH_H
diff --git a/libcxx/include/__algorithm/pstl_backends/gpu_backends/omp_offload.h b/libcxx/include/__algorithm/pstl_backends/gpu_backends/omp_offload.h
new file mode 100644
index 000000000000000..4baa4e7f65859d1
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl_backends/gpu_backends/omp_offload.h
@@ -0,0 +1,94 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_OMP_OFFLOAD_H
+#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_OMP_OFFLOAD_H
+
+#include <__assert>
+#include <__config>
+#include <__utility/move.h>
+#include <cstddef>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+# pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+namespace __par_backend {
+inline namespace __omp_gpu_backend {
+
+// In OpenMP, we need to extract the pointer to the underlying data of data
+// structures like std::vector and std::array to be able to map the data to the
+// device.
+
+template <typename T>
+_LIBCPP_HIDE_FROM_ABI inline T __omp_extract_base_ptr(T p) {
+ return p;
+}
+
+template <typename T>
+_LIBCPP_HIDE_FROM_ABI inline T __omp_extract_base_ptr(std::__wrap_iter<T> w) {
+  return std::pointer_traits<std::__wrap_iter<T>>::to_address(w);
+}
+
+// Applying function or lambda in a loop
+
+template <class _Iterator, class _DifferenceType, class _Function>
+_LIBCPP_HIDE_FROM_ABI _Iterator
+__omp_parallel_for_simd_1(_Iterator __first, _DifferenceType __n, _Function __f) noexcept {
+# pragma omp target teams distribute parallel for simd map(tofrom : __first[0 : __n])
+ for (_DifferenceType __i = 0; __i < __n; ++__i)
+ __f(__first[__i]);
+
+ return __first + __n;
+}
+
+// Extracting the underlying pointer
+
+template <class _Iterator, class _DifferenceType, class _Function>
+_LIBCPP_HIDE_FROM_ABI _Iterator __parallel_for_simd_1(_Iterator __first, _DifferenceType __n, _Function __f) noexcept {
+ __omp_parallel_for_simd_1(__omp_gpu_backend::__omp_extract_base_ptr(__first), __n, __f);
+ return __first + __n;
+}
+
+// Assigning a value in a loop
+
+template <class _Index, class _DifferenceType, class _Tp>
+_LIBCPP_HIDE_FROM_ABI _Index
+__omp_parallel_for_simd_val_1(_Index __first, _DifferenceType __n, const _Tp& __value) noexcept {
+# pragma omp target teams distribute parallel for simd map(tofrom : __first[0 : __n]) map(to : __value)
+ for (_DifferenceType __i = 0; __i < __n; ++__i)
+ __first[__i] = __value;
+
+ return __first + __n;
+}
+
+template <class _Index, class _DifferenceType, class _Tp>
+_LIBCPP_HIDE_FROM_ABI _Index
+__parallel_for_simd_val_1(_Index __first, _DifferenceType __n, const _Tp& __value) noexcept {
+ __omp_parallel_for_simd_val_1(__omp_gpu_backend::__omp_extract_base_ptr(__first), __n, __value);
+ return __first + __n;
+}
+
+} // namespace __omp_gpu_backend
+} // namespace __par_backend
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_POP_MACROS
+
+#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_GPU_BACKENDS_OMP_OFFLOAD_H
diff --git a/libcxx/include/__config_site.in b/libcxx/include/__config_site.in
index c85cbcd02c441b9..e0edddce3afc3ff 100644
--- a/libcxx/include/__config_site.in
+++ b/libcxx/include/__config_site.in
@@ -34,6 +34,7 @@
#cmakedefine _LIBCPP_PSTL_CPU_BACKEND_SERIAL
#cmakedefine _LIBCPP_PSTL_CPU_BACKEND_THREAD
#cmakedefine _LIBCPP_PSTL_CPU_BACKEND_LIBDISPATCH
+#cmakedefine _LIBCPP_PSTL_GPU_BACKEND_OMP_OFFLOAD
// Hardening.
#cmakedefine01 _LIBCPP_ENABLE_HARDENED_MODE_DEFAULT
Do we want to enable GPU offloading when compiling …?

I made it depend on CMake options only.
✅ With the latest revision this PR passed the C/C++ code formatter. |
Thanks a lot for the patch! I think this is really great and I'm glad we're paving the way for new backends being added in the library. Since this is the first such backend, there are some thing to change as we discussed during our live review, but I definitely see a path forward.
Note to self: did not have time to review omp_offload.h yet, will do next time.
Not attached to this line: we need to discuss how exceptions are handled when we offload to the GPU. @AntonRydahl mentioned that throw was compiled to __builtin_trap(). This is going to both fail our tests and make us non-conforming. Our preference would be that the OpenMP backend doesn't try to offload if there are exceptions in the code path we're considering offloading.

Needs more discussion.
I have written a section about this in the libc++ documentation. Do you think the description is sufficiently detailed?
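For concreteness, a hedged illustration of the problem being discussed (not taken from the patch or its tests): user code handed to a par_unseq algorithm is still allowed to throw, so if the body is offloaded, a throw that would propagate normally on the host cannot unwind on the device and currently lowers to a trap.

#include <algorithm>
#include <execution>
#include <stdexcept>
#include <vector>

int main() {
  std::vector<int> v(1000, 1);
  std::for_each(std::execution::par_unseq, v.begin(), v.end(), [](int& x) {
    if (x < 0)
      throw std::runtime_error("unexpected negative value"); // well-formed C++, but fatal if this body runs on the GPU
    x *= 2;
  });
  return 0;
}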
You can actually remove this header entirely until you have an implementation for it, since it'll be implemented using std::find_if. Note that if you run into issues while doing that, it's a pre-existing PSTL bug.
I think there is unfortunately a bug in the PSTL logic. I cannot compile libcxx if I remove the overlay header files for any_of, find_if, or stable_sort.
template <class _T1, class _T2, class _T3>
struct _LIBCPP_HIDE_FROM_ABI __is_supported_reduction : std::false_type {};

#  define __PSTL_IS_SUPPORTED_REDUCTION(funname) \
There's something trying to emerge here. We already have libcxx/include/__type_traits/predicate_traits.h and libcxx/include/__type_traits/operation_traits.h. IMO we could unify those into something like:

template <>
struct __desugars_to<__equal_to, std::equal_to<>> : true_type {};
template <>
struct __desugars_to<ranges::equal_to, std::equal_to<>> : true_type {};
template <>
struct __desugars_to<ranges::plus, std::plus<>> : std::true_type {};
// etc...

Then, this becomes simply:

template <class _Func>
struct __is_supported_reduction : bool_constant<
    __desugars_to<_Func, std::minus<>>::value ||
    __desugars_to<_Func, std::multiplies<>>::value ||
    __desugars_to<_Func, std::logical_and<>>::value ||
    __desugars_to<_Func, std::logical_or<>>::value ||
    __desugars_to<_Func, std::bit_and<>>::value ||
    __desugars_to<_Func, std::bit_or<>>::value ||
    __desugars_to<_Func, std::bit_xor<>>::value
> {};

This would mandate some prior refactoring, but it would be quite nice.
To not make this PR even bigger, I have added a refactor to use __desugars_to here: #68642. I hope that is what you meant. 😄
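For readers following along, here is a self-contained illustration of the composition sketched above. The names and the chosen operations are stand-ins, not the actual libc++ definitions (the real refactor lives in #68642).

#include <functional>
#include <type_traits>

// Stand-in for the proposed trait: does _Fn behave like the canonical operation?
template <class _Fn, class _Canonical>
struct __desugars_to : std::false_type {};
template <>
struct __desugars_to<std::multiplies<>, std::multiplies<>> : std::true_type {};
template <>
struct __desugars_to<std::bit_and<>, std::bit_and<>> : std::true_type {};

// A reduction is "supported" when its operation desugars to one the backend
// could map onto an OpenMP reduction clause.
template <class _Fn>
struct __is_supported_reduction
    : std::bool_constant<__desugars_to<_Fn, std::multiplies<>>::value ||
                         __desugars_to<_Fn, std::bit_and<>>::value> {};

struct UserOp {
  int operator()(int a, int b) const { return a * b; }
};

static_assert(__is_supported_reduction<std::multiplies<>>::value, "known operation is accepted");
static_assert(!__is_supported_reduction<UserOp>::value, "arbitrary functor is rejected");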
…TION_POINT The _LIBCPP_PSTL_CUSTOMIZATION_POINT macro was assuming that the policy was called _RawPolicy and referencing it by name. It happened to always work but this was definitely accidental and an oversight in the original implementation. This patch fixes that by passing the policy to the macro explicitly. Noticed while reviewing llvm#66968.
…TION_POINT (#68238) The _LIBCPP_PSTL_CUSTOMIZATION_POINT macro was assuming that the policy was called _RawPolicy and referencing it by name. It happened to always work but this was definitely accidental and an oversight in the original implementation. This patch fixes that by passing the policy to the macro explicitly. Noticed while reviewing #66968.
@philnik777 Last we spoke, the design work around the customization points was still ongoing. Have we locked it in now?

Kind-of, but not really. We'll probably get more insights while working on this. We're working closely with @AntonRydahl and others to get this backend into libc++.
This is starting to look really good! Note to self: still need to dive into the implementation of most algorithms, but we discussed for_each and offloading conditions in some details.
This PR addresses a smaller detail discussed in the code review for #66968. Currently, some functions in the `libc++` PSTL CPU backend have been appended with a digit to indicate the number of input iterator arguments. However, there is no need to change the name for each version as overloading can be used instead. This PR will make the naming more consistent in the CPU and the proposed OpenMP backend.
// parallel unsequenced, as it is the only execution policy prohibiting throwing
// exceptions and allowing SIMD instructions
This comment is not technically correct, since exceptions can be thrown from user code even when par_unseq is used. And unseq also allows for SIMD. So I think we need to explain that we don't have sequencing between the teams (thanks @jdoerfert!).

Also, I think it might be worth promoting this comment to openmp.h since it is not specific to for_each.
When adding the logic to error out when OpenMP is not enabled, I had to add a bit of CMake logic to allow installing the library without getting that same error. On the good side, this also made it possible to enable the OpenMP toolchain when running the tests, if the OpenMP backend was selected. I now found that a few of the tests failed, because I had not thought of handling cases where the base type of the iterator is not trivially copyable. For instance, some of the tests may throw exceptions in the copy constructor. The commit I pushed today makes those tests pass while still allowing my local tests to be executed on the GPU. What do you think of this change?
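A rough sketch of the kind of guard described here (illustrative only, not the exact commit): offload only when the iterator's value type can be bit-copied to the device, and otherwise stay on the host so copy constructors that may throw are never run on the GPU.

#include <algorithm>
#include <iterator>
#include <type_traits>

// Hypothetical helper, not the actual patch: decide whether a fill can be offloaded.
template <class _Iterator, class _Tp>
void maybe_offload_fill(_Iterator __first, _Iterator __last, const _Tp& __value) {
  using _ValueType = typename std::iterator_traits<_Iterator>::value_type;
  if constexpr (std::is_trivially_copyable_v<_ValueType>) {
    // Safe to map the range to the device and assign there; the real backend
    // would emit an `omp target teams distribute parallel for simd` region.
    std::fill(__first, __last, __value);
  } else {
    // Copy constructors may throw or have observable side effects; keep execution on the host.
    std::fill(__first, __last, __value);
  }
}

int main() {
  int a[8] = {};
  maybe_offload_fill(a, a + 8, 42);
  return 0;
}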
I have also added a number of tests that verify that vectors of primitive data types are offloaded to the GPU. They use …. If there are no available devices, the new tests simply don't run:

if (omp_get_num_devices() < 1)
  return 0;

Also, I have introduced a new feature such that they only run when the OpenMP PSTL backend is enabled:

// REQUIRES: openmp_pstl_backend
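A sketch of what such a test could look like (hypothetical file, simplified; the explicit check that the data was actually mapped to the device is omitted here):

// REQUIRES: openmp_pstl_backend

#include <algorithm>
#include <cassert>
#include <execution>
#include <omp.h>
#include <vector>

int main(int, char**) {
  // Skip silently when no offload device is available, as described above.
  if (omp_get_num_devices() < 1)
    return 0;

  std::vector<int> v(1024, 0);
  std::fill(std::execution::par_unseq, v.begin(), v.end(), 7);
  for (int x : v)
    assert(x == 7);
  return 0;
}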
@jdoerfert @AntonRydahl The new CI job I added is failing due to the inability to find ….

Can you add openmp as a runtime (or project) to the cmake invocation? We want to build our own libomp, not rely on the system.

Surely I need to do more than that? We are not building our own Clang, so I'd have to configure the test suite to use the just-built ….

Isn't openmp available by default on Linux?
generate-cmake -C "${MONOREPO_ROOT}/libcxx/cmake/caches/Generic-pstl-openmp.cmake"
generate-cmake-openmp -C "${MONOREPO_ROOT}/libcxx/cmake/caches/Generic-pstl-openmp.cmake"
echo "+++ Installing OpenMP and Clang"
${NINJA} -vC "${BUILD_DIR}" install
check-runtimes
Do you think this can be done in a more elegant way, @ldionne? By installing Clang and OpenMP as a project rather than a runtime, I got the build-bot to work locally.
If you don't have clang, put openmp as a project. That will work for CPU parallelism (not offloading). I would suggest clang though...
You can install it, for sure.
@AntonRydahl does this pass smoke tests in the builder now?

I am uncertain why the new OpenMP buildbot failed. I guess that is the only thing blocking this PR right now. If you have time, could I get you to try to run it with …?

I ran the builder locally; it did not reproduce the errors we see here.

@AntonRydahl I tried to rebase this but there are conflicts that I cannot easily resolve.
@jdoerfert Rebase done. There will probably be many small issues because I couldn't build this locally, but I basically resolved the rebasing issues and reformulated the backend on top of the new dispatching mechanism.
Thank you so, so much for spending time fixing this, @ldionne! I will re-run the tests and see if I can replicate the errors from the build bot.
Thanks a lot! Bots seem clean, right? FWIW, I can (re-)run stuff on linux machines, at least try. We also have people, including a GSoC student, who will pick this up now.

I think we need to wait and see if stage3 (generic-pstl-openmp, libcxx-runners-8-set) passes.

@AntonRydahl I don't think we ever had a clean run of the OpenMP bot, did we? IMO the testing + CI story for the OpenMP backend is the only thing blocking this PR from being merged, everything else (like improving the backend) can and should be done in separate PRs so we can capture progress here.
The tests should now be able to locate libomptarget.so. Let's see what other failures we get from the build bot.
You should rebase on top of main, that will solve the issues you're seeing on the macOS builders.
__libcpp_is_contiguous_iterator<_ForwardOutIterator>::value && is_trivially_copyable_v<_ValueType>) {
  return std::__rewrap_iter(
      __outit,
      __omp_transform(
Suggested change:
-          __omp_transform(
+          __pstl::__omp_transform(
Just pushed header files for ranges shift_left and shift_right for review. I was wondering if the directives and the argument types are correct, and whether there is a specific format to follow? To test the files in libcxx, would we have to run the llvm-lit executable in the openmp test directory?
To address some of the good feedback from the recent request for comments, I have added a separate OpenMP offloading backend in libcxx/include/__algorithm/pstl_backends. To begin with, I have added std::for_each, std::transform, std::transform_reduce, and std::fill. I know there are still a lot of things missing, such as LIT tests, but I think we should try to agree on the project structure first.