-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ASan][libc++] Turn on ASan annotations for short strings #79536
[ASan][libc++] Turn on ASan annotations for short strings #79536
Conversation
@llvm/pr-subscribers-libcxx Author: Tacet (AdvenamTacet) ChangesThis is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang. Read PRs above and below for details. Additionaly annotations were updated (but it shouldn't have any impact on anything): Previous description: Originally merged here: #75882 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the Fixes are implemented in: Problematic code from // In debug builds, we also scribble across the rest of the storage.
memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); Original description: This commit turns on ASan annotations in Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: #72677 Requires to pass CI without fails:
Annotating Support in ASan API exists since dd1b7b7. You can turn off annotations for a specific allocator based on changes from 2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the If you have any questions, please email: Full diff: https://github.com/llvm/llvm-project/pull/79536.diff 5 Files Affected:
diff --git a/libcxx/include/string b/libcxx/include/string
index c5c245fa297d353..200ac207969a5cd 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -659,7 +659,6 @@ _LIBCPP_PUSH_MACROS
#else
# define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
#endif
-#define _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED false
_LIBCPP_BEGIN_NAMESPACE_STD
@@ -1896,22 +1895,17 @@ private:
#endif
}
- // ASan: short string is poisoned if and only if this function returns true.
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool __asan_short_string_is_annotated() const _NOEXCEPT {
- return _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED && !__libcpp_is_constant_evaluated();
- }
-
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_new(size_type __current_size) const _NOEXCEPT {
(void) __current_size;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
- if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+ if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + capacity() + 1, data() + __current_size + 1);
#endif
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_delete() const _NOEXCEPT {
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
- if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+ if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + size() + 1, data() + capacity() + 1);
#endif
}
@@ -1919,7 +1913,7 @@ private:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_increase(size_type __n) const _NOEXCEPT {
(void) __n;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
- if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+ if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + size() + 1, data() + size() + 1 + __n);
#endif
}
@@ -1927,7 +1921,7 @@ private:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_shrink(size_type __old_size) const _NOEXCEPT {
(void) __old_size;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
- if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+ if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + __old_size + 1, data() + size() + 1);
#endif
}
diff --git a/libcxx/test/libcxx/containers/strings/basic.string/asan_deque_integration.pass.cpp b/libcxx/test/libcxx/containers/strings/basic.string/asan_deque_integration.pass.cpp
new file mode 100644
index 000000000000000..1205190b3a6e131
--- /dev/null
+++ b/libcxx/test/libcxx/containers/strings/basic.string/asan_deque_integration.pass.cpp
@@ -0,0 +1,182 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// REQUIRES: asan
+// UNSUPPORTED: c++03
+
+#include <cassert>
+#include <string>
+#include <array>
+#include <deque>
+#include "test_macros.h"
+#include "asan_testing.h"
+#include "min_allocator.h"
+
+// This tests exists to check if strings work well with deque, as those
+// may be partialy annotated, we cannot simply call
+// is_double_ended_contiguous_container_asan_correct, as it assumes that
+// object memory inside is not annotated, so we check everything in a more careful way.
+
+template <typename D>
+void verify_inside(D const& d) {
+ for (size_t i = 0; i < d.size(); ++i) {
+ assert(is_string_asan_correct(d[i]));
+ }
+}
+
+template <typename S, size_t N>
+S get_s(char c) {
+ S s;
+ for (size_t i = 0; i < N; ++i)
+ s.push_back(c);
+
+ return s;
+}
+
+template <class C, class S>
+void test_string() {
+ size_t const N = sizeof(S) < 256 ? (4096 / sizeof(S)) : 16;
+
+ {
+ C d1a(1), d1b(N), d1c(N + 1), d1d(5 * N);
+ verify_inside(d1a);
+ verify_inside(d1b);
+ verify_inside(d1c);
+ verify_inside(d1d);
+ }
+ {
+ C d2;
+ for (size_t i = 0; i < 3 * N + 2; ++i) {
+ d2.push_back(get_s<S, 1>(i % 10 + 'a'));
+ verify_inside(d2);
+ d2.push_back(get_s<S, 22>(i % 10 + 'b'));
+ verify_inside(d2);
+
+ d2.pop_front();
+ verify_inside(d2);
+ }
+ }
+ {
+ C d3;
+ for (size_t i = 0; i < 3 * N + 2; ++i) {
+ d3.push_front(get_s<S, 1>(i % 10 + 'a'));
+ verify_inside(d3);
+ d3.push_front(get_s<S, 28>(i % 10 + 'b'));
+ verify_inside(d3);
+
+ d3.pop_back();
+ verify_inside(d3);
+ }
+ }
+ {
+ C d4;
+ for (size_t i = 0; i < 3 * N + 2; ++i) {
+ // When there is no SSO, all elements inside should not be poisoned,
+ // so we can verify deque poisoning.
+ d4.push_front(get_s<S, 33>(i % 10 + 'a'));
+ verify_inside(d4);
+ assert(is_double_ended_contiguous_container_asan_correct(d4));
+ d4.push_back(get_s<S, 28>(i % 10 + 'b'));
+ verify_inside(d4);
+ assert(is_double_ended_contiguous_container_asan_correct(d4));
+ }
+ }
+ {
+ C d5;
+ for (size_t i = 0; i < 3 * N + 2; ++i) {
+ // In d4 we never had poisoned memory inside deque.
+ // Here we start with SSO, so part of the inside of the container,
+ // will be poisoned.
+ d5.push_front(S());
+ verify_inside(d5);
+ }
+ for (size_t i = 0; i < d5.size(); ++i) {
+ // We change the size to have long string.
+ // Memory owne by deque should not be poisoned by string.
+ d5[i].resize(100);
+ verify_inside(d5);
+ }
+
+ assert(is_double_ended_contiguous_container_asan_correct(d5));
+
+ d5.erase(d5.begin() + 2);
+ verify_inside(d5);
+
+ d5.erase(d5.end() - 2);
+ verify_inside(d5);
+
+ assert(is_double_ended_contiguous_container_asan_correct(d5));
+ }
+ {
+ C d6a;
+ assert(is_double_ended_contiguous_container_asan_correct(d6a));
+
+ C d6b(N + 2, get_s<S, 100>('a'));
+ d6b.push_front(get_s<S, 101>('b'));
+ while (!d6b.empty()) {
+ d6b.pop_back();
+ assert(is_double_ended_contiguous_container_asan_correct(d6b));
+ }
+
+ C d6c(N + 2, get_s<S, 102>('c'));
+ while (!d6c.empty()) {
+ d6c.pop_back();
+ assert(is_double_ended_contiguous_container_asan_correct(d6c));
+ }
+ }
+ {
+ C d7(9 * N + 2);
+
+ d7.insert(d7.begin() + 1, S());
+ verify_inside(d7);
+
+ d7.insert(d7.end() - 3, S());
+ verify_inside(d7);
+
+ d7.insert(d7.begin() + 2 * N, get_s<S, 1>('a'));
+ verify_inside(d7);
+
+ d7.insert(d7.end() - 2 * N, get_s<S, 1>('b'));
+ verify_inside(d7);
+
+ d7.insert(d7.begin() + 2 * N, 3 * N, get_s<S, 1>('c'));
+ verify_inside(d7);
+
+ // It may not be short for big element types, but it will be checked correctly:
+ d7.insert(d7.end() - 2 * N, 3 * N, get_s<S, 2>('d'));
+ verify_inside(d7);
+
+ d7.erase(d7.begin() + 2);
+ verify_inside(d7);
+
+ d7.erase(d7.end() - 2);
+ verify_inside(d7);
+ }
+}
+
+template <class S>
+void test_container() {
+ test_string<std::deque<S, std::allocator<S>>, S>();
+ test_string<std::deque<S, min_allocator<S>>, S>();
+ test_string<std::deque<S, safe_allocator<S>>, S>();
+}
+
+int main(int, char**) {
+ // Those tests support only types based on std::basic_string.
+ test_container<std::string>();
+ test_container<std::wstring>();
+#if TEST_STD_VER >= 11
+ test_container<std::u16string>();
+ test_container<std::u32string>();
+#endif
+#if TEST_STD_VER >= 20
+ test_container<std::u8string>();
+#endif
+
+ return 0;
+}
diff --git a/libcxx/test/libcxx/containers/strings/basic.string/asan_short.pass.cpp b/libcxx/test/libcxx/containers/strings/basic.string/asan_short.pass.cpp
new file mode 100644
index 000000000000000..53c70bed189b5c1
--- /dev/null
+++ b/libcxx/test/libcxx/containers/strings/basic.string/asan_short.pass.cpp
@@ -0,0 +1,56 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// REQUIRES: asan
+// UNSUPPORTED: c++03
+
+// <string>
+
+// Basic test if ASan annotations work for short strings.
+
+#include <string>
+#include <cassert>
+#include <cstdlib>
+
+#include "asan_testing.h"
+#include "min_allocator.h"
+#include "test_iterators.h"
+#include "test_macros.h"
+
+extern "C" void __sanitizer_set_death_callback(void (*callback)(void));
+
+void do_exit() { exit(0); }
+
+int main(int, char**) {
+ {
+ typedef cpp17_input_iterator<char*> MyInputIter;
+ // Should not trigger ASan.
+ std::basic_string<char, std::char_traits<char>, safe_allocator<char>> v;
+ char i[] = {'a', 'b', 'c', 'd'};
+
+ v.insert(v.begin(), MyInputIter(i), MyInputIter(i + 4));
+ assert(v[0] == 'a');
+ assert(is_string_asan_correct(v));
+ }
+
+ __sanitizer_set_death_callback(do_exit);
+ {
+ using T = char;
+ using C = std::basic_string<T, std::char_traits<T>, safe_allocator<T>>;
+ const T t[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g'};
+ C c(std::begin(t), std::end(t));
+ assert(is_string_asan_correct(c));
+ assert(__sanitizer_verify_contiguous_container(c.data(), c.data() + c.size() + 1, c.data() + c.capacity() + 1) !=
+ 0);
+ volatile T foo = c[c.size() + 1]; // should trigger ASAN. Use volatile to prevent being optimized away.
+ assert(false); // if we got here, ASAN didn't trigger
+ ((void)foo);
+ }
+
+ return 0;
+}
diff --git a/libcxx/test/libcxx/containers/strings/basic.string/asan_vector_integration.pass.cpp b/libcxx/test/libcxx/containers/strings/basic.string/asan_vector_integration.pass.cpp
new file mode 100644
index 000000000000000..b7d95b7069083ae
--- /dev/null
+++ b/libcxx/test/libcxx/containers/strings/basic.string/asan_vector_integration.pass.cpp
@@ -0,0 +1,182 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// REQUIRES: asan
+// UNSUPPORTED: c++03
+
+#include <cassert>
+#include <string>
+#include <vector>
+#include <array>
+#include "test_macros.h"
+#include "asan_testing.h"
+#include "min_allocator.h"
+
+// This tests exists to check if strings work well with vector, as those
+// may be partialy annotated, we cannot simply call
+// is_contiguous_container_asan_correct, as it assumes that
+// object memory inside is not annotated, so we check everything in a more careful way.
+
+template <typename D>
+void verify_inside(D const& d) {
+ for (size_t i = 0; i < d.size(); ++i) {
+ assert(is_string_asan_correct(d[i]));
+ }
+}
+
+template <typename S, size_t N>
+S get_s(char c) {
+ S s;
+ for (size_t i = 0; i < N; ++i)
+ s.push_back(c);
+
+ return s;
+}
+
+template <class C, class S>
+void test_string() {
+ size_t const N = sizeof(S) < 256 ? (4096 / sizeof(S)) : 16;
+
+ {
+ C d1a(1), d1b(N), d1c(N + 1), d1d(5 * N);
+ verify_inside(d1a);
+ verify_inside(d1b);
+ verify_inside(d1c);
+ verify_inside(d1d);
+ }
+ {
+ C d2;
+ for (size_t i = 0; i < 3 * N + 2; ++i) {
+ d2.push_back(get_s<S, 1>(i % 10 + 'a'));
+ verify_inside(d2);
+ d2.push_back(get_s<S, 28>(i % 10 + 'b'));
+ verify_inside(d2);
+
+ d2.erase(d2.cbegin());
+ verify_inside(d2);
+ }
+ }
+ {
+ C d3;
+ for (size_t i = 0; i < 3 * N + 2; ++i) {
+ d3.push_back(get_s<S, 1>(i % 10 + 'a'));
+ verify_inside(d3);
+ d3.push_back(get_s<S, 28>(i % 10 + 'b'));
+ verify_inside(d3);
+
+ d3.pop_back();
+ verify_inside(d3);
+ }
+ }
+ {
+ C d4;
+ for (size_t i = 0; i < 3 * N + 2; ++i) {
+ // When there is no SSO, all elements inside should not be poisoned,
+ // so we can verify vector poisoning.
+ d4.push_back(get_s<S, 33>(i % 10 + 'a'));
+ verify_inside(d4);
+ assert(is_contiguous_container_asan_correct(d4));
+ d4.push_back(get_s<S, 28>(i % 10 + 'b'));
+ verify_inside(d4);
+ assert(is_contiguous_container_asan_correct(d4));
+ }
+ }
+ {
+ C d5;
+ for (size_t i = 0; i < 3 * N + 2; ++i) {
+ // In d4 we never had poisoned memory inside vector.
+ // Here we start with SSO, so part of the inside of the container,
+ // will be poisoned.
+ d5.push_back(S());
+ verify_inside(d5);
+ }
+ for (size_t i = 0; i < d5.size(); ++i) {
+ // We change the size to have long string.
+ // Memory owne by vector should not be poisoned by string.
+ d5[i].resize(100);
+ verify_inside(d5);
+ }
+
+ assert(is_contiguous_container_asan_correct(d5));
+
+ d5.erase(d5.begin() + 2);
+ verify_inside(d5);
+
+ d5.erase(d5.end() - 2);
+ verify_inside(d5);
+
+ assert(is_contiguous_container_asan_correct(d5));
+ }
+ {
+ C d6a;
+ assert(is_contiguous_container_asan_correct(d6a));
+
+ C d6b(N + 2, get_s<S, 100>('a'));
+ d6b.push_back(get_s<S, 101>('b'));
+ while (!d6b.empty()) {
+ d6b.pop_back();
+ assert(is_contiguous_container_asan_correct(d6b));
+ }
+
+ C d6c(N + 2, get_s<S, 102>('c'));
+ while (!d6c.empty()) {
+ d6c.pop_back();
+ assert(is_contiguous_container_asan_correct(d6c));
+ }
+ }
+ {
+ C d7(9 * N + 2);
+
+ d7.insert(d7.begin() + 1, S());
+ verify_inside(d7);
+
+ d7.insert(d7.end() - 3, S());
+ verify_inside(d7);
+
+ d7.insert(d7.begin() + 2 * N, get_s<S, 1>('a'));
+ verify_inside(d7);
+
+ d7.insert(d7.end() - 2 * N, get_s<S, 1>('b'));
+ verify_inside(d7);
+
+ d7.insert(d7.begin() + 2 * N, 3 * N, get_s<S, 1>('c'));
+ verify_inside(d7);
+
+ // It may not be short for big element types, but it will be checked correctly:
+ d7.insert(d7.end() - 2 * N, 3 * N, get_s<S, 2>('d'));
+ verify_inside(d7);
+
+ d7.erase(d7.begin() + 2);
+ verify_inside(d7);
+
+ d7.erase(d7.end() - 2);
+ verify_inside(d7);
+ }
+}
+
+template <class S>
+void test_container() {
+ test_string<std::vector<S, std::allocator<S>>, S>();
+ test_string<std::vector<S, min_allocator<S>>, S>();
+ test_string<std::vector<S, safe_allocator<S>>, S>();
+}
+
+int main(int, char**) {
+ // Those tests support only types based on std::basic_string.
+ test_container<std::string>();
+ test_container<std::wstring>();
+#if TEST_STD_VER >= 11
+ test_container<std::u16string>();
+ test_container<std::u32string>();
+#endif
+#if TEST_STD_VER >= 20
+ test_container<std::u8string>();
+#endif
+
+ return 0;
+}
diff --git a/libcxx/test/support/asan_testing.h b/libcxx/test/support/asan_testing.h
index 6bfc8280a4ead30..3785c1f9c20dea1 100644
--- a/libcxx/test/support/asan_testing.h
+++ b/libcxx/test/support/asan_testing.h
@@ -56,35 +56,16 @@ TEST_CONSTEXPR bool is_double_ended_contiguous_container_asan_correct(const std:
#endif
#if TEST_HAS_FEATURE(address_sanitizer)
-template <typename S>
-bool is_string_short(S const& s) {
- // We do not have access to __is_long(), but we can check if strings
- // buffer is inside strings memory. If strings memory contains its content,
- // SSO is in use. To check it, we can just confirm that the beginning is in
- // the string object memory block.
- // &s - beginning of objects memory
- // &s[0] - beginning of the buffer
- // (&s+1) - end of objects memory
- return (void*)std::addressof(s) <= (void*)std::addressof(s[0]) &&
- (void*)std::addressof(s[0]) < (void*)(std::addressof(s) + 1);
-}
-
template <typename ChrT, typename TraitsT, typename Alloc>
TEST_CONSTEXPR bool is_string_asan_correct(const std::basic_string<ChrT, TraitsT, Alloc>& c) {
if (TEST_IS_CONSTANT_EVALUATED)
return true;
- if (!is_string_short(c) || _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED) {
- if (std::__asan_annotate_container_with_allocator<Alloc>::value)
- return __sanitizer_verify_contiguous_container(c.data(), c.data() + c.size() + 1, c.data() + c.capacity() + 1) !=
- 0;
- else
- return __sanitizer_verify_contiguous_container(
- c.data(), c.data() + c.capacity() + 1, c.data() + c.capacity() + 1) != 0;
- } else {
- return __sanitizer_verify_contiguous_container(std::addressof(c), std::addressof(c) + 1, std::addressof(c) + 1) !=
- 0;
- }
+ if (std::__asan_annotate_container_with_allocator<Alloc>::value)
+ return __sanitizer_verify_contiguous_container(c.data(), c.data() + c.size() + 1, c.data() + c.capacity() + 1) != 0;
+ else
+ return __sanitizer_verify_contiguous_container(
+ c.data(), c.data() + c.capacity() + 1, c.data() + c.capacity() + 1) != 0;
}
#else
# include <string>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If only we had pre-commit CI everywhere, this would have been easier to land in the first place :-)
This commit adds information that only long strings are annotated, and with all allocators by default. To read why short string annotations are not turned on yet, read related PR: llvm#79536
@ldionne thanks for looking at it! For now I changed it to WIP to avoid merging by mistake, before we figure out and understand the real reason behind buildbot failures. Pre-commit CI with buildbots would be really convenient! |
@AdvenamTacet I am confused, are we still seeing issues on bots? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we are not landing as-is, as we need a workaround for asan?
This commit adds information that only long strings are annotated, and with all allocators by default. To read why short string annotations are not turned on yet, read comments in a related PR: #79536 --------- Co-authored-by: Mark de Wever <[email protected]>
This commit adds information that only long strings are annotated, and with all allocators by default. To read why short string annotations are not turned on yet, read comments in a related PR: #79536 --------- Co-authored-by: Mark de Wever <[email protected]>
This commit adds information that only long strings are annotated, and with all allocators by default. To read why short string annotations are not turned on yet, read comments in a related PR: #79536 Upstreamed in: 7661ade Upstream PR: #80912 --------- Co-authored-by: Mark de Wever <[email protected]> Co-authored-by: Mark de Wever <[email protected]>
8783125
to
69f4912
Compare
Sorry for the delay, I had to focus on other projects and it took me a while to fix the issue and properly test it. @vitalybuka I believe with those new changes it's ready to be upstreamed. New changes are in a separate commit
#define _LIBCPP_ASAN_VOLATILE_WRAPPER(ptr) __asan_volatile_wrapper(ptr)
#else
#define _LIBCPP_ASAN_VOLATILE_WRAPPER(ptr) ptr
#endif I was thinking about a different approach where we rely on compiler optimizations and always call I can't find a problem with that implementation, so I am happy to upstream it. Hope it won't cause a problem anymore. |
This commit adds information that only long strings are annotated, and with all allocators by default. To read why short string annotations are not turned on yet, read comments in a related PR: llvm/llvm-project#79536 --------- Co-authored-by: Mark de Wever <[email protected]> NOKEYCHECK=True GitOrigin-RevId: 7661ade5d1ac4fc8e1e2339b2476cb8e45c24641
35a0328
to
835c7aa
Compare
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM: - llvm#79489 - llvm#79522 Additionaly annotations were updated (but it shouldn't have any impact on anything): - llvm#79292 Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang. Read PRs above and below for details. --- Previous description: Originally merged here: llvm#75882 Reverted here: llvm#78627 Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error. Fixes are implemented in: - llvm#79065 - llvm#79066 Problematic code from `UniqueFunctionBase` for example: ```cpp // In debug builds, we also scribble across the rest of the storage. memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize); ``` --- Original description: This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case). Originally suggested here: https://reviews.llvm.org/D147680 String annotations added here: llvm#72677 Requires to pass CI without fails: - llvm#75845 - llvm#75858 Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations. Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec. This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds. The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability. If you have any questions, please email: [email protected] [email protected]
4009316
to
82abf82
Compare
This commit stops some optimizations when compiling with ASan. - Short strings make std::basic_string to not be trivially relocatable, because memory has to be unpoisoned. It changes value of `__trivially_relocatable` when compiling with ASan. - It truns off compiler stack optimizations with `__asan_volatile_wrapper`, the function is not used when compiling without ASan. - It turns off instrumentation in a few functions.
82abf82
to
0ba0371
Compare
I've wrapped up the final changes and fixes, and everything is ready for final testing, which I'm starting right now.
Let me know if you have any questions or suggestions before we upstream. Edit: |
// external memory. In such cases, the destructor is responsible for unpoisoning | ||
// the memory to avoid triggering false positives. | ||
// Therefore it's crucial to ensure the destructor is called | ||
using __trivially_relocatable = false_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using __trivially_relocatable = false_type; | |
using __trivially_relocatable = void; |
IMO ASan shouldn't affect traits of the type. This change definitely needs more discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there needs to be some ASan handling when relocating objects? We're destroying objects after all, which should probably poison memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to incrementally improve string annotations.
Without change of __trivially_relocatable
and no other changes ASan error happens exactly here:
llvm-project/libcxx/include/__memory/uninitialized_algorithms.h
Lines 644 to 646 in 11a6799
} else { | |
__builtin_memcpy(const_cast<__remove_const_t<_Tp>*>(__result), __first, sizeof(_Tp) * (__last - __first)); | |
} |
__builtin_memcpy
accesses poisoned memory, when objects memory is poisoned.
We can unpoison memory there for every type. Then problematic code area would be:
} else {
#ifndef _LIBCPP_HAS_NO_ASAN
__asan_unpoison_memory_region(__first, sizeof(_Tp) * (__last - __first));
#endif
__builtin_memcpy(const_cast<__remove_const_t<_Tp>*>(__result), __first, sizeof(_Tp) * (__last - __first));
}
Disadvantage of that over changing the value of __trivially_relocatable
is the need of unpoisoning memory like that in every function which depends on __libcpp_is_trivially_relocatable
. But it shouldn't be a big problem.
What I don't like more is fact that objects after that move won't be poisoned correctly (may result in false negatives).
My idea to fix it is creating a compiler-rt function __asan_move_annotations
which may be used here instead of __asan_unpoison_memory_region
, making old buffer unpoisoned and the new buffer poisoned in the same way as the old one.
If we try to optimize it as much as possible, creating something similar to __libcpp_is_trivially_relocatable
just for ASan is possible. For example __libcpp_may_memory_be_poisoned
, and calling __asan_unpoison_memory_region/__asan_move_annotations
if and only if that value is true. However, __asan_unpoison_memory_region
should be cheaper than __builtin_memcpy
, so I think we can accept additional cost here as new ASan trait adds a lot of complexity.
We're destroying objects after all, which should probably poison memory.
Poisoning, if happens, happens in the allocator (during deallocation) and we cannot assume anything about it. For example, allocator cleaning memory before freeing it touches the whole memory before annotations are updated. Therefore __annotate_delete()
in all classes is necessary.
But something like __asan_move_annotations
is possible and I can create it. Then we can go back to one value of __trivially_relocatable
.
To put my actions behind my words, I will open a PR with that change Today or Tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that change as two or three PRs.
- [compiler-rt][ASan] Add function copying annotations #91702
- Add
__memcpy_with_asan
. - Update
std::basic_string
.
Last thing is trivial. __memcpy_with_asan
I hope to be fairly easy, but I am not sure what place is the best to add it. Fist one I just drafted, but didn't test it yet. I should finish everything today, unless we have a discussion about changing direction of that work. Let me know what you think.
Addresses the comment: #79536 (comment) Changes the type to `void` instead of `false_type`. The value is used here: https://github.com/llvm/llvm-project/blob/6f1013a5b3f92d3ae6e378d6706584a2a44e6964/libcxx/include/__type_traits/is_trivially_relocatable.h#L35-L38
This breaks asan for most Objective-C++ programs, see #96099. This seems fairly tricky to fix. |
I'm now done debugging all issues encountered after merging this to Chromium. Here's some user feedback, do with it what you will. I can't help but think that it might make sense to make this opt-in by default, in light of the below. There were four distinct issues.
It was extremely helpful that there's a toggle for turning off asan annotations just for std::string. This allowed us to roll libc++ and just turn off this feature and then finish diagnosing issues asynchronously. EDIT: Oh, and we ended up no longer defining |
For 3, I suppose the problem is that we (libc++) use has_feature(asan) to check if asan is enabled. That'l checks if asan is enabled for the current TU, but we really want "is asan used in any TU". I.e. having a user-settable knob instead of has_feature (and maybe use has_feature as fallback if the knob isn't explicitly set). |
@nico I am not certain whether you are arguing for the feature being enabled by default, or being disabled by default but opt-in. I think it's the latter but it's not 100% clear to me. IMO, (2) is a bug in the Blocks runtime and #96269 will work around that issue until we fix the Blocks runtime. (3) is indeed a problem, but I think the real answer here is that we must begin shipping a sanitized version of libc++ and have the clang driver select the right version based on TLDR, I am still leaning towards keeping this enabled by default except on Apple where it is currently badly broken due to the blocks runtime. If it turns out that (3) is a big problem and is biting users quite a bit, perhaps we should revisit this decision and keep it disabled until we actually ship a different dylib based on |
Hey, First of all, what you already pointed out, (3) is a known issue since creation of ASan vector annotations. Container annotations (for all containers) don't work when only part of the code is compiled with ASan and therefore it's possible to turn off container annotations in runtime with We removed flag allowing to turn off short string annotations with reasoning that we don't want to force users to know that under the hood we have something like SSO. Adding it back is a five seconds work. We can add it and never mention it anywhere, therefore only people who read the file (and therefore are aware of SSO) know about it. Alternatively, (and I consider it much better solution, at least at the moment of writing it) is adding something like If we have a consensus about implementing one or another, I'm happy to sit to it. Complex solution to (3) would require big changes or additional code in non-ASan builds. I don't think anyone is willing to accept that and I don't support it. Regarding the (2), I agree with @ldionne and have nothing to add. (4) is very similar in nature. I think you have not-optimal workaround for (4). I will address that in #91702 but for the time being I see two better solutions.
I hope to have a better solution soon, but #91702 turned out to be extremely time consuming and I was forced to postpone working on it for a while, but I hope to commit that whole week to open source and to finish it soon. At the same time, I think ASan should detect that, we should only make it easier for developers to tell "I understand that this memory operation is not safe". I'm really happy about (1), if you have more data about recent ASan container annotations changes (short string annotations/long string annotations/deque annotations/annotations of non-default allocators) and can share them, it would be extremely valuable for me and I would very appreciate that. I'm interested in any and every piece of data you can share. In general, I'm very strongly in favor of enabling all annotations by default. But we can make disabling some annotations easier. All ASan errors mentioned here should be reported by ASan, imho. As mention before, I think we should only make it easier for developers to tell "I understand that this memory operation is not safe". And I plan to address it. I don't want to commit to finalize it this week, but I will try to do it asap. I think, now the best thing we can do is adding runtime ASan options to turn off string annotations and (maybe) short string annotations. Thank you for your feedback again! |
I don't think it's feasible.
Right now, any string annotations are enabled only when libc++ is compiled with ASan and right now it has to be done by user. So it would help only in case everything is compiled with ASan, but some TUs are compiled with stock libc++ and some with self-compiled ASan libc++. In this case, if I understand correctly, it's a conscious decision to disable ASan completely for the whole TU. And therefore it's against assumptions of container annotations, and shipping instrumented libc++ would not help in that case. |
How come? To be clear, all I'm suggesting is something like this: % git diff
diff --git a/libcxx/include/__config b/libcxx/include/__config
index 108f700823cb..aaab10129a6f 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -355,7 +355,15 @@ typedef __char32_t char32_t;
# define _LIBCPP_HAS_BLOCKS_RUNTIME
# endif
-# if !__has_feature(address_sanitizer)
+# if !defined(_LIBCPP_SOME_USER_CODE_IS_USING_ASAN)
+# if _has_feature(address_sanitizer)
+# define _LIBCPP_SOME_USER_CODE_IS_USING_ASAN 1
+# else
+# define _LIBCPP_SOME_USER_CODE_IS_USING_ASAN 0
+# endif
+# endif
+
+# if !_LIBCPP_SOME_USER_CODE_IS_USING_ASAN
# define _LIBCPP_HAS_NO_ASAN
# endif Then people who use asan in some TUs but not in others can make sure to pass (Tweak name and delivery mechanism (cmake option that sets some toggle or what) of that define at will) |
@nico if I understand correctly, you suggest a way to not have instrumentation in a TU, but still have container annotation functions (to update container annotations). Thinking about it longer, I believe it should work. Not sure if we can expect some linking issues, but it shouldn't be a big problem. Maybe there is something else I don't consider. If someone decides to implement it now, I'm happy to review it. If not, I can look into it in future. |
FYI: I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116243 for libstdc++ after @pinskia pointed me this way. |
This pull request is the third iteration aiming to integrate short string annotations. This commit includes:
__trivially_relocatable
instd::basic_string
tofalse_type
when compiling with ASan (nothing changes when compiling without ASan). Short string annotations makestd::basic_string
to not be trivially relocatable, because memory has to be unpoisoned._LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
modifier to two functions._LIBCPP_ASAN_VOLATILE_WRAPPER
to prevent problematic stack optimizations (the macro modifies code behavior only when compiling with ASan).Previously we had issues with compiler optimization, which we understand thanks to @vitalybuka. This commit also addresses smaller changes in short string, since previous upstream attempts.
Problematic optimization was loading two values in code similar to:
We aim to resolve it with the volatile wrapper.
This commit is built on top of two previous attempts which descriptions are below.
Additionally, in the meantime, annotations were updated (but it shouldn't have any impact on anything):
Previous PR: #79049
Reverted: a16f81f
Previous description:
Originally merged here: #75882
Reverted here: #78627
Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the
UniqueFunctionBase
class and in theJSON.h
file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.Fixes are implemented in:
Problematic code from
UniqueFunctionBase
for example:Original description:
This commit turns on ASan annotations in
std::basic_string
for short stings (SSO case).Originally suggested here: https://reviews.llvm.org/D147680
String annotations added here: #72677
Requires to pass CI without fails:
std::basic_string
with all allocators #75845Annotating
std::basic_string
with default allocator is implemented in #72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED
, because we do not plan to support turning on and off short string annotations.Support in ASan API exists since dd1b7b7. You can turn off annotations for a specific allocator based on changes from 2fa1bec.
This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in
std::vector
andstd::deque
collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements instd::deque
, or between thestd::basic_string
's size (including the null terminator) and capacity bounds.The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the
std::equals
function. This function was taking iterators (iter1_begin
,iter1_end
,iter2_begin
) to perform the comparison, using a custom comparison function. When theiter1
object exceeded the length ofiter2
, an out-of-bounds read could occur on theiter2
object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.If you have any questions, please email: