[ASan][libc++] Turn on ASan annotations for short strings #79049

Merged
merged 2 commits into from
Jan 23, 2024

AdvenamTacet
Member

@AdvenamTacet AdvenamTacet commented Jan 22, 2024

Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the UniqueFunctionBase class and in the JSON.h file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:

For example, the problematic code in UniqueFunctionBase:

#ifndef NDEBUG
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif
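
The workaround in this PR's diff (`FunctionExtras.h`) calls `__asan_unpoison_memory_region` on the inline storage before the `memset`. A minimal, self-contained sketch of that pattern follows. `InlineSlot`, `poison_tail`, `scribble`, `scribble_demo`, and the `SKETCH_ASAN` macro are hypothetical names used only for illustration, and the manual poisoning merely stands in for what an annotated short string would do to its unused capacity. Without ASan, the poisoning calls compile away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Detect an ASan build under GCC (__SANITIZE_ADDRESS__) or Clang (__has_feature).
#if defined(__SANITIZE_ADDRESS__)
#  define SKETCH_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_ASAN 1
#  endif
#endif

#ifdef SKETCH_ASAN
#  include <sanitizer/asan_interface.h>
#endif

constexpr std::size_t kSlotSize = 32;

// Hypothetical stand-in for an object with inline storage.
struct InlineSlot {
  alignas(8) unsigned char bytes[kSlotSize];
};

// Simulates an annotated container: everything past `used` bytes is poisoned.
inline void poison_tail(InlineSlot& slot, std::size_t used) {
#ifdef SKETCH_ASAN
  __asan_poison_memory_region(slot.bytes + used, kSlotSize - used);
#else
  (void)slot;
  (void)used;
#endif
}

// Unpoison first, then scribble. Without the unpoison step, an ASan build
// would report the memset as a write to poisoned memory.
inline void scribble(InlineSlot& slot) {
#ifdef SKETCH_ASAN
  __asan_unpoison_memory_region(slot.bytes, kSlotSize);
#endif
  std::memset(slot.bytes, 0xAD, kSlotSize);
}

// Returns true if the whole slot holds the scribble pattern afterwards.
inline bool scribble_demo() {
  InlineSlot slot{};
  poison_tail(slot, 8);
  scribble(slot);
  for (unsigned char b : slot.bytes)
    if (b != 0xAD)
      return false;
  return true;
}
```

Calling `scribble_demo()` should return `true` in both ASan and non-ASan builds; removing the unpoison call from `scribble` is what would make an ASan build abort.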

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
strings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
llvm#72677

Requires the following to pass CI without failures:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with the default allocator is implemented in
llvm#72677, but annotations for
short strings (SSO, Short String Optimization) are turned off there.
This commit turns them on. It also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning short string annotations on and off.

Support in ASan API exists since
llvm@dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
llvm@2fa1bec.

This PR is part of a series of patches that extend AddressSanitizer's C++
container overflow detection by adding annotations similar to those
already present in `std::vector` and `std::deque`. With them, ASan can
detect accesses to memory that lies inside a container's internal
allocation but is not currently in use: before or after the stored
elements in `std::deque`, or between a `std::basic_string`'s size
(including the null terminator) and its capacity.
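
As a rough illustration of the range described above, the sketch below computes the three bounds that the test helper in this PR's diff passes to `__sanitizer_verify_contiguous_container`: `data()`, `data() + size() + 1`, and `data() + capacity() + 1`. The names `StringBounds`, `string_bounds`, `accessible_chars`, and `poisoned_chars` are hypothetical, and the code only does pointer arithmetic, so it does not require an ASan build.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type: the three bounds of a string's annotation.
template <class CharT>
struct StringBounds {
  const CharT* beg; // start of the character buffer
  const CharT* mid; // one past the null terminator (end of the accessible part)
  const CharT* end; // one past the buffer: capacity plus the terminator slot
};

template <class CharT, class Traits, class Alloc>
StringBounds<CharT> string_bounds(const std::basic_string<CharT, Traits, Alloc>& s) {
  const CharT* data = s.data();
  return {data, data + s.size() + 1, data + s.capacity() + 1};
}

// Characters [beg, mid) are expected to be accessible.
template <class CharT>
std::size_t accessible_chars(const StringBounds<CharT>& b) {
  return static_cast<std::size_t>(b.mid - b.beg);
}

// Characters [mid, end) are the ones an annotated string would keep poisoned.
template <class CharT>
std::size_t poisoned_chars(const StringBounds<CharT>& b) {
  return static_cast<std::size_t>(b.end - b.mid);
}
```

For a string holding `"abc"`, `accessible_chars` is 4 (three characters plus the terminator), and the remaining `capacity() - size()` characters form the range that should trap on access once annotations are active.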

These annotations were motivated by a real-world software bug discovered
by Trail of Bits: an out-of-bounds memory access while comparing two
strings with `std::equal`. The call took three iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) and a custom comparison
function. When the `iter1` range was longer than the `iter2` range, the
comparison could read past the end of `iter2`. With these annotations
enabled, container sanitization would identify and flag that access.
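
A small sketch of the kind of call described above, next to the bounds-checked overload available since C++14. This is not the original buggy code; `same_char_ignore_case`, `equal_three_iter`, and `equal_four_iter` are hypothetical names, and the case-insensitive predicate is only an assumed example of a custom comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical custom comparison: ASCII case-insensitive equality.
inline bool same_char_ignore_case(char a, char b) {
  return std::tolower(static_cast<unsigned char>(a)) ==
         std::tolower(static_cast<unsigned char>(b));
}

// Three-iterator form: reads lhs.size() characters from rhs.
// If rhs is shorter than lhs, this reads past the end of rhs.
inline bool equal_three_iter(const std::string& lhs, const std::string& rhs) {
  return std::equal(lhs.begin(), lhs.end(), rhs.begin(), same_char_ignore_case);
}

// Four-iterator form (C++14): never reads past either range and
// returns false when the lengths differ.
inline bool equal_four_iter(const std::string& lhs, const std::string& rhs) {
  return std::equal(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                    same_char_ignore_case);
}
```

`equal_three_iter` is only safe when `rhs` is at least as long as `lhs`; the four-iterator overload removes that precondition.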

If you have any questions, please email:

    advenam.tacet@trailofbits.com
    disconnect3d@trailofbits.com
@AdvenamTacet AdvenamTacet added the libc++ (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) label Jan 22, 2024
@AdvenamTacet AdvenamTacet added this to the LLVM 18.0.X Release milestone Jan 22, 2024
@AdvenamTacet AdvenamTacet requested a review from a team as a code owner January 22, 2024 20:15
@llvmbot
Collaborator

llvmbot commented Jan 22, 2024

@llvm/pr-subscribers-llvm-adt
@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-libcxx

Author: Tacet (AdvenamTacet)

Changes

Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the UniqueFunctionBase class and in the JSON.h file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.
Example from UniqueFunctionBase:

#ifndef NDEBUG
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif

To deal with it, I additionally unpoison the two affected memory areas before they are written to.

With that fix (the second commit in the PR), the ASan bootstrap passed locally.


If you have any questions, please email:

advenam.tacet@trailofbits.com
disconnect3d@trailofbits.com

Full diff: https://github.com/llvm/llvm-project/pull/79049.diff

7 Files Affected:

  • (modified) libcxx/include/string (+4-10)
  • (added) libcxx/test/libcxx/containers/strings/basic.string/asan_deque_integration.pass.cpp (+182)
  • (added) libcxx/test/libcxx/containers/strings/basic.string/asan_short.pass.cpp (+56)
  • (added) libcxx/test/libcxx/containers/strings/basic.string/asan_vector_integration.pass.cpp (+182)
  • (modified) libcxx/test/support/asan_testing.h (+5-24)
  • (modified) llvm/include/llvm/ADT/FunctionExtras.h (+3)
  • (modified) llvm/include/llvm/Support/JSON.h (+3)
diff --git a/libcxx/include/string b/libcxx/include/string
index e97139206d4fa7c..4116f350a804764 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -659,7 +659,6 @@ _LIBCPP_PUSH_MACROS
 #else
 #  define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
 #endif
-#define _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED false
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
@@ -1896,22 +1895,17 @@ private:
 #endif
   }
 
-  // ASan: short string is poisoned if and only if this function returns true.
-  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool __asan_short_string_is_annotated() const _NOEXCEPT {
-    return _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED && !__libcpp_is_constant_evaluated();
-  }
-
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_new(size_type __current_size) const _NOEXCEPT {
     (void) __current_size;
 #if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
-    if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+    if (!__libcpp_is_constant_evaluated())
       __annotate_contiguous_container(data() + capacity() + 1, data() + __current_size + 1);
 #endif
   }
 
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_delete() const _NOEXCEPT {
 #if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
-    if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+    if (!__libcpp_is_constant_evaluated())
       __annotate_contiguous_container(data() + size() + 1, data() + capacity() + 1);
 #endif
   }
@@ -1919,7 +1913,7 @@ private:
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_increase(size_type __n) const _NOEXCEPT {
     (void) __n;
 #if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
-    if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+    if (!__libcpp_is_constant_evaluated())
       __annotate_contiguous_container(data() + size() + 1, data() + size() + 1 + __n);
 #endif
   }
@@ -1927,7 +1921,7 @@ private:
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_shrink(size_type __old_size) const _NOEXCEPT {
     (void) __old_size;
 #if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
-    if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
+    if (!__libcpp_is_constant_evaluated())
       __annotate_contiguous_container(data() + __old_size + 1, data() + size() + 1);
 #endif
   }
diff --git a/libcxx/test/libcxx/containers/strings/basic.string/asan_deque_integration.pass.cpp b/libcxx/test/libcxx/containers/strings/basic.string/asan_deque_integration.pass.cpp
new file mode 100644
index 000000000000000..b914609f35ddf36
--- /dev/null
+++ b/libcxx/test/libcxx/containers/strings/basic.string/asan_deque_integration.pass.cpp
@@ -0,0 +1,182 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// REQUIRES: asan
+// UNSUPPORTED: c++03
+
+#include <cassert>
+#include <string>
+#include <array>
+#include <deque>
+#include "test_macros.h"
+#include "asan_testing.h"
+#include "min_allocator.h"
+
+// This tests exists to check if strings work well with deque, as those
+// may be partialy annotated, we cannot simply call
+// is_double_ended_contiguous_container_asan_correct, as it assumes that
+// object memory inside is not annotated, so we check everything in a more careful way.
+
+template <typename D>
+void verify_inside(D const& d) {
+  for (size_t i = 0; i < d.size(); ++i) {
+    assert(is_string_asan_correct(d[i]));
+  }
+}
+
+template <typename S, size_t N>
+S get_s(char c) {
+  S s;
+  for (size_t i = 0; i < N; ++i)
+    s.push_back(c);
+
+  return s;
+}
+
+template <class C, class S>
+void test_string() {
+  size_t const N = sizeof(S) < 256 ? (4096 / sizeof(S)) : 16;
+
+  {
+    C d1a(1), d1b(N), d1c(N + 1), d1d(32 * N);
+    verify_inside(d1a);
+    verify_inside(d1b);
+    verify_inside(d1c);
+    verify_inside(d1d);
+  }
+  {
+    C d2;
+    for (size_t i = 0; i < 16 * N; ++i) {
+      d2.push_back(get_s<S, 1>(i % 10 + 'a'));
+      verify_inside(d2);
+      d2.push_back(get_s<S, 222>(i % 10 + 'b'));
+      verify_inside(d2);
+
+      d2.pop_front();
+      verify_inside(d2);
+    }
+  }
+  {
+    C d3;
+    for (size_t i = 0; i < 16 * N; ++i) {
+      d3.push_front(get_s<S, 1>(i % 10 + 'a'));
+      verify_inside(d3);
+      d3.push_front(get_s<S, 222>(i % 10 + 'b'));
+      verify_inside(d3);
+
+      d3.pop_back();
+      verify_inside(d3);
+    }
+  }
+  {
+    C d4;
+    for (size_t i = 0; i < 16 * N; ++i) {
+      // When there is no SSO, all elements inside should not be poisoned,
+      // so we can verify deque poisoning.
+      d4.push_front(get_s<S, 333>(i % 10 + 'a'));
+      verify_inside(d4);
+      assert(is_double_ended_contiguous_container_asan_correct(d4));
+      d4.push_back(get_s<S, 222>(i % 10 + 'b'));
+      verify_inside(d4);
+      assert(is_double_ended_contiguous_container_asan_correct(d4));
+    }
+  }
+  {
+    C d5;
+    for (size_t i = 0; i < 5 * N; ++i) {
+      // In d4 we never had poisoned memory inside deque.
+      // Here we start with SSO, so part of the inside of the container,
+      // will be poisoned.
+      d5.push_front(S());
+      verify_inside(d5);
+    }
+    for (size_t i = 0; i < d5.size(); ++i) {
+      // We change the size to have long string.
+      // Memory owne by deque should not be poisoned by string.
+      d5[i].resize(1000);
+      verify_inside(d5);
+    }
+
+    assert(is_double_ended_contiguous_container_asan_correct(d5));
+
+    d5.erase(d5.begin() + 2);
+    verify_inside(d5);
+
+    d5.erase(d5.end() - 2);
+    verify_inside(d5);
+
+    assert(is_double_ended_contiguous_container_asan_correct(d5));
+  }
+  {
+    C d6a;
+    assert(is_double_ended_contiguous_container_asan_correct(d6a));
+
+    C d6b(N + 2, get_s<S, 1000>('a'));
+    d6b.push_front(get_s<S, 1001>('b'));
+    while (!d6b.empty()) {
+      d6b.pop_back();
+      assert(is_double_ended_contiguous_container_asan_correct(d6b));
+    }
+
+    C d6c(N + 2, get_s<S, 1002>('c'));
+    while (!d6c.empty()) {
+      d6c.pop_back();
+      assert(is_double_ended_contiguous_container_asan_correct(d6c));
+    }
+  }
+  {
+    C d7(9 * N + 2);
+
+    d7.insert(d7.begin() + 1, S());
+    verify_inside(d7);
+
+    d7.insert(d7.end() - 3, S());
+    verify_inside(d7);
+
+    d7.insert(d7.begin() + 2 * N, get_s<S, 1>('a'));
+    verify_inside(d7);
+
+    d7.insert(d7.end() - 2 * N, get_s<S, 1>('b'));
+    verify_inside(d7);
+
+    d7.insert(d7.begin() + 2 * N, 3 * N, get_s<S, 1>('c'));
+    verify_inside(d7);
+
+    // It may not be short for big element types, but it will be checked correctly:
+    d7.insert(d7.end() - 2 * N, 3 * N, get_s<S, 2>('d'));
+    verify_inside(d7);
+
+    d7.erase(d7.begin() + 2);
+    verify_inside(d7);
+
+    d7.erase(d7.end() - 2);
+    verify_inside(d7);
+  }
+}
+
+template <class S>
+void test_container() {
+  test_string<std::deque<S, std::allocator<S>>, S>();
+  test_string<std::deque<S, min_allocator<S>>, S>();
+  test_string<std::deque<S, safe_allocator<S>>, S>();
+}
+
+int main(int, char**) {
+  // Those tests support only types based on std::basic_string.
+  test_container<std::string>();
+  test_container<std::wstring>();
+#if TEST_STD_VER >= 11
+  test_container<std::u16string>();
+  test_container<std::u32string>();
+#endif
+#if TEST_STD_VER >= 20
+  test_container<std::u8string>();
+#endif
+
+  return 0;
+}
diff --git a/libcxx/test/libcxx/containers/strings/basic.string/asan_short.pass.cpp b/libcxx/test/libcxx/containers/strings/basic.string/asan_short.pass.cpp
new file mode 100644
index 000000000000000..53c70bed189b5c1
--- /dev/null
+++ b/libcxx/test/libcxx/containers/strings/basic.string/asan_short.pass.cpp
@@ -0,0 +1,56 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// REQUIRES: asan
+// UNSUPPORTED: c++03
+
+// <string>
+
+// Basic test if ASan annotations work for short strings.
+
+#include <string>
+#include <cassert>
+#include <cstdlib>
+
+#include "asan_testing.h"
+#include "min_allocator.h"
+#include "test_iterators.h"
+#include "test_macros.h"
+
+extern "C" void __sanitizer_set_death_callback(void (*callback)(void));
+
+void do_exit() { exit(0); }
+
+int main(int, char**) {
+  {
+    typedef cpp17_input_iterator<char*> MyInputIter;
+    // Should not trigger ASan.
+    std::basic_string<char, std::char_traits<char>, safe_allocator<char>> v;
+    char i[] = {'a', 'b', 'c', 'd'};
+
+    v.insert(v.begin(), MyInputIter(i), MyInputIter(i + 4));
+    assert(v[0] == 'a');
+    assert(is_string_asan_correct(v));
+  }
+
+  __sanitizer_set_death_callback(do_exit);
+  {
+    using T     = char;
+    using C     = std::basic_string<T, std::char_traits<T>, safe_allocator<T>>;
+    const T t[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g'};
+    C c(std::begin(t), std::end(t));
+    assert(is_string_asan_correct(c));
+    assert(__sanitizer_verify_contiguous_container(c.data(), c.data() + c.size() + 1, c.data() + c.capacity() + 1) !=
+           0);
+    volatile T foo = c[c.size() + 1]; // should trigger ASAN. Use volatile to prevent being optimized away.
+    assert(false);                    // if we got here, ASAN didn't trigger
+    ((void)foo);
+  }
+
+  return 0;
+}
diff --git a/libcxx/test/libcxx/containers/strings/basic.string/asan_vector_integration.pass.cpp b/libcxx/test/libcxx/containers/strings/basic.string/asan_vector_integration.pass.cpp
new file mode 100644
index 000000000000000..5b1900fb00d5bb3
--- /dev/null
+++ b/libcxx/test/libcxx/containers/strings/basic.string/asan_vector_integration.pass.cpp
@@ -0,0 +1,182 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// REQUIRES: asan
+// UNSUPPORTED: c++03
+
+#include <cassert>
+#include <string>
+#include <vector>
+#include <array>
+#include "test_macros.h"
+#include "asan_testing.h"
+#include "min_allocator.h"
+
+// This tests exists to check if strings work well with vector, as those
+// may be partialy annotated, we cannot simply call
+// is_contiguous_container_asan_correct, as it assumes that
+// object memory inside is not annotated, so we check everything in a more careful way.
+
+template <typename D>
+void verify_inside(D const& d) {
+  for (size_t i = 0; i < d.size(); ++i) {
+    assert(is_string_asan_correct(d[i]));
+  }
+}
+
+template <typename S, size_t N>
+S get_s(char c) {
+  S s;
+  for (size_t i = 0; i < N; ++i)
+    s.push_back(c);
+
+  return s;
+}
+
+template <class C, class S>
+void test_string() {
+  size_t const N = sizeof(S) < 256 ? (4096 / sizeof(S)) : 16;
+
+  {
+    C d1a(1), d1b(N), d1c(N + 1), d1d(32 * N);
+    verify_inside(d1a);
+    verify_inside(d1b);
+    verify_inside(d1c);
+    verify_inside(d1d);
+  }
+  {
+    C d2;
+    for (size_t i = 0; i < 16 * N; ++i) {
+      d2.push_back(get_s<S, 1>(i % 10 + 'a'));
+      verify_inside(d2);
+      d2.push_back(get_s<S, 222>(i % 10 + 'b'));
+      verify_inside(d2);
+
+      d2.erase(d2.cbegin());
+      verify_inside(d2);
+    }
+  }
+  {
+    C d3;
+    for (size_t i = 0; i < 16 * N; ++i) {
+      d3.push_back(get_s<S, 1>(i % 10 + 'a'));
+      verify_inside(d3);
+      d3.push_back(get_s<S, 222>(i % 10 + 'b'));
+      verify_inside(d3);
+
+      d3.pop_back();
+      verify_inside(d3);
+    }
+  }
+  {
+    C d4;
+    for (size_t i = 0; i < 16 * N; ++i) {
+      // When there is no SSO, all elements inside should not be poisoned,
+      // so we can verify vector poisoning.
+      d4.push_back(get_s<S, 333>(i % 10 + 'a'));
+      verify_inside(d4);
+      assert(is_contiguous_container_asan_correct(d4));
+      d4.push_back(get_s<S, 222>(i % 10 + 'b'));
+      verify_inside(d4);
+      assert(is_contiguous_container_asan_correct(d4));
+    }
+  }
+  {
+    C d5;
+    for (size_t i = 0; i < 5 * N; ++i) {
+      // In d4 we never had poisoned memory inside vector.
+      // Here we start with SSO, so part of the inside of the container,
+      // will be poisoned.
+      d5.push_back(S());
+      verify_inside(d5);
+    }
+    for (size_t i = 0; i < d5.size(); ++i) {
+      // We change the size to have long string.
+      // Memory owne by vector should not be poisoned by string.
+      d5[i].resize(1000);
+      verify_inside(d5);
+    }
+
+    assert(is_contiguous_container_asan_correct(d5));
+
+    d5.erase(d5.begin() + 2);
+    verify_inside(d5);
+
+    d5.erase(d5.end() - 2);
+    verify_inside(d5);
+
+    assert(is_contiguous_container_asan_correct(d5));
+  }
+  {
+    C d6a;
+    assert(is_contiguous_container_asan_correct(d6a));
+
+    C d6b(N + 2, get_s<S, 1000>('a'));
+    d6b.push_back(get_s<S, 1001>('b'));
+    while (!d6b.empty()) {
+      d6b.pop_back();
+      assert(is_contiguous_container_asan_correct(d6b));
+    }
+
+    C d6c(N + 2, get_s<S, 1002>('c'));
+    while (!d6c.empty()) {
+      d6c.pop_back();
+      assert(is_contiguous_container_asan_correct(d6c));
+    }
+  }
+  {
+    C d7(9 * N + 2);
+
+    d7.insert(d7.begin() + 1, S());
+    verify_inside(d7);
+
+    d7.insert(d7.end() - 3, S());
+    verify_inside(d7);
+
+    d7.insert(d7.begin() + 2 * N, get_s<S, 1>('a'));
+    verify_inside(d7);
+
+    d7.insert(d7.end() - 2 * N, get_s<S, 1>('b'));
+    verify_inside(d7);
+
+    d7.insert(d7.begin() + 2 * N, 3 * N, get_s<S, 1>('c'));
+    verify_inside(d7);
+
+    // It may not be short for big element types, but it will be checked correctly:
+    d7.insert(d7.end() - 2 * N, 3 * N, get_s<S, 2>('d'));
+    verify_inside(d7);
+
+    d7.erase(d7.begin() + 2);
+    verify_inside(d7);
+
+    d7.erase(d7.end() - 2);
+    verify_inside(d7);
+  }
+}
+
+template <class S>
+void test_container() {
+  test_string<std::vector<S, std::allocator<S>>, S>();
+  test_string<std::vector<S, min_allocator<S>>, S>();
+  test_string<std::vector<S, safe_allocator<S>>, S>();
+}
+
+int main(int, char**) {
+  // Those tests support only types based on std::basic_string.
+  test_container<std::string>();
+  test_container<std::wstring>();
+#if TEST_STD_VER >= 11
+  test_container<std::u16string>();
+  test_container<std::u32string>();
+#endif
+#if TEST_STD_VER >= 20
+  test_container<std::u8string>();
+#endif
+
+  return 0;
+}
diff --git a/libcxx/test/support/asan_testing.h b/libcxx/test/support/asan_testing.h
index 6bfc8280a4ead30..3785c1f9c20dea1 100644
--- a/libcxx/test/support/asan_testing.h
+++ b/libcxx/test/support/asan_testing.h
@@ -56,35 +56,16 @@ TEST_CONSTEXPR bool is_double_ended_contiguous_container_asan_correct(const std:
 #endif
 
 #if TEST_HAS_FEATURE(address_sanitizer)
-template <typename S>
-bool is_string_short(S const& s) {
-  // We do not have access to __is_long(), but we can check if strings
-  // buffer is inside strings memory. If strings memory contains its content,
-  // SSO is in use. To check it, we can just confirm that the beginning is in
-  // the string object memory block.
-  // &s    - beginning of objects memory
-  // &s[0] - beginning of the buffer
-  // (&s+1) - end of objects memory
-  return (void*)std::addressof(s) <= (void*)std::addressof(s[0]) &&
-         (void*)std::addressof(s[0]) < (void*)(std::addressof(s) + 1);
-}
-
 template <typename ChrT, typename TraitsT, typename Alloc>
 TEST_CONSTEXPR bool is_string_asan_correct(const std::basic_string<ChrT, TraitsT, Alloc>& c) {
   if (TEST_IS_CONSTANT_EVALUATED)
     return true;
 
-  if (!is_string_short(c) || _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED) {
-    if (std::__asan_annotate_container_with_allocator<Alloc>::value)
-      return __sanitizer_verify_contiguous_container(c.data(), c.data() + c.size() + 1, c.data() + c.capacity() + 1) !=
-             0;
-    else
-      return __sanitizer_verify_contiguous_container(
-                 c.data(), c.data() + c.capacity() + 1, c.data() + c.capacity() + 1) != 0;
-  } else {
-    return __sanitizer_verify_contiguous_container(std::addressof(c), std::addressof(c) + 1, std::addressof(c) + 1) !=
-           0;
-  }
+  if (std::__asan_annotate_container_with_allocator<Alloc>::value)
+    return __sanitizer_verify_contiguous_container(c.data(), c.data() + c.size() + 1, c.data() + c.capacity() + 1) != 0;
+  else
+    return __sanitizer_verify_contiguous_container(
+               c.data(), c.data() + c.capacity() + 1, c.data() + c.capacity() + 1) != 0;
 }
 #else
 #  include <string>
diff --git a/llvm/include/llvm/ADT/FunctionExtras.h b/llvm/include/llvm/ADT/FunctionExtras.h
index 4cf1de488c7bde2..9d9551dd92f4d73 100644
--- a/llvm/include/llvm/ADT/FunctionExtras.h
+++ b/llvm/include/llvm/ADT/FunctionExtras.h
@@ -319,6 +319,9 @@ template <typename ReturnT, typename... ParamTs> class UniqueFunctionBase {
 
 #ifndef NDEBUG
     // In debug builds, we also scribble across the rest of the storage.
+#ifndef _LIBCPP_HAS_NO_ASAN
+    __asan_unpoison_memory_region(RHS.getInlineStorage(), InlineStorageSize);
+#endif
     memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
 #endif
   }
diff --git a/llvm/include/llvm/Support/JSON.h b/llvm/include/llvm/Support/JSON.h
index a81881c52d6c960..2cce5a084a34f6e 100644
--- a/llvm/include/llvm/Support/JSON.h
+++ b/llvm/include/llvm/Support/JSON.h
@@ -482,6 +482,9 @@ class Value {
   friend class Object;
 
   template <typename T, typename... U> void create(U &&... V) {
+#ifndef _LIBCPP_HAS_NO_ASAN
+    __asan_unpoison_memory_region(&Union, sizeof(T));
+#endif
     new (reinterpret_cast<T *>(&Union)) T(std::forward<U>(V)...);
   }
   template <typename T> T &as() const {

@AdvenamTacet
Member Author

Related branch (to make testing buildbots before upstreaming possible): https://github.com/llvm/llvm-project/commits/users/AdvenamTacet/short-string-annotations-v2-testing/

I will remove it after this PR is merged.

@aeubanks
Contributor

the llvm changes should probably be landed separately. also we should just not do the scribbling when building with asan?

@AdvenamTacet
Member Author

> the llvm changes should probably be landed separately.

Those changes are strongly related to that PR and don't affect behavior there (only affect ASan annotations), therefore I included them in one PR, but I can open a separate one soon (and move there this one commit).

> we should just not do the scribbling when building with asan?

I was also thinking about it, I didn't want to change behavior of the code, therefore I just added unpoisoning, but I'm ok with both versions.
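
The alternative raised above, skipping the scribble entirely in ASan builds, could look roughly like the sketch below. `debug_scribble` and `SKETCH_BUILT_WITH_ASAN` are hypothetical names, and the detection macro is an assumption about how one might test for ASan; this is not the code that landed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Detect an ASan build under GCC (__SANITIZE_ADDRESS__) or Clang (__has_feature).
#if defined(__SANITIZE_ADDRESS__)
#  define SKETCH_BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_BUILT_WITH_ASAN 1
#  endif
#endif

// Scribbles a debug pattern over [p, p + n) only in debug builds without ASan.
// Returns true if the bytes were overwritten, false if the scribble was skipped.
inline bool debug_scribble(unsigned char* p, std::size_t n) {
#if !defined(NDEBUG) && !defined(SKETCH_BUILT_WITH_ASAN)
  std::memset(p, 0xAD, n);
  return true;
#else
  (void)p;
  (void)n;
  return false;
#endif
}
```

Compared with unpoisoning before the write, this variant changes debug-build behavior under ASan (no `0xAD` pattern) but never touches memory that a container still has annotated.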

@vitalybuka
Collaborator

> > the llvm changes should probably be landed separately.

> Those changes are strongly related to that PR and don't affect behavior there (only affect ASan annotations), therefore I included them in one PR, but I can open a separate one soon (and move there this one commit).

To me they are dependency of the main PR, but don't need to be here. so +1 for a separate PR

@vitalybuka
Collaborator

> > > the llvm changes should probably be landed separately.

> > Those changes are strongly related to that PR and don't affect behavior there (only affect ASan annotations), therefore I included them in one PR, but I can open a separate one soon (and move there this one commit).

> To me they are dependency of the main PR, but don't need to be here. so +1 for a separate PR

Actually two PRs, JSON and ADT are also independent.

AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 22, 2024
AddressSanitizer (ASAN) disables scribbling to prevent overwriting poisoned objects.
Needed by llvm#79049
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 22, 2024
This commit unpoisons memory before its reuse (with reinterpret_cast).
Required by llvm#79049
This commit lowers the values used in the `std::vector` integration tests; the tests remain as good as the previous ones, but run faster.
One test caused a problem with buildbots: https://lab.llvm.org/buildbot/#/builders/168/builds/18126/steps/11/logs/stdio
@AdvenamTacet
Member Author

I just moved this one commit into two new PRs, thank you both for feedback! PRs:

Didn't have time to test it locally before pushing.

@vitalybuka my users/ branch has all those changes.

Member

@ldionne ldionne left a comment

This LGTM but please land the LLVM patches first to avoid breaking the bots. Let's try to land everything ASAP, ideally before we branch LLVM 18 (which happens on January 23rd).

AdvenamTacet pushed a commit that referenced this pull request Jan 23, 2024
With this commit, scribbling under AddressSanitizer (ASan) is disabled to prevent overwriting poisoned objects (e.g., annotated short strings).
Needed by #79049
AdvenamTacet pushed a commit that referenced this pull request Jan 23, 2024
This commit unpoisons memory before its reuse (with reinterpret_cast).
Required by #79049

Note that this is a temporary solution to keep the buildbots from failing.
See the FIXME for details.
@AdvenamTacet AdvenamTacet merged commit cb528ec into llvm:main Jan 23, 2024
43 checks passed
@AdvenamTacet AdvenamTacet deleted the short-string-annotations-v2 branch January 23, 2024 18:19
thurstond added a commit that referenced this pull request Jan 23, 2024
…9049)"

This reverts commit cb528ec.

Reason: buildbot breakage (https://lab.llvm.org/buildbot/#/builders/5/builds/40364):
SUMMARY: AddressSanitizer: container-overflow /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan_ubsan/include/c++/v1/string:1870:29 in __get_long_pointer
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 25, 2024
This commit makes two variables static.
That makes two buildbot tests pass with short string annotations.

Short string annotations PR (reverted):
- llvm#79049

Tests fixed with this PR:
```
  LLVM :: Transforms/Inline/cgscc-inline-replay.ll
  LLVM :: Transforms/SampleProfile/inline-replay.ll
```
Buildbot output: https://lab.llvm.org/buildbot/#/builders/5/builds/40364/steps/9/logs/stdio

This PR does not resolve a problem with `Clang :: SemaCXX/builtins.cpp`.

I suspect that there may be a use-after-end-of-lifetime bug that this change fixes.
AdvenamTacet pushed a commit that referenced this pull request Jan 25, 2024
This commit makes two variables static.
That makes two buildbot tests pass with short string annotations.

Short string annotations PR (reverted):
- #79049

Tests fixed with this PR:
```
  LLVM :: Transforms/Inline/cgscc-inline-replay.ll
  LLVM :: Transforms/SampleProfile/inline-replay.ll
```
Buildbot output: https://lab.llvm.org/buildbot/#/builders/5/builds/40364/steps/9/logs/stdio

This PR does not resolve a problem with `Clang :: SemaCXX/builtins.cpp`.

I suspect that there may be a use-after-end-of-lifetime bug and that this change fixes it.
AdvenamTacet pushed a commit that referenced this pull request Jan 25, 2024
Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- #79065
- #79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
#ifndef NDEBUG
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif
```
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 25, 2024
This commit makes two variables static, extending their lifetime.
This patch addresses the buildbot failures that occur when AddressSanitizer (ASan) short string annotations are enabled.
It's essentially the same as:
- llvm#79489
however, it's less likely to solve the real problem, as those strings change (they aren't `const`).
I suspect that there may be a use-after-end-of-lifetime bug (in StringRef), but this requires confirmation.
In that case, one alternative solution, which unfortunately results in memory leaks, is to always allocate new strings instead of overwriting existing (static) ones. This approach would prevent potential data corruption, but I don't suggest it in this PR.

This patch makes `Clang :: SemaCXX/builtins.cpp` test pass with short string annotations (ASan).
Together with llvm#79489, it fixes the known buildbot problems when running with short string annotations.
However, the potential issue still requires more investigation, so a FIXME comment is added in that patch.

Short string annotations PR (reverted):
- llvm#79049

Buildbots (failure) output:
- https://lab.llvm.org/buildbot/#/builders/5/builds/40364/steps/9/logs/stdio

While the buildbots should not fail with the proposed changes, we should still investigate why they were failing with ASan short string annotations turned on.
StringRef objects (made from those strings) could change their contents unexpectedly, or freed memory could even be used.
That interpretation is only my educated guess; I still don't understand exactly why those buildbots are failing.
AdvenamTacet pushed a commit that referenced this pull request Jan 26, 2024
This commit makes two variables static.
That makes two buildbot tests pass with short string annotations.
I suspect that there may be use after end of life bug and it's fixed by
this change, but it requires confirmation.

Short string annotations PR (reverted):
- #79049

Tests fixed with this PR:
```
  LLVM :: Transforms/Inline/cgscc-inline-replay.ll 
  LLVM :: Transforms/SampleProfile/inline-replay.ll
```
Buildbot output:
https://lab.llvm.org/buildbot/#/builders/5/builds/40364/steps/9/logs/stdio

This PR does not resolve a problem with `Clang :: SemaCXX/builtins.cpp`,
related PR is:
- #79522
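
To make the suspected problem above concrete, here is a small sketch of the general hazard. It uses `std::string_view` as a stand-in for `StringRef`, and `replayName` is a hypothetical function unrelated to the actual LLVM code. Whether this is what happened on the buildbots is unconfirmed, as the commit message says.

```cpp
#include <string>
#include <string_view>

// Buggy shape (kept as a comment on purpose, since running it is undefined
// behavior): the view outlives the string it points into. With SSO the
// characters live inside the string object itself, so the view dangles into
// dead stack memory.
//
//   std::string_view replayName() {
//     std::string Name = "replay";
//     return Name; // dangling after return
//   }

// Fixed shape: a static string lives until program exit, so views into it
// stay valid after the function returns.
inline std::string_view replayName() {
  static const std::string Name = "replay";
  return Name;
}
```

Making the variable static only helps if nothing overwrites the string while views into it are still in use.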
AdvenamTacet pushed a commit that referenced this pull request Jan 26, 2024
This commit makes two variables static, extending their lifetime.
This patch addresses the buildbot failures that occur when AddressSanitizer (ASan) short string annotations are enabled.
It's essentially the same as:
- #79489
however, it's less likely to solve the real problem, as those strings change (they aren't `const`).
I suspect that there may be a use-after-end-of-lifetime bug (in StringRef), but this requires confirmation.
In that case, one alternative solution, which unfortunately results in memory leaks, is to always allocate new strings instead of overwriting existing (static) ones. This approach would prevent potential data corruption, but I don't suggest it in this PR.

This patch makes `Clang :: SemaCXX/builtins.cpp` test pass with short string annotations (ASan).
Together with #79489, it fixes the known buildbot problems when running with short string annotations.
However, the potential issue still requires more investigation, so a FIXME comment is added in that patch.

Short string annotations PR (reverted):
- #79049

Buildbots (failure) output:
- https://lab.llvm.org/buildbot/#/builders/5/builds/40364/steps/9/logs/stdio

While the buildbots should not fail with the proposed changes, we should still investigate why they were failing with ASan short string annotations turned on.
StringRef objects (made from those strings) could change their contents unexpectedly, or freed memory could even be used.
That interpretation is only my educated guess; I still don't understand exactly why those buildbots are failing.
AdvenamTacet pushed a commit that referenced this pull request Jan 26, 2024
Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- #79065
- #79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
#ifndef NDEBUG
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
#endif
```
@AdvenamTacet
Member Author

Just pointing out that this PR was reverted in commit a16f81f

Commit message:

This reverts commit cb528ec.

Reason: buildbot breakage (https://lab.llvm.org/buildbot/#/builders/5/builds/40364):
SUMMARY: AddressSanitizer: container-overflow /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan_ubsan/include/c++/v1/string:1870:29 in __get_long_pointer

The next PR is #79536; it is on hold until we fully understand the problem.

AdvenamTacet pushed a commit that referenced this pull request May 7, 2024
This pull request is the third iteration aiming to integrate short
string annotations. This commit includes:
- Enabling basic_string annotations for short strings.
- Setting a value of `__trivially_relocatable` in `std::basic_string` to
`false_type` when compiling with ASan (nothing changes when compiling
without ASan). Short string annotations make `std::basic_string` not
trivially relocatable, because memory has to be unpoisoned.
- Adding a `_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS` modifier to two
functions.
- Creating a macro `_LIBCPP_ASAN_VOLATILE_WRAPPER` to prevent
problematic stack optimizations (the macro modifies code behavior only
when compiling with ASan).

Previously we had issues with a compiler optimization, which we now
understand thanks to @vitalybuka. This commit also accounts for smaller
changes made to short strings since the previous upstream attempts.

The problematic optimization loaded both values in code similar to:
```
__is_long() ? __get_long_size() : __get_short_size();
```
We aim to resolve it with the volatile wrapper.
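
As a rough illustration of the pattern described above: the sketch below uses a hypothetical two-layout string (`ToyString`) and a hypothetical helper (`volatile_wrapper`). It is an assumption about what such a wrapper might look like, not a copy of `_LIBCPP_ASAN_VOLATILE_WRAPPER` or of libc++'s string layout.

```cpp
#include <cstddef>

// Hypothetical helper: passes a pointer through a volatile local so the
// optimizer cannot hoist the load through it above the branch guarding it.
template <class Ptr>
Ptr volatile_wrapper(Ptr P) {
  Ptr volatile Copy = P;
  return Copy;
}

// Hypothetical string representation with overlapping short/long layouts.
struct ToyString {
  struct Long {
    std::size_t Size;
    char *Data;
  };
  struct Short {
    unsigned char Size;
    char Data[15];
  };

  bool IsLong;
  union {
    Long L;
    Short S;
  };

  std::size_t size() const {
    // Without the wrapper, a compiler may load both L.Size and S.Size and
    // then select one of them. If the object holds a short string whose
    // unused bytes are poisoned, the speculative load of L.Size can touch
    // memory that ASan considers inaccessible.
    return IsLong ? volatile_wrapper(&L)->Size : volatile_wrapper(&S)->Size;
  }
};
```

The wrapper trades a small amount of optimization freedom for the guarantee that only the selected branch reads memory, which is why the commit applies it only when compiling with ASan.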

This commit is built on top of two previous attempts, whose descriptions
are below.

Additionally, in the meantime, annotations were updated (but it
shouldn't have any impact on anything):
- #79292

---

Previous PR: #79049
Reverted:
a16f81f

Previous description:

Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- #79065
- #79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
strings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
#72677

Required for CI to pass without failures:
- #75845
- #75858

Annotating `std::basic_string` with default allocator is implemented in
#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equal` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` range was longer
than the `iter2` range, an out-of-bounds read could occur on the `iter2`
object. With these annotations enabled, container sanitization would
identify and flag this potential vulnerability.
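
For illustration, the shape of the bug described above and one way to avoid it can be sketched like this. `sameLetter` and `equalIgnoringCase` are hypothetical names; the original code is not public in this thread, so this only mirrors the description.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical custom comparison: case-insensitive character equality.
inline bool sameLetter(char A, char B) {
  return std::tolower(static_cast<unsigned char>(A)) ==
         std::tolower(static_cast<unsigned char>(B));
}

// Unsafe shape (kept as a comment): the three-iterator overload assumes the
// second range is at least as long as the first, so it reads past B's
// elements whenever A is longer than B.
//
//   std::equal(A.begin(), A.end(), B.begin(), sameLetter);

// Safer shape: the four-iterator overload (C++14) also takes the end of the
// second range and returns false when the lengths differ.
inline bool equalIgnoringCase(const std::string &A, const std::string &B) {
  return std::equal(A.begin(), A.end(), B.begin(), B.end(), sameLetter);
}
```

With `A = "hello world"` and `B = "hello"`, the unsafe call reads bytes between `B`'s size and capacity (or beyond), which is exactly the region the string annotations poison.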

If you have any questions, please email:

- [email protected]
- [email protected]
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.