Skip to content

Commit

Permalink
[ASan][libc++] Turn on ASan annotations for short strings (#79536)
Browse files Browse the repository at this point in the history
This pull request is the third iteration aiming to integrate short
string annotations. This commit includes:
- Enabling basic_string annotations for short strings.
- Setting a value of `__trivially_relocatable` in `std::basic_string` to
`false_type` when compiling with ASan (nothing changes when compiling
without ASan). Short string annotations make `std::basic_string` to not
be trivially relocatable, because memory has to be unpoisoned.
- Adding a `_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS` modifier to two
functions.
- Creating a macro `_LIBCPP_ASAN_VOLATILE_WRAPPER` to prevent
problematic stack optimizations (the macro modifies code behavior only
when compiling with ASan).

Previously we had issues with compiler optimization, which we understand
thanks to @vitalybuka. This commit also addresses smaller changes in
short string, since previous upstream attempts.

Problematic optimization was loading two values in code similar to:
```
__is_long() ? __get_long_size() : __get_short_size();
```
We aim to resolve it with the volatile wrapper.

This commit is built on top of two previous attempts which descriptions
are below.

Additionally, in the meantime, annotations were updated (but it
shouldn't have any impact on anything):
- #79292

---

Previous PR: #79049
Reverted:
a16f81f

Previous description:

Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- #79065
- #79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
#72677

Requires to pass CI without fails:
- #75845
- #75858

Annotating `std::basic_string` with default allocator is implemented in
#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

- [email protected]
- [email protected]
  • Loading branch information
Tacet authored May 7, 2024
1 parent 7198b8a commit 1a96179
Show file tree
Hide file tree
Showing 6 changed files with 471 additions and 41 deletions.
60 changes: 44 additions & 16 deletions libcxx/include/string
Original file line number Diff line number Diff line change
Expand Up @@ -662,7 +662,6 @@ _LIBCPP_PUSH_MACROS
#else
# define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS
#endif
#define _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED false

_LIBCPP_BEGIN_NAMESPACE_STD

Expand Down Expand Up @@ -736,10 +735,44 @@ public:
//
// This string implementation doesn't contain any references into itself. It only contains a bit that says whether
// it is in small or large string mode, so the entire structure is trivially relocatable if its members are.
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
// When compiling with AddressSanitizer (ASan), basic_string cannot be trivially
// relocatable. Because the object's memory might be poisoned when its content
// is kept inside objects memory (short string optimization), instead of in allocated
// external memory. In such cases, the destructor is responsible for unpoisoning
// the memory to avoid triggering false positives.
// Therefore it's crucial to ensure the destructor is called
using __trivially_relocatable = false_type;
#else
using __trivially_relocatable = __conditional_t<
__libcpp_is_trivially_relocatable<allocator_type>::value && __libcpp_is_trivially_relocatable<pointer>::value,
basic_string,
void>;
#endif
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
pointer __asan_volatile_wrapper(pointer const &__ptr) const {
if (__libcpp_is_constant_evaluated())
return __ptr;

pointer volatile __copy_ptr = __ptr;

return const_cast<pointer &>(__copy_ptr);
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
const_pointer __asan_volatile_wrapper(const_pointer const &__ptr) const {
if (__libcpp_is_constant_evaluated())
return __ptr;

const_pointer volatile __copy_ptr = __ptr;

return const_cast<const_pointer &>(__copy_ptr);
}
#define _LIBCPP_ASAN_VOLATILE_WRAPPER(PTR) __asan_volatile_wrapper(PTR)
#else
#define _LIBCPP_ASAN_VOLATILE_WRAPPER(PTR) PTR
#endif

static_assert((!is_array<value_type>::value), "Character type of basic_string must not be an array");
static_assert((is_standard_layout<value_type>::value), "Character type of basic_string must be standard-layout");
Expand Down Expand Up @@ -1886,16 +1919,16 @@ private:
__r_.first().__l.__data_ = __p;
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 pointer __get_long_pointer() _NOEXCEPT {
return __r_.first().__l.__data_;
return _LIBCPP_ASAN_VOLATILE_WRAPPER(__r_.first().__l.__data_);
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 const_pointer __get_long_pointer() const _NOEXCEPT {
return __r_.first().__l.__data_;
return _LIBCPP_ASAN_VOLATILE_WRAPPER(__r_.first().__l.__data_);
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 pointer __get_short_pointer() _NOEXCEPT {
return pointer_traits<pointer>::pointer_to(__r_.first().__s.__data_[0]);
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS pointer __get_short_pointer() _NOEXCEPT {
return _LIBCPP_ASAN_VOLATILE_WRAPPER(pointer_traits<pointer>::pointer_to(__r_.first().__s.__data_[0]));
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 const_pointer __get_short_pointer() const _NOEXCEPT {
return pointer_traits<const_pointer>::pointer_to(__r_.first().__s.__data_[0]);
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS const_pointer __get_short_pointer() const _NOEXCEPT {
return _LIBCPP_ASAN_VOLATILE_WRAPPER(pointer_traits<const_pointer>::pointer_to(__r_.first().__s.__data_[0]));
}
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 pointer __get_pointer() _NOEXCEPT {
return __is_long() ? __get_long_pointer() : __get_short_pointer();
Expand All @@ -1914,38 +1947,33 @@ private:
#endif
}

// ASan: short string is poisoned if and only if this function returns true.
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool __asan_short_string_is_annotated() const _NOEXCEPT {
return _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED && !__libcpp_is_constant_evaluated();
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_new(size_type __current_size) const _NOEXCEPT {
(void)__current_size;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + capacity() + 1, data() + __current_size + 1);
#endif
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_delete() const _NOEXCEPT {
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + size() + 1, data() + capacity() + 1);
#endif
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_increase(size_type __n) const _NOEXCEPT {
(void)__n;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + size() + 1, data() + size() + 1 + __n);
#endif
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_shrink(size_type __old_size) const _NOEXCEPT {
(void)__old_size;
#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
if (!__libcpp_is_constant_evaluated() && (__asan_short_string_is_annotated() || __is_long()))
if (!__libcpp_is_constant_evaluated())
__annotate_contiguous_container(data() + __old_size + 1, data() + size() + 1);
#endif
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

// REQUIRES: asan
// UNSUPPORTED: c++03

#include <cassert>
#include <string>
#include <array>
#include <deque>
#include "test_macros.h"
#include "asan_testing.h"
#include "min_allocator.h"

// This tests exists to check if strings work well with deque, as those
// may be partialy annotated, we cannot simply call
// is_double_ended_contiguous_container_asan_correct, as it assumes that
// object memory inside is not annotated, so we check everything in a more careful way.

template <typename D>
void verify_inside(D const& d) {
for (size_t i = 0; i < d.size(); ++i) {
assert(is_string_asan_correct(d[i]));
}
}

template <typename S, size_t N>
S get_s(char c) {
S s;
for (size_t i = 0; i < N; ++i)
s.push_back(c);

return s;
}

template <class C, class S>
void test_string() {
size_t const N = sizeof(S) < 256 ? (4096 / sizeof(S)) : 16;

{
C d1a(1), d1b(N), d1c(N + 1), d1d(5 * N);
verify_inside(d1a);
verify_inside(d1b);
verify_inside(d1c);
verify_inside(d1d);
}
{
C d2;
for (size_t i = 0; i < 3 * N + 2; ++i) {
d2.push_back(get_s<S, 1>(i % 10 + 'a'));
verify_inside(d2);
d2.push_back(get_s<S, 22>(i % 10 + 'b'));
verify_inside(d2);

d2.pop_front();
verify_inside(d2);
}
}
{
C d3;
for (size_t i = 0; i < 3 * N + 2; ++i) {
d3.push_front(get_s<S, 1>(i % 10 + 'a'));
verify_inside(d3);
d3.push_front(get_s<S, 28>(i % 10 + 'b'));
verify_inside(d3);

d3.pop_back();
verify_inside(d3);
}
}
{
C d4;
for (size_t i = 0; i < 3 * N + 2; ++i) {
// When there is no SSO, all elements inside should not be poisoned,
// so we can verify deque poisoning.
d4.push_front(get_s<S, 33>(i % 10 + 'a'));
verify_inside(d4);
assert(is_double_ended_contiguous_container_asan_correct(d4));
d4.push_back(get_s<S, 28>(i % 10 + 'b'));
verify_inside(d4);
assert(is_double_ended_contiguous_container_asan_correct(d4));
}
}
{
C d5;
for (size_t i = 0; i < 3 * N + 2; ++i) {
// In d4 we never had poisoned memory inside deque.
// Here we start with SSO, so part of the inside of the container,
// will be poisoned.
d5.push_front(S());
verify_inside(d5);
}
for (size_t i = 0; i < d5.size(); ++i) {
// We change the size to have long string.
// Memory owne by deque should not be poisoned by string.
d5[i].resize(100);
verify_inside(d5);
}

assert(is_double_ended_contiguous_container_asan_correct(d5));

d5.erase(d5.begin() + 2);
verify_inside(d5);

d5.erase(d5.end() - 2);
verify_inside(d5);

assert(is_double_ended_contiguous_container_asan_correct(d5));
}
{
C d6a;
assert(is_double_ended_contiguous_container_asan_correct(d6a));

C d6b(N + 2, get_s<S, 100>('a'));
d6b.push_front(get_s<S, 101>('b'));
while (!d6b.empty()) {
d6b.pop_back();
assert(is_double_ended_contiguous_container_asan_correct(d6b));
}

C d6c(N + 2, get_s<S, 102>('c'));
while (!d6c.empty()) {
d6c.pop_back();
assert(is_double_ended_contiguous_container_asan_correct(d6c));
}
}
{
C d7(9 * N + 2);

d7.insert(d7.begin() + 1, S());
verify_inside(d7);

d7.insert(d7.end() - 3, S());
verify_inside(d7);

d7.insert(d7.begin() + 2 * N, get_s<S, 1>('a'));
verify_inside(d7);

d7.insert(d7.end() - 2 * N, get_s<S, 1>('b'));
verify_inside(d7);

d7.insert(d7.begin() + 2 * N, 3 * N, get_s<S, 1>('c'));
verify_inside(d7);

// It may not be short for big element types, but it will be checked correctly:
d7.insert(d7.end() - 2 * N, 3 * N, get_s<S, 2>('d'));
verify_inside(d7);

d7.erase(d7.begin() + 2);
verify_inside(d7);

d7.erase(d7.end() - 2);
verify_inside(d7);
}
}

template <class S>
void test_container() {
test_string<std::deque<S, std::allocator<S>>, S>();
test_string<std::deque<S, min_allocator<S>>, S>();
test_string<std::deque<S, safe_allocator<S>>, S>();
}

int main(int, char**) {
// Those tests support only types based on std::basic_string.
test_container<std::string>();
test_container<std::wstring>();
#if TEST_STD_VER >= 11
test_container<std::u16string>();
test_container<std::u32string>();
#endif
#if TEST_STD_VER >= 20
test_container<std::u8string>();
#endif

return 0;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

// REQUIRES: asan
// UNSUPPORTED: c++03

// <string>

// Basic test if ASan annotations work for short strings.

#include <string>
#include <cassert>
#include <cstdlib>

#include "asan_testing.h"
#include "min_allocator.h"
#include "test_iterators.h"
#include "test_macros.h"

extern "C" void __sanitizer_set_death_callback(void (*callback)(void));

void do_exit() { exit(0); }

int main(int, char**) {
{
typedef cpp17_input_iterator<char*> MyInputIter;
// Should not trigger ASan.
std::basic_string<char, std::char_traits<char>, safe_allocator<char>> v;
char i[] = {'a', 'b', 'c', 'd'};

v.insert(v.begin(), MyInputIter(i), MyInputIter(i + 4));
assert(v[0] == 'a');
assert(is_string_asan_correct(v));
}

__sanitizer_set_death_callback(do_exit);
{
using T = char;
using C = std::basic_string<T, std::char_traits<T>, safe_allocator<T>>;
const T t[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g'};
C c(std::begin(t), std::end(t));
assert(is_string_asan_correct(c));
assert(__sanitizer_verify_contiguous_container(c.data(), c.data() + c.size() + 1, c.data() + c.capacity() + 1) !=
0);
volatile T foo = c[c.size() + 1]; // should trigger ASAN. Use volatile to prevent being optimized away.
assert(false); // if we got here, ASAN didn't trigger
((void)foo);
}

return 0;
}
Loading

0 comments on commit 1a96179

Please sign in to comment.