Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++][string] Remove potential non-trailing 0-length array #105865

Merged

Conversation

serge-sans-paille
Copy link
Collaborator

It is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation.

@serge-sans-paille serge-sans-paille requested a review from a team as a code owner August 23, 2024 18:02
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Aug 23, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Aug 23, 2024

@llvm/pr-subscribers-libcxx

Author: None (serge-sans-paille)

Changes

It is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation.


Full diff: https://github.com/llvm/llvm-project/pull/105865.diff

1 Files Affected:

  • (modified) libcxx/include/string (+36-15)
diff --git a/libcxx/include/string b/libcxx/include/string
index 6e93a6230cc2c0..5c1954e384e7fe 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -748,6 +748,41 @@ struct __can_be_converted_to_string_view
 struct __uninitialized_size_tag {};
 struct __init_with_sentinel_tag {};
 
+#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
+template<class _T, size_t __min_cap, size_t _Padding=sizeof(_T) - 1>
+struct __short_impl {
+  _T __data_[__min_cap];
+  unsigned char __padding_[_Padding];
+  unsigned char __size_    : 7;
+  unsigned char __is_long_ : 1;
+};
+
+template<class _T, size_t __min_cap>
+struct __short_impl<_T, __min_cap, 0> {
+  value_type __data_[__min_cap];
+  unsigned char __size_    : 7;
+  unsigned char __is_long_ : 1;
+};
+#else
+template<class _T, size_t __min_cap, size_t _Padding=sizeof(_T) - 1>
+struct __short_impl {
+  struct _LIBCPP_PACKED {
+    unsigned char __is_long_ : 1;
+    unsigned char __size_    : 7;
+  };
+  char __padding_[_Padding];
+  _T __data_[__min_cap];
+};
+template<class _T, size_t __min_cap>
+struct __short_impl<_T, __min_cap, 0> {
+  struct _LIBCPP_PACKED {
+    unsigned char __is_long_ : 1;
+    unsigned char __size_    : 7;
+  };
+  _T __data_[__min_cap];
+};
+#endif
+
 template <class _CharT, class _Traits, class _Allocator>
 class basic_string {
 private:
@@ -850,13 +885,6 @@ private:
 
   enum { __min_cap = (sizeof(__long) - 1) / sizeof(value_type) > 2 ? (sizeof(__long) - 1) / sizeof(value_type) : 2 };
 
-  struct __short {
-    value_type __data_[__min_cap];
-    unsigned char __padding_[sizeof(value_type) - 1];
-    unsigned char __size_    : 7;
-    unsigned char __is_long_ : 1;
-  };
-
   // The __endian_factor is required because the field we use to store the size
   // has one fewer bit than it would if it were not a bitfield.
   //
@@ -899,17 +927,10 @@ private:
 
   enum { __min_cap = (sizeof(__long) - 1) / sizeof(value_type) > 2 ? (sizeof(__long) - 1) / sizeof(value_type) : 2 };
 
-  struct __short {
-    struct _LIBCPP_PACKED {
-      unsigned char __is_long_ : 1;
-      unsigned char __size_    : 7;
-    };
-    char __padding_[sizeof(value_type) - 1];
-    value_type __data_[__min_cap];
-  };
 
 #endif // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
 
+  using __short = __short_impl<value_type, __min_cap>;
   static_assert(sizeof(__short) == (sizeof(value_type) * (__min_cap + 1)), "__short has an unexpected size.");
 
   union __rep {

Copy link

github-actions bot commented Aug 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@serge-sans-paille serge-sans-paille force-pushed the feature/invalid-fam-libcxx branch 2 times, most recently from 8e06f53 to 98e891f Compare August 23, 2024 18:23
@serge-sans-paille
Copy link
Collaborator Author

The build error seems unrelated (?)

@philnik777
Copy link
Contributor

It is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation.

It's an extension that compilers currently support, so I don't really see why the new implementation is better. As-is this is a significant additional amount of complexity without much of a downside (other than using a questionable extension, which is purely philosophical).


template <class _CharT, size_t __min_cap>
struct __short_impl<_CharT, __min_cap, 0> {
value_type __data_[__min_cap];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value_type is not declared here. It should be _CharT.

Suggested change
value_type __data_[__min_cap];
_CharT __data_[__min_cap];

@frederick-vs-ja
Copy link
Contributor

It's an extension that compilers currently support, so I don't really see why the new implementation is better.

Perhaps the change will make -Wzero-length-array happy. #74183 seems related.

@serge-sans-paille
Copy link
Collaborator Author

It's an extension that compilers currently support, so I don't really see why the new implementation is better.

Perhaps the change will make -Wzero-length-array happy. #74183 seems related.

Exactly the issue I ran into.

@philnik777
Copy link
Contributor

It's an extension that compilers currently support, so I don't really see why the new implementation is better.

Perhaps the change will make -Wzero-length-array happy. #74183 seems related.

Exactly the issue I ran into.

How exactly does one run into a diagnostic in a system header?

@serge-sans-paille
Copy link
Collaborator Author

It's an extension that compilers currently support, so I don't really see why the new implementation is better.

Perhaps the change will make -Wzero-length-array happy. #74183 seems related.

Exactly the issue I ran into.

How exactly does one run into a diagnostic in a system header?

I'm working on harmonizing the clang behavior wrt. FMA, and while doing this I turned on FMA warnings when compiling the full llvm-project, looks like it impacted libcxx too.

@philnik777
Copy link
Contributor

I'm working on harmonizing the clang behavior wrt. FMA, and while doing this I turned on FMA warnings when compiling the full llvm-project, looks like it impacted libcxx too.

You're talking about fused-multiply-add, right? Or is there another FMA? What has FMA got to do with anything here?

@serge-sans-paille
Copy link
Collaborator Author

I'm working on harmonizing the clang behavior wrt. FMA, and while doing this I turned on FMA warnings when compiling the full llvm-project, looks like it impacted libcxx too.

You're talking about fused-multiply-add, right? Or is there another FMA? What has FMA got to do with anything here?

s/FMA/FAM/ ^^!

@serge-sans-paille
Copy link
Collaborator Author

Gentle ping :-)
There are a few other situations where we use zero length arrays in libc++, and this seem to help for compatibility with msvc as pointed out in #74183

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions and comments, but this generally LGTM. While I wouldn't go absolutely nuts to avoid using a compiler extension, I think it's better to strive towards using few-ish of those. I also think this is a good opportunity to do a small refactoring of std::string's layout definition that has been on my mind for a while.

@philnik777 Please let me know if you feel strongly that this patch shouldn't move forward. I am weakly favourable to it, but I could also be convinced that the complexity isn't worth it.

#else
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1>
struct __short_impl {
struct _LIBCPP_PACKED {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-existing but I don't understand it: this struct has no name, is that really intended?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's intended. It's required to keep the ABI on AIX stable.

@@ -748,6 +748,41 @@ struct __can_be_converted_to_string_view
struct __uninitialized_size_tag {};
struct __init_with_sentinel_tag {};

#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1>
struct __short_impl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to do this:

// always define both of these classes
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1>
struct __short_layout_alternate { ... };

template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1>
struct __short_layout_classic { ... };

// now inside basic_string
#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
using __short = __short_layout_alternate <value_type, __min_cap>;
#else
using __short = __short_layout_classic <value_type, __min_cap>;
#endif

This reduces the scope of the conditional code and IMO makes things easier to understand.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the patch accordingly, with a guard on the individual types too.

Copy link
Contributor

@philnik777 philnik777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions and comments, but this generally LGTM. While I wouldn't go absolutely nuts to avoid using a compiler extension, I think it's better to strive towards using few-ish of those. I also think this is a good opportunity to do a small refactoring of std::string's layout definition that has been on my mind for a while.

@philnik777 Please let me know if you feel strongly that this patch shouldn't move forward. I am weakly favourable to it, but I could also be convinced that the complexity isn't worth it.

The main reason I've been pushing back here is because I think there is a significantly simpler alternative and I don't think it's exactly urgent to remove the use of this extension. I know that the clang folks are thinking about removing it (since it is indeed heinous), but I'm not aware of anybody actively working on that at the moment.

Specifically, we could do

template <size_t _PaddingSize>
struct __padding {
  char __padding_[_PaddingSize];
};

template <>
struct __padding<0> {};

and then just have

struct __short {
  value_type __data_[__min_cap];
  _LIBCPP_NO_UNIQUE_ADDRESS __padding __padding_;
  unsigned char __size_ : 7;
  unsigned char __is_long_ : 1;
};

IMO this is significantly more readable.

But it requires the newer AppleClang.

#else
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1>
struct __short_impl {
struct _LIBCPP_PACKED {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's intended. It's required to keep the ABI on AIX stable.

@serge-sans-paille serge-sans-paille force-pushed the feature/invalid-fam-libcxx branch 2 times, most recently from bd910ae to d5c8412 Compare August 29, 2024 08:49
@serge-sans-paille
Copy link
Collaborator Author

gentle ping: I've updated the patch according to @ldionne suggestion. I agree that @philnik777 solution is much cleaner, but requiring AppleClang seems to be a no-go (?)

@philnik777
Copy link
Contributor

We've actually update the AppleClang now, so my solution should™ work now.

@@ -748,6 +748,41 @@ struct __can_be_converted_to_string_view
struct __uninitialized_size_tag {};
struct __init_with_sentinel_tag {};

#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of the #ifdef here. This type and __short_layout_classic can both be defined regardless of which one we end up picking for the actual string representation (based on the single #ifdef inside the basic_string class).

template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1>
struct __short_layout_alternate {
_CharT __data_[__min_cap];
unsigned char __padding_[_Padding];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try out @philnik777 's suggestion for the padding? I think it should work with our current CI setup.

It is a violation of the standard to use 0 length arrays, especially
when not at the end of a structure (not a FAM GNU extension). Compiler
generally accept it, but it's probably better to have a conforming
implementation.

This patch relies on `no_unique_address` being unconditionally
available, which is the case since 42f5277.
@serge-sans-paille serge-sans-paille force-pushed the feature/invalid-fam-libcxx branch 2 times, most recently from 8d4bc53 to 7cbb05f Compare September 8, 2024 10:09
mysterymath added a commit that referenced this pull request Sep 10, 2024
Michael137 added a commit to Michael137/llvm-project that referenced this pull request Sep 11, 2024
This is the root-cause for the LLDB failures that started
occurring after llvm#105865.

The DWARFASTParserClang has logic to try derive unnamed bitfields
from DWARF offsets. In this case we treat `padding` as a 1-byte
size field that would overlap with `flag`, and decide we need to
introduce an unnamed bitfield into the AST, which is incorrect.
@Michael137
Copy link
Member

Michael137 commented Sep 11, 2024

Here's the reduced issue: #108155

I'm working on a fix. But to get CI back in shape, are you ok with reverting this in the meantime @serge-sans-paille? Apologies for the disruption

EDIT: ah looks like it was already reverted, I'll update this thread once the LLDB issue is fixed. The fallout from this was that we could no longer view std::strings (shockingly that was never actually tested in the data-formatters).

Michael137 added a commit that referenced this pull request Sep 11, 2024
…luator

This would've caught the failures in
#105865 in the libc++
data-formatter CI.
Michael137 added a commit that referenced this pull request Sep 11, 2024
…#108155)

This is the root-cause for the LLDB failures that started occurring
after #105865.

The DWARFASTParserClang has logic to try derive unnamed bitfields from
DWARF offsets. In this case we treat `padding` as a 1-byte size field
that would overlap with `flag`, and decide we need to introduce an
unnamed bitfield into the AST, which is incorrect.
@ldionne
Copy link
Member

ldionne commented Sep 11, 2024

@mysterymath Can you please describe what this broke in the Fuchsia toolchain CI? Was it the data formatters?

Edit: NVM, I saw your comment on the revert PR and I see that it was indeed the data formatters.

@ldionne
Copy link
Member

ldionne commented Sep 11, 2024

@Michael137 IIUC we'd be fine to re-submit now, is that right?

@Michael137
Copy link
Member

@Michael137 IIUC we'd be fine to re-submit now, is that right?

Not quite. I have a plan but havent put up a PR yet. Hopefully we're good by EOD tomorrow. I'll ping once I landed the fix

VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
…05865)

It is a violation of the standard to use 0 length arrays, especially
when not at the end of a structure (not a FAM GNU extension). Compiler
generally accept it, but it's probably better to have a conforming
implementation.

---------

Co-authored-by: Louis Dionne <[email protected]>
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
…luator

This would've caught the failures in
llvm#105865 in the libc++
data-formatter CI.
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Sep 12, 2024
…llvm#108155)

This is the root-cause for the LLDB failures that started occurring
after llvm#105865.

The DWARFASTParserClang has logic to try derive unnamed bitfields from
DWARF offsets. In this case we treat `padding` as a 1-byte size field
that would overlap with `flag`, and decide we need to introduce an
unnamed bitfield into the AST, which is incorrect.
@Michael137
Copy link
Member

PR is up: #108343

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Sep 12, 2024
Michael137 added a commit to Michael137/llvm-project that referenced this pull request Sep 12, 2024
Michael137 added a commit that referenced this pull request Sep 12, 2024
…presence of overlapping fields (#108343)

This bug surfaced after #105865
(currently reverted, but blocked on this to be relanded).

Because Clang doesn't emit `DW_TAG_member`s for unnamed bitfields, LLDB
has to make an educated guess about whether they existed in the source.
It does so by checking whether there is a gap between where the last
field ended and the currently parsed field starts. In the example test
case, the empty field `padding` was folded into the storage of `data`.
Because the `bit_offset` of `padding` is `0x0` and its `DW_AT_byte_size`
is `0x1`, LLDB thinks the field ends at `0x1` (not quite because we
first round the size to a word size, but this is an implementation
detail), erroneously deducing that there's a gap between `flag` and
`padding`.

This patch adds the notion of "effective field end", which accounts for
fields that share storage. It is set to the end of the storage that the
two fields occupy. Then we use this to check for gaps in the unnamed
bitfield creation logic.
Michael137 added a commit to Michael137/llvm-project that referenced this pull request Sep 12, 2024
@Michael137
Copy link
Member

Michael137 commented Sep 12, 2024

Just merged the LLDB patch. Feel free to reland (also added a test for this that should get run as part of the libc++ pre-merge CI)

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Sep 13, 2024
ldionne pushed a commit to ldionne/llvm-project that referenced this pull request Sep 16, 2024
It is a violation of the standard to use 0 length arrays, especially
when not at the end of a structure (not a FAM GNU extension). Compiler
generally accept it, but it's probably better to have a conforming
implementation.

This is a re-application of llvm#105865 which was reverted in 72cfc74
because it broke the data formatters. A LLDB patch has since been
landed that should make this a non-issue.
ldionne added a commit that referenced this pull request Sep 16, 2024
It is a violation of the standard to use 0 length arrays, especially
when not at the end of a structure (not a FAM GNU extension). Compiler
generally accept it, but it's probably better to have a conforming
implementation.

This is a re-application of #105865 which was reverted in 72cfc74
because it broke the data formatters. A LLDB patch has since been landed
that should make this a non-issue.

Co-authored-by: serge-sans-paille <[email protected]>
Michael137 added a commit to Michael137/llvm-project that referenced this pull request Oct 3, 2024
Michael137 added a commit that referenced this pull request Oct 3, 2024
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants