-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++][string] Remove potential non-trailing 0-length array #105865
[libc++][string] Remove potential non-trailing 0-length array #105865
Conversation
@llvm/pr-subscribers-libcxx Author: None (serge-sans-paille) ChangesIt is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation. Full diff: https://github.com/llvm/llvm-project/pull/105865.diff 1 Files Affected:
diff --git a/libcxx/include/string b/libcxx/include/string
index 6e93a6230cc2c0..5c1954e384e7fe 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -748,6 +748,41 @@ struct __can_be_converted_to_string_view
struct __uninitialized_size_tag {};
struct __init_with_sentinel_tag {};
+#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
+template<class _T, size_t __min_cap, size_t _Padding=sizeof(_T) - 1>
+struct __short_impl {
+ _T __data_[__min_cap];
+ unsigned char __padding_[_Padding];
+ unsigned char __size_ : 7;
+ unsigned char __is_long_ : 1;
+};
+
+template<class _T, size_t __min_cap>
+struct __short_impl<_T, __min_cap, 0> {
+ value_type __data_[__min_cap];
+ unsigned char __size_ : 7;
+ unsigned char __is_long_ : 1;
+};
+#else
+template<class _T, size_t __min_cap, size_t _Padding=sizeof(_T) - 1>
+struct __short_impl {
+ struct _LIBCPP_PACKED {
+ unsigned char __is_long_ : 1;
+ unsigned char __size_ : 7;
+ };
+ char __padding_[_Padding];
+ _T __data_[__min_cap];
+};
+template<class _T, size_t __min_cap>
+struct __short_impl<_T, __min_cap, 0> {
+ struct _LIBCPP_PACKED {
+ unsigned char __is_long_ : 1;
+ unsigned char __size_ : 7;
+ };
+ _T __data_[__min_cap];
+};
+#endif
+
template <class _CharT, class _Traits, class _Allocator>
class basic_string {
private:
@@ -850,13 +885,6 @@ private:
enum { __min_cap = (sizeof(__long) - 1) / sizeof(value_type) > 2 ? (sizeof(__long) - 1) / sizeof(value_type) : 2 };
- struct __short {
- value_type __data_[__min_cap];
- unsigned char __padding_[sizeof(value_type) - 1];
- unsigned char __size_ : 7;
- unsigned char __is_long_ : 1;
- };
-
// The __endian_factor is required because the field we use to store the size
// has one fewer bit than it would if it were not a bitfield.
//
@@ -899,17 +927,10 @@ private:
enum { __min_cap = (sizeof(__long) - 1) / sizeof(value_type) > 2 ? (sizeof(__long) - 1) / sizeof(value_type) : 2 };
- struct __short {
- struct _LIBCPP_PACKED {
- unsigned char __is_long_ : 1;
- unsigned char __size_ : 7;
- };
- char __padding_[sizeof(value_type) - 1];
- value_type __data_[__min_cap];
- };
#endif // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
+ using __short = __short_impl<value_type, __min_cap>;
static_assert(sizeof(__short) == (sizeof(value_type) * (__min_cap + 1)), "__short has an unexpected size.");
union __rep {
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
8e06f53
to
98e891f
Compare
The build error seems unrelated (?) |
It's an extension that compilers currently support, so I don't really see why the new implementation is better. As-is this is a significant additional amount of complexity without much of a downside (other than using a questionable extension, which is purely philosophical). |
libcxx/include/string
Outdated
|
||
template <class _CharT, size_t __min_cap> | ||
struct __short_impl<_CharT, __min_cap, 0> { | ||
value_type __data_[__min_cap]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value_type
is not declared here. It should be _CharT
.
value_type __data_[__min_cap]; | |
_CharT __data_[__min_cap]; |
Perhaps the change will make |
Exactly the issue I ran into. |
How exactly does one run into a diagnostic in a system header? |
I'm working on harmonizing the clang behavior wrt. FMA, and while doing this I turned on FMA warnings when compiling the full llvm-project, looks like it impacted libcxx too. |
You're talking about fused-multiply-add, right? Or is there another FMA? What has FMA got to do with anything here? |
|
Gentle ping :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions and comments, but this generally LGTM. While I wouldn't go absolutely nuts to avoid using a compiler extension, I think it's better to strive towards using few-ish of those. I also think this is a good opportunity to do a small refactoring of std::string
's layout definition that has been on my mind for a while.
@philnik777 Please let me know if you feel strongly that this patch shouldn't move forward. I am weakly favourable to it, but I could also be convinced that the complexity isn't worth it.
libcxx/include/string
Outdated
#else | ||
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1> | ||
struct __short_impl { | ||
struct _LIBCPP_PACKED { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-existing but I don't understand it: this struct has no name, is that really intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's intended. It's required to keep the ABI on AIX stable.
libcxx/include/string
Outdated
@@ -748,6 +748,41 @@ struct __can_be_converted_to_string_view | |||
struct __uninitialized_size_tag {}; | |||
struct __init_with_sentinel_tag {}; | |||
|
|||
#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT | |||
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1> | |||
struct __short_impl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to do this:
// always define both of these classes
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1>
struct __short_layout_alternate { ... };
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1>
struct __short_layout_classic { ... };
// now inside basic_string
#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
using __short = __short_layout_alternate <value_type, __min_cap>;
#else
using __short = __short_layout_classic <value_type, __min_cap>;
#endif
This reduces the scope of the conditional code and IMO makes things easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the patch accordingly, with a guard on the individual types too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions and comments, but this generally LGTM. While I wouldn't go absolutely nuts to avoid using a compiler extension, I think it's better to strive towards using few-ish of those. I also think this is a good opportunity to do a small refactoring of
std::string
's layout definition that has been on my mind for a while.@philnik777 Please let me know if you feel strongly that this patch shouldn't move forward. I am weakly favourable to it, but I could also be convinced that the complexity isn't worth it.
The main reason I've been pushing back here is because I think there is a significantly simpler alternative and I don't think it's exactly urgent to remove the use of this extension. I know that the clang folks are thinking about removing it (since it is indeed heinous), but I'm not aware of anybody actively working on that at the moment.
Specifically, we could do
template <size_t _PaddingSize>
struct __padding {
char __padding_[_PaddingSize];
};
template <>
struct __padding<0> {};
and then just have
struct __short {
value_type __data_[__min_cap];
_LIBCPP_NO_UNIQUE_ADDRESS __padding __padding_;
unsigned char __size_ : 7;
unsigned char __is_long_ : 1;
};
IMO this is significantly more readable.
But it requires the newer AppleClang.
libcxx/include/string
Outdated
#else | ||
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1> | ||
struct __short_impl { | ||
struct _LIBCPP_PACKED { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's intended. It's required to keep the ABI on AIX stable.
bd910ae
to
d5c8412
Compare
gentle ping: I've updated the patch according to @ldionne suggestion. I agree that @philnik777 solution is much cleaner, but requiring AppleClang seems to be a no-go (?) |
We've actually update the AppleClang now, so my solution should™ work now. |
libcxx/include/string
Outdated
@@ -748,6 +748,41 @@ struct __can_be_converted_to_string_view | |||
struct __uninitialized_size_tag {}; | |||
struct __init_with_sentinel_tag {}; | |||
|
|||
#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of the #ifdef
here. This type and __short_layout_classic
can both be defined regardless of which one we end up picking for the actual string representation (based on the single #ifdef
inside the basic_string
class).
libcxx/include/string
Outdated
template <class _CharT, size_t __min_cap, size_t _Padding = sizeof(_CharT) - 1> | ||
struct __short_layout_alternate { | ||
_CharT __data_[__min_cap]; | ||
unsigned char __padding_[_Padding]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try out @philnik777 's suggestion for the padding? I think it should work with our current CI setup.
It is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation. This patch relies on `no_unique_address` being unconditionally available, which is the case since 42f5277.
8d4bc53
to
7cbb05f
Compare
7cbb05f
to
72ff46d
Compare
This is the root-cause for the LLDB failures that started occurring after llvm#105865. The DWARFASTParserClang has logic to try derive unnamed bitfields from DWARF offsets. In this case we treat `padding` as a 1-byte size field that would overlap with `flag`, and decide we need to introduce an unnamed bitfield into the AST, which is incorrect.
Here's the reduced issue: #108155 I'm working on a fix. But to get CI back in shape, are you ok with reverting this in the meantime @serge-sans-paille? Apologies for the disruption EDIT: ah looks like it was already reverted, I'll update this thread once the LLDB issue is fixed. The fallout from this was that we could no longer view |
…luator This would've caught the failures in #105865 in the libc++ data-formatter CI.
…#108155) This is the root-cause for the LLDB failures that started occurring after #105865. The DWARFASTParserClang has logic to try derive unnamed bitfields from DWARF offsets. In this case we treat `padding` as a 1-byte size field that would overlap with `flag`, and decide we need to introduce an unnamed bitfield into the AST, which is incorrect.
@mysterymath Can you please describe what this broke in the Fuchsia toolchain CI? Was it the data formatters? Edit: NVM, I saw your comment on the revert PR and I see that it was indeed the data formatters. |
@Michael137 IIUC we'd be fine to re-submit now, is that right? |
Not quite. I have a plan but havent put up a PR yet. Hopefully we're good by EOD tomorrow. I'll ping once I landed the fix |
…05865) It is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation. --------- Co-authored-by: Louis Dionne <[email protected]>
llvm#108091) Reverts llvm#105865 This breaks a pair of LLDB tests in CI.
…luator This would've caught the failures in llvm#105865 in the libc++ data-formatter CI.
…llvm#108155) This is the root-cause for the LLDB failures that started occurring after llvm#105865. The DWARFASTParserClang has logic to try derive unnamed bitfields from DWARF offsets. In this case we treat `padding` as a 1-byte size field that would overlap with `flag`, and decide we need to introduce an unnamed bitfield into the AST, which is incorrect.
PR is up: #108343 |
…ng layout Depends on llvm#108362. Adds new layout for llvm#105865.
…ng layout Depends on llvm#108362. Adds new layout for llvm#105865.
…presence of overlapping fields (#108343) This bug surfaced after #105865 (currently reverted, but blocked on this to be relanded). Because Clang doesn't emit `DW_TAG_member`s for unnamed bitfields, LLDB has to make an educated guess about whether they existed in the source. It does so by checking whether there is a gap between where the last field ended and the currently parsed field starts. In the example test case, the empty field `padding` was folded into the storage of `data`. Because the `bit_offset` of `padding` is `0x0` and its `DW_AT_byte_size` is `0x1`, LLDB thinks the field ends at `0x1` (not quite because we first round the size to a word size, but this is an implementation detail), erroneously deducing that there's a gap between `flag` and `padding`. This patch adds the notion of "effective field end", which accounts for fields that share storage. It is set to the end of the storage that the two fields occupy. Then we use this to check for gaps in the unnamed bitfield creation logic.
…ng layout Depends on llvm#108362. Adds new layout for llvm#105865.
Just merged the LLDB patch. Feel free to reland (also added a test for this that should get run as part of the libc++ pre-merge CI) |
…ng layout Depends on llvm#108362. Adds new layout for llvm#105865.
It is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation. This is a re-application of llvm#105865 which was reverted in 72cfc74 because it broke the data formatters. A LLDB patch has since been landed that should make this a non-issue.
It is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation. This is a re-application of #105865 which was reverted in 72cfc74 because it broke the data formatters. A LLDB patch has since been landed that should make this a non-issue. Co-authored-by: serge-sans-paille <[email protected]>
…ng layout Depends on llvm#108362. Adds new layout for llvm#105865.
…ng layout (llvm#108375) Depends on llvm#108362 and llvm#108343. Adds new layout for llvm#105865.
It is a violation of the standard to use 0 length arrays, especially when not at the end of a structure (not a FAM GNU extension). Compiler generally accept it, but it's probably better to have a conforming implementation.