-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LLD] [COFF] Zero-intialization & proper constructor invocation in COFF's Symbol #98447
[LLD] [COFF] Zero-intialization & proper constructor invocation in COFF's Symbol #98447
Conversation
@llvm/pr-subscribers-platform-windows @llvm/pr-subscribers-lld-coff Author: Vikash Gupta (vg0204) ChangesIt happened due to lld's COFF linker multiple regression tests failure. The bug found can be identified as a Heisenbug, which appears only in Release mode, not in Debug mode. It also vanishes when BUG is being tried to be outputed out or isolated. This behaviour cannot be seen at all in OS below Ubuntu24. Eventually, the bug was about not handling exported symbols properly, when dealing with defining undefined symbols for COFF linker. Its specifically about improper forwarding of isUsedinRegObject boolean while invoking replaceSymbol() API, so hence explicitly doing it resolved issue. Full diff: https://github.com/llvm/llvm-project/pull/98447.diff 1 Files Affected:
diff --git a/lld/COFF/Symbols.h b/lld/COFF/Symbols.h
index 5ef46f5af6a6c..d33989841c501 100644
--- a/lld/COFF/Symbols.h
+++ b/lld/COFF/Symbols.h
@@ -499,8 +499,10 @@ void replaceSymbol(Symbol *s, ArgT &&... arg) {
assert(static_cast<Symbol *>(static_cast<T *>(nullptr)) == nullptr &&
"Not a Symbol");
bool canInline = s->canInline;
+ bool isUsedInRegularObj = s->isUsedInRegularObj;
new (s) T(std::forward<ArgT>(arg)...);
s->canInline = canInline;
+ s->isUsedInRegularObj = isUsedInRegularObj;
}
} // namespace coff
|
Needs test? |
I don't know exactly, as this patch solved some failing test cases(bug) when ran in Ubuntu 24 (for the first time in psdb pipeline), but it was not needed up until ubuntu 22. |
FWIW, I can't reproduce such an issue with Ubuntu 24.04. I tested setting up Ubuntu 24.04, installed the |
Request changes unless there is a reproduce and evidence that I am more familiar with the ELF port where there used to be some issues related to If you believe there is an issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
Point noted! |
As I looked into it again, what are you saying is true, but I am getting this issue while trying to build in a private repo, which gets updated with main upstream on regular basis. Seeing how things procced there, it would better to decide if this patch is really needed or not. |
It would be easier to evaluate with a test. Any behavior change needs a test |
Also. I have a query regarding the replaceSymbol() function used for other linker like wasm or MachO. All of them used the reference forwarding, then why they need explicit re-assignment of boolean variables. If so, shouldn't it be uniformly done for all such variables in every version of replaceSymbol(). If anyone could comment on this. |
I think there's a real bug here. It seems extremely sketchy that we don't initialize If we zero initialize the member in all builds, does that cause tests to fail, and then does this patch fix it? I think that's the next thing to try here. |
Will do that right away! |
Before this change, SymbolTable::insert set this boolean to a meaningful value before calling the appropriate Symbol constructor via replaceSymbol. I believe this is UB, it is probably valid for a compiler to zero out a bitfield in the constructor, rather than carefully preserving the single bit that existed prevoiusly. After this change, the constructor does the obvious thing, which is to zero initialize this field, and we explicitly copy the bitfield around the constructor call in replaceSymbol. This should be an alternative solution to the problems described in llvm#98447 .
As mentioned in your commit, it really solves the problem. The regression failure can be reliably observed when isUsedDefinedInRegObj bit is zero-intialized in all builds (Ubuntu24 & below) and also identical in nature. Now, if this patch is applied on top of the zero intialization, the issue gets resolved for all builds. Exactly both of this is being done in your mentioned commit, thus exposing & solving the problem uniformly across different Ubuntu builds. |
PING! |
I also reproduced, that if we as zeroing of this variable in the constructor, the issue appears, and this change fixes it. But I would prefer if we’d add such zeroing in the constructor, to get consistent and reliable behavior across all toolchains. I’m not sure that it’s deliberately omitted, it seems like an oversight to me. |
Yeah, I put together that patch locally, but I wanted to give @vg0204 the opportunity to add the zero initialization so they get credit for the fix. :) So, yes, that is the next step, add |
Appreciate that @rnk, already made the necessary changes locally as well while testing out your suggestion on my toolchain of ubuntu24 where this issues had been intially observed. So, would push it up the changes soon, just after testing all regression suite one! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
The PR description/subject should probably be updated a bit before landing this though, now that the issue is better understood. (Although whoever merges it probably also can touch up the commit message.)
Will do the needful, before the merge! |
…ymbol class It happened due to lld's COFF linker multiple regression tests failure. It got reliably reproduced after the needed intialization of isUsedinRegularObject bit in the Symbol's ctor, but not handled at replaceSymbol API properly while creating a specific symbol to insert in symbol table. Eventually, the issue was about not handling exported symbols properly, when dealing with defining undefined symbols for COFF linker. It got resolved by explicit setting up the isUsedinRegularObject bit around the ctor invocation of symbol class in replaceSymbol API.
0a3e465
to
205dbe7
Compare
…FF's Symbol (llvm#98447) It happened due to lld's COFF linker multiple regression tests failure. It got reliably reproduced after the needed intialization of isUsedinRegularObject bit in the Symbol's ctor, but not handled at replaceSymbol API properly while creating a specific symbol to insert in symbol table. So, now while creating the specific symbol using replaceSymbol, by explicitly setting the value of isUsedinRegularObject to newly created symbol around the ctor call of symbol would solve the regression failure
…FF's Symbol (#98447) It happened due to lld's COFF linker multiple regression tests failure. It got reliably reproduced after the needed intialization of isUsedinRegularObject bit in the Symbol's ctor, but not handled at replaceSymbol API properly while creating a specific symbol to insert in symbol table. So, now while creating the specific symbol using replaceSymbol, by explicitly setting the value of isUsedinRegularObject to newly created symbol around the ctor call of symbol would solve the regression failure
It happened due to lld's COFF linker multiple regression tests failure. It got reliably reproduced after the needed intialization of isUsedinRegularObject bit in the Symbol's ctor, but not handled at replaceSymbol API properly while creating a specific symbol to insert in symbol table.
So, now while creating the specific symbol using replaceSymbol, by explicitly setting the value of isUsedinRegularObject to newly created symbol around the ctor call of symbol would solve the regression failure