[LLD][COFF] Demangle ARM64EC export names. #87068
@llvm/pr-subscribers-platform-windows @llvm/pr-subscribers-lld
Author: Jacek Caban (cjacek)
Changes
This is compatible with MSVC. It matters when an export is passed in mangled form (like -export:#func); it doesn't change the handling of the demangled form or of explicit EXPORTAS cases. This form is currently used by clang, until #81940 lands. When undecorating, we need a way to check if the export is a function, so using the current form of undecorate() isn't enough. With slightly tweaked logic, it felt more appropriate to just inline that helper instead of extending it.
Full diff: https://github.com/llvm/llvm-project/pull/87068.diff
2 Files Affected:
diff --git a/lld/COFF/DriverUtils.cpp b/lld/COFF/DriverUtils.cpp
index b4ff31a606da5e..d66e381d7b30a9 100644
--- a/lld/COFF/DriverUtils.cpp
+++ b/lld/COFF/DriverUtils.cpp
@@ -39,6 +39,7 @@
#include <optional>
using namespace llvm::COFF;
+using namespace llvm::object;
using namespace llvm::opt;
using namespace llvm;
using llvm::sys::Process;
@@ -632,18 +633,6 @@ Export LinkerDriver::parseExport(StringRef arg) {
fatal("invalid /export: " + arg);
}
-static StringRef undecorate(COFFLinkerContext &ctx, StringRef sym) {
- if (ctx.config.machine != I386)
- return sym;
- // In MSVC mode, a fully decorated stdcall function is exported
- // as-is with the leading underscore (with type IMPORT_NAME).
- // In MinGW mode, a decorated stdcall function gets the underscore
- // removed, just like normal cdecl functions.
- if (sym.starts_with("_") && sym.contains('@') && !ctx.config.mingw)
- return sym;
- return sym.starts_with("_") ? sym.substr(1) : sym;
-}
-
// Convert stdcall/fastcall style symbols into unsuffixed symbols,
// with or without a leading underscore. (MinGW specific.)
static StringRef killAt(StringRef sym, bool prefix) {
@@ -693,11 +682,29 @@ void LinkerDriver::fixupExports() {
for (Export &e : ctx.config.exports) {
if (!e.exportAs.empty()) {
e.exportName = e.exportAs;
- } else if (!e.forwardTo.empty()) {
- e.exportName = undecorate(ctx, e.name);
- } else {
- e.exportName = undecorate(ctx, e.extName.empty() ? e.name : e.extName);
+ continue;
+ }
+
+ StringRef sym =
+ !e.forwardTo.empty() || e.extName.empty() ? e.name : e.extName;
+ if (ctx.config.machine == I386 && sym.starts_with("_")) {
+ // In MSVC mode, a fully decorated stdcall function is exported
+ // as-is with the leading underscore (with type IMPORT_NAME).
+ // In MinGW mode, a decorated stdcall function gets the underscore
+ // removed, just like normal cdecl functions.
+ if (ctx.config.mingw || !sym.contains('@')) {
+ e.exportName = sym.substr(1);
+ continue;
+ }
+ }
+ if (isArm64EC(ctx.config.machine) && !e.data && !e.constant) {
+ if (std::optional<std::string> demangledName =
+ getArm64ECDemangledFunctionName(sym)) {
+ e.exportName = saver().save(*demangledName);
+ continue;
+ }
}
+ e.exportName = sym;
}
if (ctx.config.killAt && ctx.config.machine == I386) {
diff --git a/lld/test/COFF/arm64ec-exports.test b/lld/test/COFF/arm64ec-exports.test
new file mode 100644
index 00000000000000..d636d04ee06cca
--- /dev/null
+++ b/lld/test/COFF/arm64ec-exports.test
@@ -0,0 +1,127 @@
+REQUIRES: aarch64
+RUN: split-file %s %t.dir && cd %t.dir
+
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows func.s -o func.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-mangled.s -o data-mangled.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-demangled.s -o data-demangled.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows drectve1.s -o drectve1.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows drectve2.s -o drectve2.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows drectve3.s -o drectve3.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows %S/Inputs/loadconfig-arm64ec.s -o loadconfig-arm64ec.obj
+
+Check that the export function name is always demangled.
+
+RUN: lld-link -out:func.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec -export:func
+RUN: llvm-readobj --coff-exports func.dll | FileCheck %s
+RUN: llvm-readobj func.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func2.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec "-export:#func,EXPORTAS,func"
+RUN: llvm-readobj --coff-exports func2.dll | FileCheck %s
+RUN: llvm-readobj func2.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func3.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec "-export:#func"
+RUN: llvm-readobj --coff-exports func3.dll | FileCheck %s
+RUN: llvm-readobj func3.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func4.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec drectve1.obj
+RUN: llvm-readobj --coff-exports func4.dll | FileCheck %s
+RUN: llvm-readobj func4.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func5.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec drectve2.obj
+RUN: llvm-readobj --coff-exports func5.dll | FileCheck %s
+RUN: llvm-readobj func5.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func6.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec drectve3.obj
+RUN: llvm-readobj --coff-exports func6.dll | FileCheck %s
+RUN: llvm-readobj func6.lib | FileCheck --check-prefix=IMPLIB %s
+
+CHECK: Name: func
+
+IMPLIB: File: func{{.*}}.lib(func{{.*}}.dll)
+IMPLIB-NEXT: Format: COFF-ARM64
+IMPLIB-NEXT: Arch: aarch64
+IMPLIB-NEXT: AddressSize: 64bit
+IMPLIB-EMPTY:
+IMPLIB-NEXT: File: func{{.*}}.lib(func{{.*}}.dll)
+IMPLIB-NEXT: Format: COFF-ARM64
+IMPLIB-NEXT: Arch: aarch64
+IMPLIB-NEXT: AddressSize: 64bit
+IMPLIB-EMPTY:
+IMPLIB-NEXT: File: func{{.*}}.lib(func{{.*}}.dll)
+IMPLIB-NEXT: Format: COFF-ARM64
+IMPLIB-NEXT: Arch: aarch64
+IMPLIB-NEXT: AddressSize: 64bit
+IMPLIB-EMPTY:
+IMPLIB-NEXT: File: func{{.*}}.dll
+IMPLIB-NEXT: Format: COFF-import-file-ARM64EC
+IMPLIB-NEXT: Type: code
+IMPLIB-NEXT: Name type: export as
+IMPLIB-NEXT: Export name: func
+IMPLIB-NEXT: Symbol: __imp_func
+IMPLIB-NEXT: Symbol: func
+IMPLIB-NEXT: Symbol: __imp_aux_func
+IMPLIB-NEXT: Symbol: #func
+
+
+Check data export name is not demangled.
+
+RUN: lld-link -out:data.dll data-demangled.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec -export:data_sym,DATA
+RUN: llvm-readobj --coff-exports data.dll | FileCheck --check-prefix=DATA %s
+RUN: llvm-readobj data.lib | FileCheck --check-prefix=DATA-IMPLIB %s
+
+DATA: Name: data_sym
+
+DATA-IMPLIB: Format: COFF-import-file-ARM64EC
+DATA-IMPLIB-NEXT: Type: data
+DATA-IMPLIB-NEXT: Name type: name
+DATA-IMPLIB-NEXT: Export name: data_sym
+DATA-IMPLIB-NEXT: Symbol: __imp_data_sym
+
+RUN: lld-link -out:data2.dll data-mangled.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec "-export:#data_sym,DATA"
+RUN: llvm-readobj --coff-exports data2.dll | FileCheck --check-prefix=DATA2 %s
+RUN: llvm-readobj data2.lib | FileCheck --check-prefix=DATA2-IMPLIB %s
+
+DATA2: Name: #data_sym
+
+DATA2-IMPLIB: Format: COFF-import-file-ARM64EC
+DATA2-IMPLIB-NEXT: Type: data
+DATA2-IMPLIB-NEXT: Name type: name
+DATA2-IMPLIB-NEXT: Export name: #data_sym
+DATA2-IMPLIB-NEXT: Symbol: __imp_data_sym
+
+#--- func.s
+ .weak_anti_dep func
+ func = "#func"
+
+ .text
+ .globl "#func"
+ .p2align 2, 0x0
+"#func":
+ mov w0, #2
+ ret
+
+#--- data-mangled.s
+ .data
+ .globl "#data_sym"
+ .p2align 2, 0x0
+"#data_sym":
+ .word 0x01010101
+
+#--- data-demangled.s
+ .data
+ .globl data_sym
+ .p2align 2, 0x0
+data_sym:
+ .word 0x01010101
+
+#--- drectve1.s
+ .section .drectve, "yn"
+ .ascii " -export:func"
+
+#--- drectve2.s
+ .section .drectve, "yn"
+ .ascii " -export:#func"
+
+#--- drectve3.s
+ .section .drectve, "yn"
+ .ascii " -export:#func,EXPORTAS,func"
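
For readers skimming the IMPLIB checks in the test above: the import library is expected to list four symbols for a single function export named func. The sketch below only reproduces that naming pattern as it appears in the test expectations. The function name `arm64ecFunctionImportSymbols` is hypothetical and is not part of lld; the real import library writer may work quite differently.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists the symbol names that the IMPLIB checks in
// arm64ec-exports.test expect for one ARM64EC function export.
// This mirrors the test output only; it is not lld's implementation.
std::vector<std::string>
arm64ecFunctionImportSymbols(const std::string &exportName) {
  return {
      "__imp_" + exportName,     // import address table entry
      exportName,                // demangled name
      "__imp_aux_" + exportName, // auxiliary import entry
      "#" + exportName,          // EC-mangled name
  };
}
```

Calling `arm64ecFunctionImportSymbols("func")` yields `__imp_func`, `func`, `__imp_aux_func` and `#func`, in the order the test checks them. The comments on each entry are my reading of the names, not something the test states.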
Are you sure MSVC actually demangles the name, as opposed to getting the demangled name from somewhere else?
From my experiments with MSVC, I concluded that it does demangle the export name. Since it's all undocumented, I conducted numerous experiments with MSVC, testing various inputs in both typical situations and edge cases to infer explanations for the observed behavior. Below are some tests relevant to this PR. I used inputs that are simple enough to rule out dependency on other factors:
The basic test:
creates a DLL with an unmangled export name pointing to the mangled symbol. This suggests that the demangling of the export name is not related to weak anti-dependency aliases or similar mechanisms since it works without them. Another question is whether we should demangle the symbol as well, or just the export name. A similar test defining only an unmangled symbol fails (using x64-func.o gives the same result):
With an error:
This indicates that the linker looks for the exact symbol name, not its demangled form. Testing further, using the unmangled export and unmangled symbol works fine:
The more tricky case is using an unmangled export name and a mangled symbol, which works (unlike the other way around):
This raises the question of how the linker knows about the mangled symbol in this case. The next experiment defines both mangled and unmangled symbol definitions to see what the linker does:
This command results in an error (using x64-func.o instead of unmangled-func.o produces the same result):
(The symbol names in the error message are broken, likely due to a UTF-8/UTF-16 mismatch in link.exe.) This shows that the linker has some more complicated EC mangling awareness. I tried adding entry and exit thunks, but I couldn't find a way to make the linker accept this. This is crucial for other aspects of my work, but in the context of this PR, it shows that the linker understands the relationship between mangled and demangled symbols for exported symbols. This mangling handling seems specific to export handling; if I skip the export directive:
it builds fine. These symbols can also be resolved from object files:
This matches my other experiments for different features. From my observations, the linker has mangling awareness in specific situations, but it's not something that unconditionally applies to all symbols. Other examples of special handling include:
In my WIP tree (https://github.com/cjacek/llvm-project/commits/arm64ec), I implemented these features using a mechanism called "EC aliases," where I create paired symbols with different semantics (allowing any other definition to override the alias symbol; overriding one symbol unmarks the paired symbol as no longer being an EC alias). This code is not yet fully compatible or clean, and I plan to refine it further, conduct more testing, and likely rewrite it before submitting those parts for review. Currently, it's sufficient to get things working, including linking against MSVC default libs. I also updated it to cover all the experiments described here. Returning to the context of exports, there are a few more interesting tests. If I try to reference an unmangled symbol when only the mangled version is available, it fails (while this worked using the
results in:
However, if I add the -export directive, not only does the export work, but the unresolved symbol is resolved too:
This behavior can be explained by the creation of "EC aliases" for exported symbols. One variant that still doesn't work is:
Since the unmangled symbol is defined, the "EC alias" is not created and referencing its mangled form still fails. Another similar corner case: if an unmangled symbol is referenced from x64 code, it may reference the mangled symbol even without an explicit alias (e.g., no
This PR touches only on export names, not "EC aliases" or similar mechanisms; I mentioned them for better context. The changed part of the code doesn't require additional modifications in my prototype, which otherwise matches the behavior in all the experiments mentioned here (except it doesn't issue an error when both mangled and unmangled symbols are defined).
LGTM
So it does do demangling, but only in specific places, as necessary to support specific features. That seems to match my experience with weird linker errors...
likely due to a UTF-8/UTF-16 mismatch in link.exe
If you reinterpret UTF-16 昣湵c as UTF-8, it is indeed #func.
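
The encoding mix-up described in this comment can be checked mechanically. The sketch below pairs the raw bytes of `#func` into little-endian UTF-16 code units, which is presumably what happens to the name somewhere in the error-reporting path. The names `Utf16View` and `reinterpretAsUtf16LE` are hypothetical, and little-endian byte order is an assumption (it is the usual order on Windows).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Result of viewing a byte string as UTF-16LE: complete 16-bit code units,
// plus the trailing byte if the input had an odd length.
struct Utf16View {
  std::vector<std::uint16_t> units;
  std::optional<unsigned char> leftover;
};

// Hypothetical helper: pairs up bytes as little-endian UTF-16 code units.
// No validation (surrogates etc.) is done; this only shows the byte pairing.
Utf16View reinterpretAsUtf16LE(const std::string &bytes) {
  Utf16View view;
  std::size_t i = 0;
  for (; i + 1 < bytes.size(); i += 2) {
    auto lo = static_cast<unsigned char>(bytes[i]);
    auto hi = static_cast<unsigned char>(bytes[i + 1]);
    view.units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
  }
  if (i < bytes.size())
    view.leftover = static_cast<unsigned char>(bytes[i]);
  return view;
}
```

For the input `#func` (bytes 23 66 75 6E 63) this produces the code units 0x6623 and 0x6E75 with a leftover byte 0x63 (`c`), which lines up with the two CJK characters followed by `c` seen in the error message.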
This is compatible with MSVC. It matters when an export is passed in mangled form (like -export:#func); it doesn't change the handling of the demangled form or of explicit EXPORTAS cases. This form is currently used by clang, until #81940 lands.
When undecorating, we need a way to check if the export is a function, so using the current form of undecorate() isn't enough. With slightly tweaked logic, it felt more appropriate to just inline that helper instead of extending it.
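
To make the resulting decision order easier to follow outside of lld, here is a standalone sketch of the export-name selection after this change. It is a simplified model, not the patch itself: `ExportSketch`, `ConfigSketch`, `demangleArm64ECFunctionSketch` and `chooseExportName` are hypothetical names, and the demangling stand-in only handles the `#` prefix used in the tests (the real `getArm64ECDemangledFunctionName` may cover more cases, such as C++ names).

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the fields of lld's Export used in fixupExports().
struct ExportSketch {
  std::string name;
  std::string extName;
  std::string exportAs;
  std::string forwardTo;
  bool data = false;
  bool constant = false;
};

// Hypothetical stand-in for the relevant bits of the linker configuration.
struct ConfigSketch {
  bool i386 = false;
  bool mingw = false;
  bool arm64ec = false;
};

// Assumption: only the "#name" -> "name" case is modeled here.
std::optional<std::string>
demangleArm64ECFunctionSketch(const std::string &sym) {
  if (!sym.empty() && sym.front() == '#')
    return sym.substr(1);
  return std::nullopt;
}

std::string chooseExportName(const ExportSketch &e, const ConfigSketch &cfg) {
  // An explicit EXPORTAS name always wins.
  if (!e.exportAs.empty())
    return e.exportAs;

  const std::string &sym =
      (!e.forwardTo.empty() || e.extName.empty()) ? e.name : e.extName;

  // x86: drop the leading underscore, except for fully decorated stdcall
  // names in MSVC mode, which are exported as-is.
  if (cfg.i386 && !sym.empty() && sym.front() == '_') {
    if (cfg.mingw || sym.find('@') == std::string::npos)
      return sym.substr(1);
  }

  // ARM64EC: demangle function exports only; data exports keep their name.
  if (cfg.arm64ec && !e.data && !e.constant) {
    if (std::optional<std::string> demangled =
            demangleArm64ECFunctionSketch(sym))
      return *demangled;
  }
  return sym;
}
```

Under this model, an ARM64EC export of `#func` gets the export name `func`, while `#data_sym` marked as DATA stays `#data_sym`, matching the CHECK and DATA2 lines in the new test. Passing the function/data distinction in through the export record is the reason the old `undecorate(ctx, sym)` signature was not sufficient.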