[LLD][COFF] Demangle ARM64EC export names. #87068

Merged
merged 1 commit into from
May 21, 2024

cjacek
Contributor

@cjacek cjacek commented Mar 29, 2024

This matches MSVC behavior. It matters when an export is passed in mangled form (like -export:#func); it doesn't change the handling of the demangled form or of explicit EXPORTAS cases. Clang currently uses the mangled form, until #81940 lands.

When undecorating, we need a way to check whether the export is a function, so the current form of undecorate() isn't enough. Since the logic needed a few tweaks anyway, it felt more appropriate to inline that helper than to extend it.
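The naming rules described above can be modeled in isolation. The following is only a minimal sketch, not LLD code: `Machine`, `ExportSpec`, `demangleEcFunction`, and `exportName` are hypothetical names invented for illustration, the demangler handles only the `#` prefix of plain C function names, and the `extName`/`forwardTo` selection from the real change is left out.

```cpp
#include <string_view>

// Hypothetical, simplified stand-ins for the linker's types; not LLD's API.
enum class Machine { I386, AMD64, ARM64, ARM64EC };

struct ExportSpec {
  std::string_view name;     // symbol named in the -export: argument
  std::string_view exportAs; // explicit EXPORTAS name, empty if absent
  bool isData = false;       // true for DATA or CONSTANT exports
};

// Simplified demangler (assumption): only strips a leading '#'.
// Returns an empty view when the name is not treated as mangled.
constexpr std::string_view demangleEcFunction(std::string_view sym) {
  if (!sym.empty() && sym.front() == '#')
    return sym.substr(1);
  return {};
}

constexpr std::string_view exportName(const ExportSpec &e, Machine m,
                                      bool mingw) {
  // An explicit EXPORTAS name always wins.
  if (!e.exportAs.empty())
    return e.exportAs;

  std::string_view sym = e.name;

  // i386: drop the leading underscore, except for decorated stdcall
  // names (containing '@') in MSVC mode, which are exported as-is.
  if (m == Machine::I386 && !sym.empty() && sym.front() == '_') {
    if (mingw || sym.find('@') == std::string_view::npos)
      return sym.substr(1);
  }

  // ARM64EC: demangle function exports only; data exports keep their name.
  if (m == Machine::ARM64EC && !e.isData) {
    std::string_view demangled = demangleEcFunction(sym);
    if (!demangled.empty())
      return demangled;
  }
  return sym;
}
```

Under these assumptions, `exportName({"#func", "", false}, Machine::ARM64EC, false)` yields `func`, while the same call with `isData` set to `true` leaves a `#`-prefixed name untouched, which mirrors the function and data cases in the new test.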

@llvmbot
Collaborator

llvmbot commented Mar 29, 2024

@llvm/pr-subscribers-platform-windows
@llvm/pr-subscribers-lld-coff

@llvm/pr-subscribers-lld

Author: Jacek Caban (cjacek)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/87068.diff

2 Files Affected:

  • (modified) lld/COFF/DriverUtils.cpp (+23-16)
  • (added) lld/test/COFF/arm64ec-exports.test (+127)
diff --git a/lld/COFF/DriverUtils.cpp b/lld/COFF/DriverUtils.cpp
index b4ff31a606da5e..d66e381d7b30a9 100644
--- a/lld/COFF/DriverUtils.cpp
+++ b/lld/COFF/DriverUtils.cpp
@@ -39,6 +39,7 @@
 #include <optional>
 
 using namespace llvm::COFF;
+using namespace llvm::object;
 using namespace llvm::opt;
 using namespace llvm;
 using llvm::sys::Process;
@@ -632,18 +633,6 @@ Export LinkerDriver::parseExport(StringRef arg) {
   fatal("invalid /export: " + arg);
 }
 
-static StringRef undecorate(COFFLinkerContext &ctx, StringRef sym) {
-  if (ctx.config.machine != I386)
-    return sym;
-  // In MSVC mode, a fully decorated stdcall function is exported
-  // as-is with the leading underscore (with type IMPORT_NAME).
-  // In MinGW mode, a decorated stdcall function gets the underscore
-  // removed, just like normal cdecl functions.
-  if (sym.starts_with("_") && sym.contains('@') && !ctx.config.mingw)
-    return sym;
-  return sym.starts_with("_") ? sym.substr(1) : sym;
-}
-
 // Convert stdcall/fastcall style symbols into unsuffixed symbols,
 // with or without a leading underscore. (MinGW specific.)
 static StringRef killAt(StringRef sym, bool prefix) {
@@ -693,11 +682,29 @@ void LinkerDriver::fixupExports() {
   for (Export &e : ctx.config.exports) {
     if (!e.exportAs.empty()) {
       e.exportName = e.exportAs;
-    } else if (!e.forwardTo.empty()) {
-      e.exportName = undecorate(ctx, e.name);
-    } else {
-      e.exportName = undecorate(ctx, e.extName.empty() ? e.name : e.extName);
+      continue;
+    }
+
+    StringRef sym =
+        !e.forwardTo.empty() || e.extName.empty() ? e.name : e.extName;
+    if (ctx.config.machine == I386 && sym.starts_with("_")) {
+      // In MSVC mode, a fully decorated stdcall function is exported
+      // as-is with the leading underscore (with type IMPORT_NAME).
+      // In MinGW mode, a decorated stdcall function gets the underscore
+      // removed, just like normal cdecl functions.
+      if (ctx.config.mingw || !sym.contains('@')) {
+        e.exportName = sym.substr(1);
+        continue;
+      }
+    }
+    if (isArm64EC(ctx.config.machine) && !e.data && !e.constant) {
+      if (std::optional<std::string> demangledName =
+              getArm64ECDemangledFunctionName(sym)) {
+        e.exportName = saver().save(*demangledName);
+        continue;
+      }
     }
+    e.exportName = sym;
   }
 
   if (ctx.config.killAt && ctx.config.machine == I386) {
diff --git a/lld/test/COFF/arm64ec-exports.test b/lld/test/COFF/arm64ec-exports.test
new file mode 100644
index 00000000000000..d636d04ee06cca
--- /dev/null
+++ b/lld/test/COFF/arm64ec-exports.test
@@ -0,0 +1,127 @@
+REQUIRES: aarch64
+RUN: split-file %s %t.dir && cd %t.dir
+
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows func.s -o func.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-mangled.s -o data-mangled.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-demangled.s -o data-demangled.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows drectve1.s -o drectve1.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows drectve2.s -o drectve2.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows drectve3.s -o drectve3.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows %S/Inputs/loadconfig-arm64ec.s -o loadconfig-arm64ec.obj
+
+Check that the export function name is always demangled.
+
+RUN: lld-link -out:func.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec -export:func
+RUN: llvm-readobj --coff-exports func.dll | FileCheck %s
+RUN: llvm-readobj func.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func2.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec "-export:#func,EXPORTAS,func"
+RUN: llvm-readobj --coff-exports func2.dll | FileCheck %s
+RUN: llvm-readobj func2.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func3.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec "-export:#func"
+RUN: llvm-readobj --coff-exports func3.dll | FileCheck %s
+RUN: llvm-readobj func3.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func4.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec drectve1.obj
+RUN: llvm-readobj --coff-exports func4.dll | FileCheck %s
+RUN: llvm-readobj func4.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func5.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec drectve2.obj
+RUN: llvm-readobj --coff-exports func5.dll | FileCheck %s
+RUN: llvm-readobj func5.lib | FileCheck --check-prefix=IMPLIB %s
+
+RUN: lld-link -out:func6.dll func.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec drectve3.obj
+RUN: llvm-readobj --coff-exports func6.dll | FileCheck %s
+RUN: llvm-readobj func6.lib | FileCheck --check-prefix=IMPLIB %s
+
+CHECK: Name: func
+
+IMPLIB:      File: func{{.*}}.lib(func{{.*}}.dll)
+IMPLIB-NEXT: Format: COFF-ARM64
+IMPLIB-NEXT: Arch: aarch64
+IMPLIB-NEXT: AddressSize: 64bit
+IMPLIB-EMPTY:
+IMPLIB-NEXT: File: func{{.*}}.lib(func{{.*}}.dll)
+IMPLIB-NEXT: Format: COFF-ARM64
+IMPLIB-NEXT: Arch: aarch64
+IMPLIB-NEXT: AddressSize: 64bit
+IMPLIB-EMPTY:
+IMPLIB-NEXT: File: func{{.*}}.lib(func{{.*}}.dll)
+IMPLIB-NEXT: Format: COFF-ARM64
+IMPLIB-NEXT: Arch: aarch64
+IMPLIB-NEXT: AddressSize: 64bit
+IMPLIB-EMPTY:
+IMPLIB-NEXT: File: func{{.*}}.dll
+IMPLIB-NEXT: Format: COFF-import-file-ARM64EC
+IMPLIB-NEXT: Type: code
+IMPLIB-NEXT: Name type: export as
+IMPLIB-NEXT: Export name: func
+IMPLIB-NEXT: Symbol: __imp_func
+IMPLIB-NEXT: Symbol: func
+IMPLIB-NEXT: Symbol: __imp_aux_func
+IMPLIB-NEXT: Symbol: #func
+
+
+Check data export name is not demangled.
+
+RUN: lld-link -out:data.dll data-demangled.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec -export:data_sym,DATA
+RUN: llvm-readobj --coff-exports data.dll | FileCheck --check-prefix=DATA %s
+RUN: llvm-readobj data.lib | FileCheck --check-prefix=DATA-IMPLIB %s
+
+DATA: Name: data_sym
+
+DATA-IMPLIB:      Format: COFF-import-file-ARM64EC
+DATA-IMPLIB-NEXT: Type: data
+DATA-IMPLIB-NEXT: Name type: name
+DATA-IMPLIB-NEXT: Export name: data_sym
+DATA-IMPLIB-NEXT: Symbol: __imp_data_sym
+
+RUN: lld-link -out:data2.dll data-mangled.obj loadconfig-arm64ec.obj -dll -noentry -machine:arm64ec "-export:#data_sym,DATA"
+RUN: llvm-readobj --coff-exports data2.dll | FileCheck --check-prefix=DATA2 %s
+RUN: llvm-readobj data2.lib | FileCheck --check-prefix=DATA2-IMPLIB %s
+
+DATA2: Name: #data_sym
+
+DATA2-IMPLIB:      Format: COFF-import-file-ARM64EC
+DATA2-IMPLIB-NEXT: Type: data
+DATA2-IMPLIB-NEXT: Name type: name
+DATA2-IMPLIB-NEXT: Export name: #data_sym
+DATA2-IMPLIB-NEXT: Symbol: __imp_data_sym
+
+#--- func.s
+    .weak_anti_dep func
+    func = "#func"
+
+    .text
+    .globl "#func"
+    .p2align 2, 0x0
+"#func":
+    mov w0, #2
+    ret
+
+#--- data-mangled.s
+    .data
+    .globl "#data_sym"
+    .p2align 2, 0x0
+"#data_sym":
+    .word 0x01010101
+
+#--- data-demangled.s
+    .data
+    .globl data_sym
+    .p2align 2, 0x0
+data_sym:
+    .word 0x01010101
+
+#--- drectve1.s
+    .section .drectve, "yn"
+    .ascii " -export:func"
+
+#--- drectve2.s
+    .section .drectve, "yn"
+    .ascii " -export:#func"
+
+#--- drectve3.s
+    .section .drectve, "yn"
+    .ascii " -export:#func,EXPORTAS,func"

Collaborator

@efriedma-quic efriedma-quic left a comment

Are you sure MSVC actually demangles the name, as opposed to getting the demangled name from somewhere else?

@cjacek
Contributor Author

cjacek commented May 20, 2024

Are you sure MSVC actually demangles the name, as opposed to getting the demangled name from somewhere else?

From my experiments with MSVC, I concluded that it does demangle the export name. Since none of this is documented, I ran numerous experiments, testing various inputs in both typical situations and edge cases, to infer explanations for the observed behavior.

Below are some tests relevant to this PR. I used inputs that are simple enough to rule out dependency on other factors:

$ cat unmangled-func.s
        .text
        .globl func
        .p2align 2
func:
        mov w0, #2
        ret
$ llvm-mc -filetype=obj -triple=arm64ec-windows unmangled-func.s -o unmangled-func.o
$ cat mangled-func.s
        .text
        .globl  "#func"
        .p2align 2
"#func":
        mov x0, #1
        ret
$ llvm-mc -filetype=obj -triple=arm64ec-windows mangled-func.s -o mangled-func.o
$ cat x64-func.s
        .text
        .globl func
        .p2align 2
func:
        movq $3, %rax
        retq
$ llvm-mc -filetype=obj -triple=x86_64-windows x64-func.s -o x64-func.o
$ cat unmangled-rva.s
        .section ".test","dr"
        .rva func
$ llvm-mc -filetype=obj -triple=arm64ec-windows unmangled-rva.s -o unmangled-rva.o
$ llvm-mc -filetype=obj -triple=x86_64-windows unmangled-rva.s -o unmangled-rva-x64.o
$ cat mangled-rva.s
        .section ".test","dr"
        .rva "#func"
$ llvm-mc -filetype=obj -triple=arm64ec-windows mangled-rva.s -o mangled-rva.o

The basic test:

$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o "-export:#func"

creates a DLL with an unmangled export name pointing to the mangled symbol. This suggests that demangling of the export name is unrelated to weak anti-dependency aliases or similar mechanisms, since it works without them. Another question is whether we should demangle the symbol as well, or just the export name. A similar test that defines only an unmangled symbol fails (using x64-func.o gives the same result):

$ link -nologo -dll -noentry -machine:arm64ec unmangled-func.o "-export:#func"

With an error:

LINK : error LNK2001: unresolved external symbol #func (EC Symbol)

This indicates that the linker looks for the exact symbol name, not its demangled form. Testing further, using the unmangled export and unmangled symbol works fine:

$ link -nologo -dll -noentry -machine:arm64ec unmangled-func.o -export:func

The trickier case is using an unmangled export name with a mangled symbol, which works (unlike the other way around):

$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o -export:func

This raises the question of how the linker knows about the mangled symbol in this case. The next experiment provides both mangled and unmangled symbol definitions to see what the linker does:

$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-func.o -export:func

This command results in an error (using x64-func.o instead of unmangled-func.o produces the same result):

LINK : fatal error LNK1413: ARM64EC symbol '昣湵c' is defined but has no ientry thunk and x64 symbol '畦据' is also defined but doesn't have an exit thunk. There must be either an ientry thunk or an exit thunk for one of these symbols.

(The symbol names in the error message are broken, likely due to a UTF-8/UTF-16 mismatch in link.exe.)

This shows that the linker has somewhat more complicated EC mangling awareness. I tried adding entry and exit thunks, but I couldn't find a way to make the linker accept this. This is crucial for other aspects of my work, but in the context of this PR, it shows that the linker understands the relationship between mangled and demangled names for exported symbols. This mangling handling seems specific to exports; if I skip the export directive:

$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-func.o

it builds fine. These symbols can also be resolved from object files:

$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-func.o mangled-rva.o unmangled-rva.o

This matches my other experiments for different features. From my observations, the linker has mangling awareness in specific situations, but it's not something that unconditionally applies to all symbols. Other examples of special handling include:

  • Special-casing unmangled->mangled weak anti-dependency symbols
  • Allowing references to unmangled names from static libraries that have only mangled variants in their ECSYMBOLS section
  • Entry point symbols
  • Allowing x64 code to reference symbols defined only in the mangled form

In my WIP tree (https://github.com/cjacek/llvm-project/commits/arm64ec), I implemented these features using a mechanism called "EC aliases," where I create paired symbols with different semantics (allowing any other definition to override the alias symbol; overriding one symbol unmarks the paired symbol as no longer being an EC alias). This code is not yet fully compatible or clean, and I plan to refine it further, conduct more testing, and likely rewrite it before submitting those parts for review. Currently, it's sufficient to get things working, including linking against MSVC default libs. I also updated it to cover all the experiments described here.

Returning to the context of exports, there are a few more interesting tests. If I try to reference an unmangled symbol when only the mangled version is available, it fails (while this worked using the -export directive in the example above):

$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-rva.o

results in:

unmangled-rva.o : error LNK2001: unresolved external symbol func (EC Symbol)

However, if I add the -export directive, not only does the export work, but the unresolved symbol is resolved too:

$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-rva.o -export:func
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-rva.o "-export:#func"
$ link -nologo -dll -noentry -machine:arm64ec unmangled-func.o mangled-rva.o -export:func

This behavior can be explained by the creation of "EC aliases" for exported symbols. One variant that still doesn't work is:

$ link -nologo -dll -noentry -machine:arm64ec unmangled-func.o mangled-rva.o "-export:#func"

Since the unmangled symbol is defined, the "EC alias" is not created and referencing its mangled form still fails.

Another similar corner case: if an unmangled symbol is referenced from x64 code, it may reference the mangled symbol even without an explicit alias (e.g., no -export directive, no explicit weak alias). For example, the following command links fine with MSVC:

$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-rva-x64.o

This PR touches only on export names, not "EC aliases" or similar mechanisms; I mentioned them for better context. The changed part of the code doesn't require additional modifications in my prototype, which otherwise matches the behavior in all the experiments mentioned here (except it doesn't issue an error when both mangled and unmangled symbols are defined).

Collaborator

@efriedma-quic efriedma-quic left a comment

LGTM

So it does do demangling, but only in specific places, as necessary to support specific features. That seems to match my experience with weird linker errors...

likely due to a UTF-8/UTF-16 mismatch in link.exe

If you reinterpret UTF-16 昣湵c as UTF-8, it is indeed #func.
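
The reinterpretation can be checked mechanically. The sketch below assumes little-endian UTF-16 and assumes that the garbled characters in the error message are U+6623, U+6E75 (followed by 'c') and U+7566, U+636E; `utf16leByte` and `reinterpretAsNarrow` are hypothetical helper names, not part of any library.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Byte i of the UTF-16LE serialization of `units` (low byte first).
constexpr char utf16leByte(std::u16string_view units, std::size_t i) {
  const char16_t unit = units[i / 2];
  return static_cast<char>(i % 2 == 0 ? (unit & 0xFF) : (unit >> 8));
}

// All bytes of the UTF-16LE serialization, read back as a narrow string.
// NUL bytes (such as the high byte of an ASCII code unit) are dropped.
inline std::string reinterpretAsNarrow(std::u16string_view units) {
  std::string out;
  for (std::size_t i = 0; i < units.size() * 2; ++i)
    if (char b = utf16leByte(units, i); b != '\0')
      out.push_back(b);
  return out;
}
```

With those assumed code points, U+6623 serializes to the bytes 0x23 0x66 ("#f") and U+6E75 to 0x75 0x6E ("un"), so `reinterpretAsNarrow(u"\u6623\u6E75" u"c")` gives `#func`. The second garbled name, U+7566 U+636E, decodes the same way to `func`.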

@cjacek cjacek merged commit 5693678 into llvm:main May 21, 2024
4 checks passed
@cjacek cjacek deleted the lld-mangled-export branch May 21, 2024 11:33