
Reapply " [XRay] Add support for instrumentation of DSOs on x86_64 (#90959)" #112930

Closed
wants to merge 391 commits into from


sebastiankreutzer
Contributor

This fixes remaining issues in my previous PR #90959.

Changes:

  • Removed dependency on LLVM header in xray_interface.cpp
  • Fixed XRay patching on some targets, which was broken by missing changes in the architecture-specific patching functions
  • Addressed some remaining compiler warnings that I missed in the previous patch
  • Formatting

I have tested these changes on x86_64 (natively), as well as ppc64le, aarch64 and arm32 (cross-compiled and emulated using qemu).

Original description:

This PR introduces shared library (DSO) support for XRay based on a revised version of the implementation outlined in this RFC.
The feature enables the patching and handling of events from DSOs, supporting both libraries linked at startup and libraries loaded explicitly, e.g. via dlopen.
This patch adds the following:

  • The -fxray-shared flag to enable the feature (turned off by default)
  • A small runtime library that is linked into every instrumented DSO, providing position-independent trampolines and code to register with the main XRay runtime
  • Changes to the XRay runtime to support management and patching of multiple objects
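To make the third bullet concrete, here is a minimal sketch of per-object bookkeeping. It assumes a registry in which the main executable registers first and receives object ID 0, while each instrumented DSO receives the next free ID when it registers (at startup or after dlopen). `ObjectRegistry`, `ObjectEntry` and their methods are hypothetical names for illustration only. The actual compiler-rt runtime also tracks sleds, trampolines and locking, none of which is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of per-object state in the runtime.
struct ObjectEntry {
  std::size_t NumFuncs; // number of instrumented functions in this object
  bool Patched;         // whether this object's sleds are currently patched
};

class ObjectRegistry {
public:
  // The main executable is assumed to register first (ID 0); every DSO that
  // registers afterwards gets the next free ID.
  int32_t registerObject(std::size_t NumFuncs) {
    Objects.push_back({NumFuncs, false});
    return static_cast<int32_t>(Objects.size() - 1);
  }

  std::size_t numObjects() const { return Objects.size(); }

  // Patches (Enable = true) or unpatches a single object.
  // Returns false for an unknown object ID.
  bool setPatched(int32_t ObjId, bool Enable) {
    if (ObjId < 0 || static_cast<std::size_t>(ObjId) >= Objects.size())
      return false;
    Objects[static_cast<std::size_t>(ObjId)].Patched = Enable;
    return true;
  }

  // Applies the same state to all currently registered objects.
  void setAllPatched(bool Enable) {
    for (std::size_t I = 0; I < Objects.size(); ++I)
      setPatched(static_cast<int32_t>(I), Enable);
  }

  bool isPatched(int32_t ObjId) const {
    assert(ObjId >= 0 && static_cast<std::size_t>(ObjId) < Objects.size());
    return Objects[static_cast<std::size_t>(ObjId)].Patched;
  }

private:
  std::vector<ObjectEntry> Objects;
};
```

For example, registering the executable and one DSO, patching everything, and then unpatching only object 1 leaves the executable's instrumentation active. That is roughly the effect of calling `__xray_patch()` followed by `__xray_unpatch_object(1)` in the new interface.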

These changes are fully backward compatible, i.e. running without instrumented DSOs will produce identical traces (in terms of recorded function IDs) to the previous implementation.
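One way to keep the main executable's function IDs unchanged is to pack the object ID into the upper bits of the 32-bit ID, so that IDs belonging to object 0 are numerically identical to plain function IDs. The sketch below assumes an 8-bit object / 24-bit function split. The split and the names `packId`, `unpackFunctionId` and `unpackObjectId` are hypothetical and only loosely mirror `__xray_pack_id` and its counterparts in the new interface.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit split (hypothetical): upper 8 bits hold the object ID, lower
// 24 bits hold the function ID. The runtime defines the real widths.
constexpr int kObjBits = 8;
constexpr int kFnBits = 32 - kObjBits;
constexpr uint32_t kFnMask = (uint32_t{1} << kFnBits) - 1;
constexpr uint32_t kObjMask = (uint32_t{1} << kObjBits) - 1;

// Packs both IDs; high bits that do not fit in their field are truncated.
int32_t packId(int32_t FuncId, int32_t ObjId) {
  uint32_t Packed = ((static_cast<uint32_t>(ObjId) & kObjMask) << kFnBits) |
                    (static_cast<uint32_t>(FuncId) & kFnMask);
  return static_cast<int32_t>(Packed);
}

int32_t unpackFunctionId(int32_t PackedId) {
  return static_cast<int32_t>(static_cast<uint32_t>(PackedId) & kFnMask);
}

int32_t unpackObjectId(int32_t PackedId) {
  return static_cast<int32_t>((static_cast<uint32_t>(PackedId) >> kFnBits) &
                              kObjMask);
}
```

With this layout, `packId(42, 0)` is simply `42`, so traces that contain only the executable's functions record the same IDs as before, while `packId(42, 3)` round-trips to function 42 in object 3.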

Due to my limited ability to test on other architectures, this feature is only implemented and tested on x86_64. Extending support to other architectures is fairly straightforward: it only requires a position-independent version of the architecture-specific trampolines (see compiler-rt/lib/xray/xray_trampoline_x86_64.S for reference).

This patch does not include any functionality to resolve function IDs from DSOs in the provided logging/tracing modes. These modes still work and will record calls from DSOs, but symbol resolution for these functions is not available. Getting this to work properly requires recording information about the loaded DSOs and should IMO be discussed in a separate RFC, as there are multiple feasible approaches.

@llvmbot llvmbot added clang Clang issues not falling into any other category compiler-rt clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" xray labels Oct 18, 2024
@llvmbot
Collaborator

llvmbot commented Oct 18, 2024

@llvm/pr-subscribers-xray
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Sebastian Kreutzer (sebastiankreutzer)


Patch is 87.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112930.diff

29 Files Affected:

  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+2)
  • (modified) clang/include/clang/Driver/Options.td (+5)
  • (modified) clang/include/clang/Driver/XRayArgs.h (+2)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+8-4)
  • (modified) clang/lib/Driver/XRayArgs.cpp (+21)
  • (added) clang/test/Driver/XRay/xray-shared.cpp (+17)
  • (modified) compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake (+1)
  • (modified) compiler-rt/cmake/config-ix.cmake (+4)
  • (modified) compiler-rt/include/xray/xray_interface.h (+56-9)
  • (modified) compiler-rt/lib/xray/CMakeLists.txt (+76-10)
  • (modified) compiler-rt/lib/xray/xray_AArch64.cpp (+12-7)
  • (modified) compiler-rt/lib/xray/xray_arm.cpp (+12-7)
  • (added) compiler-rt/lib/xray/xray_dso_init.cpp (+62)
  • (modified) compiler-rt/lib/xray/xray_hexagon.cpp (+12-7)
  • (modified) compiler-rt/lib/xray/xray_init.cpp (+157-26)
  • (modified) compiler-rt/lib/xray/xray_interface.cpp (+216-64)
  • (modified) compiler-rt/lib/xray/xray_interface_internal.h (+62-14)
  • (modified) compiler-rt/lib/xray/xray_loongarch64.cpp (+16-7)
  • (modified) compiler-rt/lib/xray/xray_mips.cpp (+16-7)
  • (modified) compiler-rt/lib/xray/xray_mips64.cpp (+16-7)
  • (modified) compiler-rt/lib/xray/xray_powerpc64.cpp (+23-5)
  • (modified) compiler-rt/lib/xray/xray_trampoline_x86_64.S (+17-7)
  • (modified) compiler-rt/lib/xray/xray_x86_64.cpp (+18-12)
  • (added) compiler-rt/test/xray/TestCases/Posix/basic-mode-dso.cpp (+47)
  • (added) compiler-rt/test/xray/TestCases/Posix/clang-xray-shared.cpp (+14)
  • (added) compiler-rt/test/xray/TestCases/Posix/dlopen.cpp (+107)
  • (added) compiler-rt/test/xray/TestCases/Posix/dso-dep-chains.cpp (+197)
  • (added) compiler-rt/test/xray/TestCases/Posix/patch-premain-dso.cpp (+45)
  • (added) compiler-rt/test/xray/TestCases/Posix/patching-unpatching-dso.cpp (+75)
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index eac831278ee20d..e45370bde74a5d 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -136,6 +136,8 @@ CODEGENOPT(XRayIgnoreLoops , 1, 0)
 ///< Emit the XRay function index section.
 CODEGENOPT(XRayFunctionIndex , 1, 1)
 
+///< Set when -fxray-shared is enabled
+CODEGENOPT(XRayShared , 1, 0)
 
 ///< Set the minimum number of instructions in a function to determine selective
 ///< XRay instrumentation.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 152c43d7908ff8..6748f7045566aa 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2948,6 +2948,11 @@ def fxray_selected_function_group :
   HelpText<"When using -fxray-function-groups, select which group of functions to instrument. Valid range is 0 to fxray-function-groups - 1">,
   MarshallingInfoInt<CodeGenOpts<"XRaySelectedFunctionGroup">, "0">;
 
+defm xray_shared : BoolFOption<"xray-shared",
+  CodeGenOpts<"XRayShared">, DefaultFalse,
+  PosFlag<SetTrue, [], [ClangOption, CC1Option],
+          "Enable shared library instrumentation with XRay">,
+  NegFlag<SetFalse>>;
 
 defm fine_grained_bitfield_accesses : BoolOption<"f", "fine-grained-bitfield-accesses",
   CodeGenOpts<"FineGrainedBitfieldAccesses">, DefaultFalse,
diff --git a/clang/include/clang/Driver/XRayArgs.h b/clang/include/clang/Driver/XRayArgs.h
index bdd3d979547eed..1b5c4a4c42f12a 100644
--- a/clang/include/clang/Driver/XRayArgs.h
+++ b/clang/include/clang/Driver/XRayArgs.h
@@ -27,6 +27,7 @@ class XRayArgs {
   XRayInstrSet InstrumentationBundle;
   llvm::opt::Arg *XRayInstrument = nullptr;
   bool XRayRT = true;
+  bool XRayShared = false;
 
 public:
   /// Parses the XRay arguments from an argument list.
@@ -35,6 +36,7 @@ class XRayArgs {
                llvm::opt::ArgStringList &CmdArgs, types::ID InputType) const;
 
   bool needsXRayRt() const { return XRayInstrument && XRayRT; }
+  bool needsXRayDSORt() const { return XRayInstrument && XRayRT && XRayShared; }
   llvm::ArrayRef<std::string> modeList() const { return Modes; }
   XRayInstrSet instrumentationBundle() const { return InstrumentationBundle; }
 };
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 91605a67a37fc0..1c3c8c816594e5 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1623,10 +1623,14 @@ bool tools::addSanitizerRuntimes(const ToolChain &TC, const ArgList &Args,
 }
 
 bool tools::addXRayRuntime(const ToolChain&TC, const ArgList &Args, ArgStringList &CmdArgs) {
-  if (Args.hasArg(options::OPT_shared))
-    return false;
-
-  if (TC.getXRayArgs().needsXRayRt()) {
+  if (Args.hasArg(options::OPT_shared)) {
+    if (TC.getXRayArgs().needsXRayDSORt()) {
+      CmdArgs.push_back("--whole-archive");
+      CmdArgs.push_back(TC.getCompilerRTArgString(Args, "xray-dso"));
+      CmdArgs.push_back("--no-whole-archive");
+      return true;
+    }
+  } else if (TC.getXRayArgs().needsXRayRt()) {
     CmdArgs.push_back("--whole-archive");
     CmdArgs.push_back(TC.getCompilerRTArgString(Args, "xray"));
     for (const auto &Mode : TC.getXRayArgs().modeList())
diff --git a/clang/lib/Driver/XRayArgs.cpp b/clang/lib/Driver/XRayArgs.cpp
index 8c5134e2501358..d0bb5d4887c184 100644
--- a/clang/lib/Driver/XRayArgs.cpp
+++ b/clang/lib/Driver/XRayArgs.cpp
@@ -63,6 +63,23 @@ XRayArgs::XRayArgs(const ToolChain &TC, const ArgList &Args) {
         << XRayInstrument->getSpelling() << Triple.str();
   }
 
+  if (Args.hasFlag(options::OPT_fxray_shared, options::OPT_fno_xray_shared,
+                   false)) {
+    XRayShared = true;
+
+    // DSO instrumentation is currently limited to x86_64
+    if (Triple.getArch() != llvm::Triple::x86_64) {
+      D.Diag(diag::err_drv_unsupported_opt_for_target)
+          << "-fxray-shared" << Triple.str();
+    }
+
+    unsigned PICLvl = std::get<1>(tools::ParsePICArgs(TC, Args));
+    if (!PICLvl) {
+      D.Diag(diag::err_opt_not_valid_without_opt) << "-fxray-shared"
+                                                  << "-fPIC";
+    }
+  }
+
   // Both XRay and -fpatchable-function-entry use
   // TargetOpcode::PATCHABLE_FUNCTION_ENTER.
   if (Arg *A = Args.getLastArg(options::OPT_fpatchable_function_entry_EQ))
@@ -177,6 +194,10 @@ void XRayArgs::addArgs(const ToolChain &TC, const ArgList &Args,
   Args.addOptOutFlag(CmdArgs, options::OPT_fxray_function_index,
                      options::OPT_fno_xray_function_index);
 
+  if (XRayShared)
+    Args.addOptInFlag(CmdArgs, options::OPT_fxray_shared,
+                      options::OPT_fno_xray_shared);
+
   if (const Arg *A =
           Args.getLastArg(options::OPT_fxray_instruction_threshold_EQ)) {
     int Value;
diff --git a/clang/test/Driver/XRay/xray-shared.cpp b/clang/test/Driver/XRay/xray-shared.cpp
new file mode 100644
index 00000000000000..215854e1fc7cef
--- /dev/null
+++ b/clang/test/Driver/XRay/xray-shared.cpp
@@ -0,0 +1,17 @@
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fPIC -fxray-instrument -fxray-shared -c %s -o /dev/null 2>&1 | FileCheck %s
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fpic -fxray-instrument -fxray-shared -c %s -o /dev/null 2>&1 | FileCheck %s
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fxray-instrument -fxray-shared -c %s -o /dev/null 2>&1 | FileCheck %s
+// RUN: not %clang -### --target=x86_64-unknown-linux-gnu -fno-PIC -fxray-instrument -fxray-shared -c %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERR-PIC
+// RUN: not %clang -### --target=x86_64-unknown-linux-gnu -fno-pic -fxray-instrument -fxray-shared -c %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERR-PIC
+
+// On 64 bit darwin, PIC is always enabled
+// RUN: %clang -### --target=x86_64-apple-darwin -fxray-instrument -fxray-shared -c %s -o /dev/null 2>&1 | FileCheck %s
+
+// Check unsupported targets
+// RUN: not %clang -### --target=aarch64-pc-freebsd -fPIC -fxray-instrument -fxray-shared -c %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERR-TARGET
+// RUN: not %clang -### --target=arm64-apple-macos -fPIC -fxray-instrument -fxray-shared -c %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERR-TARGET
+
+// CHECK: "-cc1" {{.*}}"-fxray-instrument" {{.*}}"-fxray-shared"
+// ERR-TARGET:   error: unsupported option '-fxray-shared' for target
+// ERR-PIC:   error: option '-fxray-shared' cannot be specified without '-fPIC'
+
diff --git a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
index d00d39518104bf..fb4dfa7bd09dfe 100644
--- a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
+++ b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
@@ -104,6 +104,7 @@ else()
 set(ALL_XRAY_SUPPORTED_ARCH ${X86_64} ${ARM32} ${ARM64} ${MIPS32} ${MIPS64}
 		powerpc64le ${HEXAGON} ${LOONGARCH64})
 endif()
+set(ALL_XRAY_DSO_SUPPORTED_ARCH ${X86_64})
 set(ALL_SHADOWCALLSTACK_SUPPORTED_ARCH ${ARM64})
 
 if (UNIX)
diff --git a/compiler-rt/cmake/config-ix.cmake b/compiler-rt/cmake/config-ix.cmake
index a494e0532a50bc..431f544e8ad6a7 100644
--- a/compiler-rt/cmake/config-ix.cmake
+++ b/compiler-rt/cmake/config-ix.cmake
@@ -668,6 +668,9 @@ if(APPLE)
   list_intersect(XRAY_SUPPORTED_ARCH
     ALL_XRAY_SUPPORTED_ARCH
     SANITIZER_COMMON_SUPPORTED_ARCH)
+  list_intersect(XRAY_DSO_SUPPORTED_ARCH
+    ALL_XRAY_DSO_SUPPORTED_ARCH
+    SANITIZER_COMMON_SUPPORTED_ARCH)
   list_intersect(SHADOWCALLSTACK_SUPPORTED_ARCH
     ALL_SHADOWCALLSTACK_SUPPORTED_ARCH
     SANITIZER_COMMON_SUPPORTED_ARCH)
@@ -702,6 +705,7 @@ else()
   filter_available_targets(CFI_SUPPORTED_ARCH ${ALL_CFI_SUPPORTED_ARCH})
   filter_available_targets(SCUDO_STANDALONE_SUPPORTED_ARCH ${ALL_SCUDO_STANDALONE_SUPPORTED_ARCH})
   filter_available_targets(XRAY_SUPPORTED_ARCH ${ALL_XRAY_SUPPORTED_ARCH})
+  filter_available_targets(XRAY_DSO_SUPPORTED_ARCH ${ALL_XRAY_DSO_SUPPORTED_ARCH})
   filter_available_targets(SHADOWCALLSTACK_SUPPORTED_ARCH
     ${ALL_SHADOWCALLSTACK_SUPPORTED_ARCH})
   filter_available_targets(GWP_ASAN_SUPPORTED_ARCH ${ALL_GWP_ASAN_SUPPORTED_ARCH})
diff --git a/compiler-rt/include/xray/xray_interface.h b/compiler-rt/include/xray/xray_interface.h
index 727431c04e4f73..675ea0cbc48c83 100644
--- a/compiler-rt/include/xray/xray_interface.h
+++ b/compiler-rt/include/xray/xray_interface.h
@@ -93,31 +93,78 @@ enum XRayPatchingStatus {
   FAILED = 3,
 };
 
-/// This tells XRay to patch the instrumentation points. See XRayPatchingStatus
-/// for possible result values.
+/// This tells XRay to patch the instrumentation points in all currently loaded
+/// objects. See XRayPatchingStatus for possible result values.
 extern XRayPatchingStatus __xray_patch();
 
+/// This tells XRay to patch the instrumentation points in the given object.
+/// See XRayPatchingStatus for possible result values.
+extern XRayPatchingStatus __xray_patch_object(int32_t ObjId);
+
 /// Reverses the effect of __xray_patch(). See XRayPatchingStatus for possible
 /// result values.
 extern XRayPatchingStatus __xray_unpatch();
 
-/// This patches a specific function id. See XRayPatchingStatus for possible
+/// Reverses the effect of __xray_patch_object. See XRayPatchingStatus for
+/// possible result values.
+extern XRayPatchingStatus __xray_unpatch_object(int32_t ObjId);
+
+/// This unpacks the given (packed) function id and patches
+/// the corresponding function.  See XRayPatchingStatus for possible
 /// result values.
 extern XRayPatchingStatus __xray_patch_function(int32_t FuncId);
 
-/// This unpatches a specific function id. See XRayPatchingStatus for possible
+/// This patches a specific function in the given object. See XRayPatchingStatus
+/// for possible result values.
+extern XRayPatchingStatus __xray_patch_function_in_object(int32_t FuncId,
+                                                          int32_t ObjId);
+
+/// This unpacks the given (packed) function id and unpatches
+/// the corresponding function. See XRayPatchingStatus for possible
 /// result values.
 extern XRayPatchingStatus __xray_unpatch_function(int32_t FuncId);
 
-/// This function returns the address of the function provided a valid function
-/// id. We return 0 if we encounter any error, even if 0 may be a valid function
-/// address.
+/// This unpatches a specific function in the given object.
+/// See XRayPatchingStatus for possible result values.
+extern XRayPatchingStatus __xray_unpatch_function_in_object(int32_t FuncId,
+                                                            int32_t ObjId);
+
+/// This function unpacks the given (packed) function id and returns the address
+/// of the corresponding function. We return 0 if we encounter any error, even
+/// if 0 may be a valid function address.
 extern uintptr_t __xray_function_address(int32_t FuncId);
 
-/// This function returns the maximum valid function id. Returns 0 if we
-/// encounter errors (when there are no instrumented functions, etc.).
+/// This function returns the address of the function in the given object
+/// provided valid function and object ids. We return 0 if we encounter any
+/// error, even if 0 may be a valid function address.
+extern uintptr_t __xray_function_address_in_object(int32_t FuncId,
+                                                   int32_t ObjId);
+
+/// This function returns the maximum valid function id for the main executable
+/// (object id = 0). Returns 0 if we encounter errors (when there are no
+/// instrumented functions, etc.).
 extern size_t __xray_max_function_id();
 
+/// This function returns the maximum valid function id for the given object.
+/// Returns 0 if we encounter errors (when there are no instrumented functions,
+/// etc.).
+extern size_t __xray_max_function_id_in_object(int32_t ObjId);
+
+/// This function returns the number of previously registered objects
+/// (executable + loaded DSOs). Returns 0 if XRay has not been initialized.
+extern size_t __xray_num_objects();
+
+/// Unpacks the function id from the given packed id.
+extern int32_t __xray_unpack_function_id(int32_t PackedId);
+
+/// Unpacks the object id from the given packed id.
+extern int32_t __xray_unpack_object_id(int32_t PackedId);
+
+/// Creates and returns a packed id from the given function and object ids.
+/// If the ids do not fit within the reserved number of bits for each part, the
+/// high bits are truncated.
+extern int32_t __xray_pack_id(int32_t FuncId, int32_t ObjId);
+
 /// Initialize the required XRay data structures. This is useful in cases where
 /// users want to control precisely when the XRay instrumentation data
 /// structures are initialized, for example when the XRay library is built with
diff --git a/compiler-rt/lib/xray/CMakeLists.txt b/compiler-rt/lib/xray/CMakeLists.txt
index cf7b5062aae32d..f38c07420c9abf 100644
--- a/compiler-rt/lib/xray/CMakeLists.txt
+++ b/compiler-rt/lib/xray/CMakeLists.txt
@@ -10,6 +10,10 @@ set(XRAY_SOURCES
   xray_utils.cpp
   )
 
+set(XRAY_DSO_SOURCES
+  xray_dso_init.cpp
+  )
+
 # Implementation files for all XRay modes.
 set(XRAY_FDR_MODE_SOURCES
   xray_fdr_flags.cpp
@@ -33,6 +37,11 @@ set(x86_64_SOURCES
   xray_trampoline_x86_64.S
   )
 
+set(x86_64_DSO_SOURCES
+   xray_trampoline_x86_64.S
+   )
+
+
 set(arm_SOURCES
   xray_arm.cpp
   xray_trampoline_arm.S
@@ -128,10 +137,12 @@ set(XRAY_IMPL_HEADERS
 # consumption by tests.
 set(XRAY_ALL_SOURCE_FILES
   ${XRAY_SOURCES}
+  ${XRAY_DSO_SOURCES}
   ${XRAY_FDR_MODE_SOURCES}
   ${XRAY_BASIC_MODE_SOURCES}
   ${XRAY_PROFILING_MODE_SOURCES}
   ${x86_64_SOURCES}
+  ${x86_64_DSO_SOURCES}
   ${arm_SOURCES}
   ${armhf_SOURCES}
   ${hexagon_SOURCES}
@@ -162,6 +173,9 @@ set(XRAY_CFLAGS
   ${COMPILER_RT_CXX_CFLAGS})
 set(XRAY_COMMON_DEFINITIONS SANITIZER_COMMON_NO_REDEFINE_BUILTINS XRAY_HAS_EXCEPTIONS=1)
 
+# DSO trampolines need to be compiled with GOT addressing
+set(XRAY_COMMON_DEFINITIONS_DSO ${XRAY_COMMON_DEFINITIONS} XRAY_PIC)
+
 # Too many existing bugs, needs cleanup.
 append_list_if(COMPILER_RT_HAS_WNO_FORMAT -Wno-format XRAY_CFLAGS)
 
@@ -201,7 +215,16 @@ if (APPLE)
     CFLAGS ${XRAY_CFLAGS}
     DEFS ${XRAY_COMMON_DEFINITIONS}
     DEPS ${XRAY_DEPS})
+  add_compiler_rt_object_libraries(RTXrayDSO
+    OS ${XRAY_SUPPORTED_OS}
+    ARCHS ${XRAY_DSO_SUPPORTED_ARCH}
+    SOURCES ${XRAY_DSO_SOURCES}
+    ADDITIONAL_HEADERS ${XRAY_IMPL_HEADERS}
+    CFLAGS ${XRAY_CFLAGS}
+    DEFS ${XRAY_COMMON_DEFINITIONS_DSO}
+    DEPS ${XRAY_DEPS})
   set(XRAY_RTXRAY_ARCH_LIBS "")
+  set(XRAY_DSO_RTXRAY_ARCH_LIBS "")
   foreach(arch ${XRAY_SUPPORTED_ARCH})
     if(NOT ${arch} IN_LIST XRAY_SOURCE_ARCHS)
       continue()
@@ -215,6 +238,17 @@ if (APPLE)
       DEFS ${XRAY_COMMON_DEFINITIONS}
       DEPS ${XRAY_DEPS})
     list(APPEND XRAY_RTXRAY_ARCH_LIBS RTXray_${arch})
+    if (${arch} IN_LIST XRAY_DSO_SUPPORTED_ARCH)
+      add_compiler_rt_object_libraries(RTXrayDSO_${arch}
+        OS ${XRAY_SUPPORTED_OS}
+        ARCHS ${XRAY_DSO_SUPPORTED_ARCH}
+        SOURCES ${${arch}_DSO_SOURCES}
+        ADDITIONAL_HEADERS ${XRAY_IMPL_HEADERS}
+        CFLAGS ${XRAY_CFLAGS}
+        DEFS ${XRAY_COMMON_DEFINITIONS_DSO}
+        DEPS ${XRAY_DEPS})
+      list(APPEND XRAY_DSO_RTXRAY_ARCH_LIBS RTXrayDSO_${arch})
+    endif()
   endforeach()
   add_compiler_rt_object_libraries(RTXrayFDR
     OS ${XRAY_SUPPORTED_OS}
@@ -252,6 +286,17 @@ if (APPLE)
     LINK_FLAGS ${XRAY_LINK_FLAGS} ${WEAK_SYMBOL_LINK_FLAGS}
     LINK_LIBS ${XRAY_LINK_LIBS}
     PARENT_TARGET xray)
+  add_compiler_rt_runtime(clang_rt.xray-dso
+    STATIC
+    OS ${XRAY_SUPPORTED_OS}
+    ARCHS ${XRAY_DSO_SUPPORTED_ARCH}
+    OBJECT_LIBS RTXrayDSO ${XRAY_DSO_RTXRAY_ARCH_LIBS}
+    CFLAGS ${XRAY_CFLAGS}
+    DEFS ${XRAY_COMMON_DEFINITIONS}
+    LINK_FLAGS ${XRAY_LINK_FLAGS} ${WEAK_SYMBOL_LINK_FLAGS}
+    LINK_LIBS ${XRAY_LINK_LIBS}
+    PARENT_TARGET xray)
+
   add_compiler_rt_runtime(clang_rt.xray-fdr
     STATIC
     OS ${XRAY_SUPPORTED_OS}
@@ -346,16 +391,37 @@ else() # not Apple
       DEFS ${XRAY_COMMON_DEFINITIONS}
       OBJECT_LIBS RTXrayBASIC
       PARENT_TARGET xray)
-   # Profiler Mode runtime
-   add_compiler_rt_runtime(clang_rt.xray-profiling
-     STATIC
-     ARCHS ${arch}
-     CFLAGS ${XRAY_CFLAGS}
-     LINK_FLAGS ${XRAY_LINK_FLAGS}
-     LINK_LIBS ${XRAY_LINK_LIBS}
-     DEFS ${XRAY_COMMON_DEFINITIONS}
-     OBJECT_LIBS RTXrayPROFILING
-     PARENT_TARGET xray)
+    # Profiler Mode runtime
+    add_compiler_rt_runtime(clang_rt.xray-profiling
+      STATIC
+      ARCHS ${arch}
+      CFLAGS ${XRAY_CFLAGS}
+      LINK_FLAGS ${XRAY_LINK_FLAGS}
+      LINK_LIBS ${XRAY_LINK_LIBS}
+      DEFS ${XRAY_COMMON_DEFINITIONS}
+      OBJECT_LIBS RTXrayPROFILING
+      PARENT_TARGET xray)
+
+    if (${arch} IN_LIST XRAY_DSO_SUPPORTED_ARCH)
+      # TODO: Only implemented for X86 at the moment
+      add_compiler_rt_object_libraries(RTXrayDSO
+        ARCHS ${arch}
+        SOURCES ${XRAY_DSO_SOURCES} ${${arch}_DSO_SOURCES} 
+        ADDITIONAL_HEADERS ${XRAY_IMPL_HEADERS}
+        CFLAGS ${XRAY_CFLAGS}
+        DEFS ${XRAY_COMMON_DEFINITIONS_DSO}
+        DEPS ${XRAY_DEPS})
+      # DSO runtime archive
+      add_compiler_rt_runtime(clang_rt.xray-dso
+        STATIC
+        ARCHS ${arch}
+        CFLAGS ${XRAY_CFLAGS}
+        LINK_FLAGS ${XRAY_LINK_FLAGS}
+        LINK_LIBS ${XRAY_LINK_LIBS}
+        DEFS ${XRAY_COMMON_DEFINITIONS}
+        OBJECT_LIBS RTXrayDSO
+        PARENT_TARGET xray)
+    endif()
   endforeach()
 endif() # not Apple
 
diff --git a/compiler-rt/lib/xray/xray_AArch64.cpp b/compiler-rt/lib/xray/xray_AArch64.cpp
index c1d77758946edf..2b151162ce4b9b 100644
--- a/compiler-rt/lib/xray/xray_AArch64.cpp
+++ b/compiler-rt/lib/xray/xray_AArch64.cpp
@@ -89,18 +89,23 @@ inline static bool patchSled(const bool Enable, const uint32_t FuncId,
 
 bool patchFunctionEntry(const bool Enable, const uint32_t FuncId,
                         const XRaySledEntry &Sled,
-                        void (*Trampoline)()) XRAY_NEVER_INSTRUMENT {
+                        const XRayTrampolines &Trampolines,
+                        bool LogArgs) XRAY_NEVER_INSTRUMENT {
+  auto Trampoline =
+      LogArgs ? Trampolines.LogArgsTrampoline : Trampolines.EntryTrampoline;
   return patchSled(Enable, FuncId, Sled, Trampoline);
 }
 
-bool patchFunctionExit(const bool Enable, const uint32_t FuncId,
-                       const XRaySledEntry &Sled) XRAY_NEVER_INSTRUMENT {
-  return patchSled(Enable, FuncId, Sled, __xray_FunctionExit);
+bool patchFunctionExit(
+    const bool Enable, const uint32_t FuncId, const XRaySledEntry &Sled,
+    const XRayTrampolines &Trampolines) XRAY_NEVER_INSTRUMENT {
+  return patchSled(Enable, FuncId, Sled, Trampolines.ExitTrampoline);
 }
 
-bool patchFunctionTailExit(const bool Enable, const uint32_t FuncId,
-                           const XRaySledEntry &Sled) XRAY_NEVER_INSTRUMENT {
-  return patchSled(Enable, FuncId, Sled, __xray_FunctionTailExit);
+bool patchFunctionTailExit(
+    const bool Enable, const uint32_t FuncId, const XRaySledEntry &Sled,
+    const XRayTrampolines &Trampolines) XRAY_NEVER_INSTRUMENT {
+  return patchSled(Enable, FuncId, Sled, Trampolines.TailExitTrampoline);
 }
 
 // AArch64AsmPrinter::LowerPATCHABLE_EVENT_CALL generates this code sequence:
diff --git a/compiler-rt/lib/xray/xray_arm.cpp b/compiler-rt/lib/xray/xray_arm.cpp
index e1818555906c35..e318bb7070e802 100644
--- a/compiler-rt/lib/xray/xray_arm.cpp
+++ b/compiler-rt/lib/xray/xray_arm.cpp
@@ -128,18 +128,23 @@ inline static bool patchSled(const bool Enable, const uint32_t FuncId,
 
 bool patchFunctionEntry(const bool Enable, const uint32_t FuncId,
                         const XRaySledEntry &Sled,
-                        void (*Trampoline)()) XRAY_NEVER_INSTRUMENT {
+                        const XRayTrampolines &Trampolines,
+                        bool LogArgs) XRAY_NEVER_INSTRUMENT {
+  auto Trampoline =
+      LogArgs ? Trampolines.LogArgsTrampoline : Trampolines.EntryTrampoline;
   return patchSled(Enable, FuncId, Sled, Trampoline);
 }
 
-bool patchFunctionExit(const bool Enable, const uint32_t FuncId,
-                       const XRaySledEntry &Sled) XRAY_NEVER_INSTRUMENT {
-  return patchSled(Enable, FuncId, Sled, __xray_FunctionExit);
+bool patchFunctionExit(
+    const bool Enable, const uint32_t FuncId, const XRaySledEntry &Sled,
+    c...
[truncated]
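The architecture-specific hunks above replace direct references to global trampolines such as `__xray_FunctionExit` with a per-object `XRayTrampolines` argument, so that sleds can be pointed at the position-independent trampolines that an instrumented DSO carries via its small runtime library. A minimal sketch of the selection logic follows. The field names match the diff, but the struct layout, the `SledKind` enum and `selectTrampoline` are assumptions for illustration, not the runtime's code.

```cpp
#include <cassert>

// Stand-in for the per-object trampoline set; field names follow the diff,
// the layout is assumed.
using Trampoline = void (*)();

struct Trampolines {
  Trampoline EntryTrampoline;
  Trampoline ExitTrampoline;
  Trampoline TailExitTrampoline;
  Trampoline LogArgsTrampoline;
};

enum class SledKind { Entry, Exit, TailExit };

// Chooses the trampoline a sled should call, always from the given object's
// own set, mirroring the LogArgs choice made in patchFunctionEntry.
Trampoline selectTrampoline(const Trampolines &T, SledKind Kind, bool LogArgs) {
  switch (Kind) {
  case SledKind::Entry:
    return LogArgs ? T.LogArgsTrampoline : T.EntryTrampoline;
  case SledKind::Exit:
    return T.ExitTrampoline;
  case SledKind::TailExit:
    return T.TailExitTrampoline;
  }
  assert(false && "unknown sled kind");
  return nullptr;
}
```

Passing the whole set, rather than a single function pointer, lets one patching routine serve both the executable and every DSO without hard-coding which object's trampolines it uses.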

@sebastiankreutzer
Contributor Author

@jplehr


github-actions bot commented Oct 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

nikic and others added 21 commits October 21, 2024 12:47
GEP offsets have sext_or_trunc semantics. We were already doing
this for the outer-most GEP, but not for the inner ones.

I believe one of the sanitizer buildbot failures was due to this,
but I did not manage to reproduce the issue or come up with a
test case. Usually the problematic case will already be folded
away due to index type canonicalization.
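For reference, sext_or_trunc converts an integer to a target width by sign-extending it when the target is wider and truncating it when the target is narrower. The standalone model below, using the hypothetical helper `sextOrTrunc`, returns the resulting signed value for widths up to 64 bits. It illustrates the semantics only; it is not LLVM's implementation, which operates on `APInt`.

```cpp
#include <cassert>
#include <cstdint>

// Models sext_or_trunc: Value holds a SrcBits-wide two's-complement integer
// (in its low bits); the result is the signed value of that integer after
// conversion to DstBits. Widths must be in 1..64.
int64_t sextOrTrunc(uint64_t Value, unsigned SrcBits, unsigned DstBits) {
  assert(SrcBits >= 1 && SrcBits <= 64 && DstBits >= 1 && DstBits <= 64);
  // Truncation keeps the low DstBits bits; sign extension keeps all SrcBits
  // bits. Either way, the bits that matter are the low min(Src, Dst) bits.
  unsigned Bits = SrcBits < DstBits ? SrcBits : DstBits;
  uint64_t Mask = Bits == 64 ? ~uint64_t{0} : ((uint64_t{1} << Bits) - 1);
  uint64_t Low = Value & Mask;
  // Sign-extend from bit (Bits - 1) with the xor/subtract trick.
  uint64_t SignBit = uint64_t{1} << (Bits - 1);
  return static_cast<int64_t>((Low ^ SignBit) - SignBit);
}
```

For instance, an `i8` index of `0xFF` becomes `-1` when extended to a 64-bit index type, while a 64-bit index of `0x100000001` becomes `1` when truncated to 32 bits.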
bazelbuild: fix for
llvm@2ce10f0.

No functional changes intended.
Fix for
llvm@d80b9cf.

No functional changes intended.
…en issue

Since llvm#109628 landed, this test
has been failing on 32-bit Arm.

This is due to a codegen problem (it is not known whether the change
introduced it or only uncovered it) where the trap instruction is placed
after the frame pointer and link register are restored.

llvm#113154

So the code was:
```
std::__1::vector<int>::operator[](unsigned int):
  sub sp, sp, #8
  str r0, [sp, #4]
  str r1, [sp]
  add sp, sp, #8
  .inst 0xe7ffdefe
  bx lr
```
When lldb saw the trap, the PC was inside operator[] but the frame
information actually pointed to g.

This bug only happens for leaf functions so adding a return type
works around it:
```
std::__1::vector<int>::operator[](unsigned int):
  push {r11, lr}
  mov r11, sp
  sub sp, sp, #8
  str r0, [sp, #4]
  str r1, [sp]
  mov sp, r11
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr
```
(and operator[] should return T& anyway)

Now the PC location and frame information should match and the
test passes.
…xt parameter to getBufferForFile (llvm#111723)

This patch adds an IsText parameter to getBufferForFile and
getBufferForFileImpl. We introduce a new virtual function
openFileForReadBinary which defaults to openFileForRead except in
RealFileSystem, which uses the OF_None flag instead of OF_Text.

The default is set to OF_Text instead of OF_None; this change in value
does not affect any platform other than z/OS. Setting this parameter
correctly is required to open files on z/OS in the correct encoding.
The IsText parameter is chosen based on the context in which a file is
opened; for example, in the ASTReader, HeaderMap requires that files
always be opened in binary even though they might be tagged as text.
…) for reductions"

This reverts commit 7f2e937 as it causes
regressions in the tests it modifies, and undoes what was added in llvm#100653
(which itself was a fix for a previous regression).
The initial version of this feature would use the output file name
if it could, but in switching to temp files I forgot to replicate that
behaviour.

What happens now is we always use a tempfile name and the output
path is a template for that. I think the current behaviour
still makes sense so I'm just correcting the documentation.
This code is heavily based on the SelectionDAG lowerINSERT_SUBVECTOR
code.
The SPIR-V backend will need to use Reg2Mem, hence this pass needs to be
wrapped to be used with the legacy pass manager.

---------

Signed-off-by: Nathan Gauër <[email protected]>
…ace tablegen patterns

This lowers the aarch64_neon_sqxtn intrinsics to the new TRUNCATE_SSAT_S ISD
nodes, performing the same for sqxtun and uqxtn. This allows us to clean up the
tablegen patterns a little and in a future commit add combines for sqxtn.
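The three intrinsics differ only in the signedness of the ranges they read from and saturate to. The scalar sketch below models a single 32-bit to 16-bit lane of each; the helper names are hypothetical, and the real instructions operate on whole vectors with several element sizes.

```cpp
#include <cassert>
#include <cstdint>

// sqxtn: signed source, signed result, clamped to [INT16_MIN, INT16_MAX].
int16_t sqxtnLane(int32_t V) {
  if (V > INT16_MAX) return INT16_MAX;
  if (V < INT16_MIN) return INT16_MIN;
  return static_cast<int16_t>(V);
}

// sqxtun: signed source, unsigned result, clamped to [0, UINT16_MAX].
uint16_t sqxtunLane(int32_t V) {
  if (V < 0) return 0;
  if (V > UINT16_MAX) return UINT16_MAX;
  return static_cast<uint16_t>(V);
}

// uqxtn: unsigned source, unsigned result, clamped to [0, UINT16_MAX].
uint16_t uqxtnLane(uint32_t V) {
  return V > UINT16_MAX ? UINT16_MAX : static_cast<uint16_t>(V);
}
```

Because each operation is "clamp, then truncate", it maps naturally onto a generic saturating-truncate node rather than a target-specific intrinsic, which is what allows the tablegen patterns to be simplified.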
…llvm#112686)

Currently, the `omp.simd` operation is ignored during MLIR to LLVM IR
translation when it takes part in a composite construct. One consequence
of this limitation is that any entry block arguments defined by that
operation will trigger a compiler crash if they are used anywhere, as
they are not bound to an LLVM IR value.

A previous PR introducing support for the `reduction` clause resulted in
the creation and use of entry block arguments attached to the `omp.simd`
operation, causing compiler crashes on 'do simd reduction(...)'
constructs.

This patch disables Flang lowering of simd reductions in 'do simd'
constructs to avoid triggering these errors while translation to LLVM IR
is still incomplete.
llvm#113119)

StringMap::find takes StringRef.  We don't need to create an instance
of std::string from StringRef only to convert it right back to
StringRef.
This code intentionally discards the high bits, so set
implicitTrunc=true. This is currently NFC but will enable an
APInt assertion in the future.
…vm#111575)

Adds a new mlir-opt test-only pass, -test-spirv-cpu-runner-pipeline,
which runs the set of MLIR passes needed for the mlir-spirv-cpu-runner,
and removes them from the runner. The tests are changed to invoke
mlir-opt with this flag before running the runner. The eventual goal is
to move all host/device code generation steps out of the runner, like
with some of the other runners.
6bac414 added this opcode with the wrong
number of operands. It didn't fail on check-llvm for me or on pre-commit CI,
but once committed we got buildbot failures. This patch fixes the definition
of the instruction and fixes the failing test.
With the truncssat nodes these are relatively simple tablegen patterns to add.
The existing intrinsics are converted to shift+truncsat so they can lower using
the new patterns.

Fixes llvm#112925.
@sebastiankreutzer
Copy link
Contributor Author

For some reason, rebasing on main caused github to automatically request lots of reviews. Not sure why this is happening now.

@sebastiankreutzer
Copy link
Contributor Author

Closing this PR, as something got mixed up during the rebase

Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category compiler-rt xray