Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spill/restore FP/BP around instructions in which they are clobbered #81048

Merged
merged 8 commits into from
Aug 6, 2024

Conversation

weiguozhi
Copy link
Contributor

This patch fixes #17204.

If a base pointer is used in a function, and it is clobbered by an instruction (typically an inline asm), current register allocator can't handle this situation, so BP becomes garbage after those instructions. It can also occur to FP in theory.

We can spill and reload FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots, instead we spill them into the top of stack by using SP register.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 7, 2024

@llvm/pr-subscribers-backend-x86

Author: None (weiguozhi)

Changes

This patch fixes #17204.

If a base pointer is used in a function, and it is clobbered by an instruction (typically an inline asm), current register allocator can't handle this situation, so BP becomes garbage after those instructions. It can also occur to FP in theory.

We can spill and reload FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots, instead we spill them into the top of stack by using SP register.


Patch is 20.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81048.diff

10 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineFrameInfo.h (+1)
  • (modified) llvm/include/llvm/CodeGen/TargetFrameLowering.h (+25)
  • (modified) llvm/include/llvm/CodeGen/TargetRegisterInfo.h (+10)
  • (modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+124)
  • (modified) llvm/lib/Target/X86/X86FrameLowering.cpp (+111)
  • (modified) llvm/lib/Target/X86/X86FrameLowering.h (+14)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.h (+6)
  • (added) llvm/test/CodeGen/X86/clobber_base_ptr.ll (+59)
  • (modified) llvm/test/CodeGen/X86/i386-baseptr.ll (+4)
  • (modified) llvm/test/CodeGen/X86/x86-64-baseptr.ll (+12)
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 7d11d63d4066f..f4578a6ca6a11 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -599,6 +599,7 @@ class MachineFrameInfo {
   /// Return the alignment in bytes that this function must be aligned to,
   /// which is greater than the default stack alignment provided by the target.
   Align getMaxAlign() const { return MaxAlignment; }
+  Align getStackAlignment() const { return StackAlignment; }
 
   /// Make sure the function is at least Align bytes aligned.
   void ensureMaxAlignment(Align Alignment);
diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0b9cacecc7cbe..e8ff1d69b4eb1 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -466,6 +466,31 @@ class TargetFrameLowering {
   /// Return the frame base information to be encoded in the DWARF subprogram
   /// debug info.
   virtual DwarfFrameBase getDwarfFrameBase(const MachineFunction &MF) const;
+
+  /// Issue instructions to allocate stack space and spill FP and/or BP to stack
+  /// using SP register.
+  virtual void spillFPBPUsingSP(MachineFunction &MF,
+                                const MachineBasicBlock::iterator BeforeMI,
+                                bool SpillFP, bool SpillBP) const {
+    report_fatal_error("spillFPBPUsingSP is not implemented for this target!");
+  }
+
+  /// Issue instructions to restore FP and/or BP from stack using SP register,
+  /// and free stack space.
+  virtual void restoreFPBPUsingSP(MachineFunction &MF,
+                                  const MachineBasicBlock::iterator AfterMI,
+                                  bool SpillFP, bool SpillBP) const {
+    report_fatal_error("restoreFPBPUsingSP is not implemented for this "
+                       "target!");
+  }
+
+  // If MI uses fp/bp, but target can handle it, and doesn't want PEI to spill
+  // it, this function should return true, and update MI so PEI will not check
+  // any instructions from it and later.
+  virtual bool skipSpillFPBP(MachineFunction &MF,
+                             MachineBasicBlock::reverse_iterator &MI) const {
+    return false;
+  }
 };
 
 } // End llvm namespace
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 5098fc68df3b2..f6de9c40335e8 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -963,6 +963,16 @@ class TargetRegisterInfo : public MCRegisterInfo {
     return false;
   }
 
+  /// Returns true if the target needs a base register.
+  virtual bool hasBaseRegister(const MachineFunction &MF) const {
+    return false;
+  }
+
+  /// Returns the physical base register used to access stack object.
+  virtual Register getBaseRegister(const MachineFunction &MF) const {
+    return 0;
+  }
+
   /// Return true if target has reserved a spill slot in the stack frame of
   /// the given function for the specified register. e.g. On x86, if the frame
   /// register is required, the first fixed stack object is reserved as its
diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index 8af17e63e25c7..040a3f4f95186 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -77,6 +77,15 @@ using MBBVector = SmallVector<MachineBasicBlock *, 4>;
 STATISTIC(NumLeafFuncWithSpills, "Number of leaf functions with CSRs");
 STATISTIC(NumFuncSeen, "Number of functions seen in PEI");
 
+static cl::opt<bool> SpillClobberedFP(
+    "spill-clobbered-fp",
+    cl::desc("Spill clobbered fp register to stack."),
+    cl::init(false), cl::Hidden);
+
+static cl::opt<bool> SpillClobberedBP(
+    "spill-clobbered-bp",
+    cl::desc("Spill clobbered bp register to stack."),
+    cl::init(true), cl::Hidden);
 
 namespace {
 
@@ -122,6 +131,7 @@ class PEI : public MachineFunctionPass {
   void calculateCallFrameInfo(MachineFunction &MF);
   void calculateSaveRestoreBlocks(MachineFunction &MF);
   void spillCalleeSavedRegs(MachineFunction &MF);
+  void spillFrameBasePointer(MachineFunction &MF);
 
   void calculateFrameObjectOffsets(MachineFunction &MF);
   void replaceFrameIndices(MachineFunction &MF);
@@ -228,6 +238,9 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) {
   FrameIndexVirtualScavenging = TRI->requiresFrameIndexScavenging(MF);
   ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE();
 
+  // Handle FP and BP spilling and restoring, for instructions that need it.
+  spillFrameBasePointer(MF);
+
   // Calculate the MaxCallFrameSize and AdjustsStack variables for the
   // function's frame information. Also eliminates call frame pseudo
   // instructions.
@@ -1587,3 +1600,114 @@ void PEI::replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &MF,
       ++I;
   }
 }
+
+static bool accessFrameBasePointer(const MachineInstr &MI, Register FP,
+                                   Register BP, bool &AccessFP, bool &AccessBP,
+                                   const TargetRegisterInfo *TRI) {
+  AccessFP = AccessBP = false;
+  if (FP) {
+    if (MI.findRegisterUseOperandIdx(FP, false, TRI) != -1 ||
+        MI.findRegisterDefOperandIdx(FP, false, true, TRI) != -1)
+      AccessFP = true;
+  }
+  if (BP) {
+    if (MI.findRegisterUseOperandIdx(BP, false, TRI) != -1 ||
+        MI.findRegisterDefOperandIdx(BP, false, true, TRI) != -1)
+      AccessBP = true;
+  }
+  return AccessFP || AccessBP;
+}
+
+/// If a function uses base pointer and the base pointer is clobbered by inline
+/// asm, RA doesn't detect this case, and after the inline asm, the base pointer
+/// cotains garbage value.
+/// For example if a 32b x86 function uses base pointer esi, and esi is
+/// clobbered by following inline asm
+///     asm("rep movsb" : "+D"(ptr), "+S"(x), "+c"(c)::"memory");
+/// We need to save esi before the asm and restore it after the asm.
+///
+/// The problem can also occur to frame pointer if there is a function call, and
+/// the callee uses a different calling convention and clobbers the fp.
+///
+/// Because normal frame objects (spill slots) are accessed through fp/bp
+/// register, so we can't spill fp/bp to normal spill slots.
+///
+/// FIXME: There are 2 possible enhancements:
+/// 1. In many cases there are different physical registers not clobbered by
+/// inline asm, we can use one of them as base pointer. Or use a virtual
+/// register as base pointer and let RA allocate a physical register to it.
+/// 2. If there is no other instructions access stack with fp/bp from the
+/// inline asm to the epilog, we can skip the save and restore operations.
+void PEI::spillFrameBasePointer(MachineFunction &MF) {
+  Register FP, BP;
+  const TargetFrameLowering &TFI = *MF.getSubtarget().getFrameLowering();
+  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+  if (TFI.hasFP(MF) && SpillClobberedFP)
+    FP = TRI->getFrameRegister(MF);
+  if (TRI->hasBaseRegister(MF) && SpillClobberedBP)
+    BP = TRI->getBaseRegister(MF);
+
+  for (MachineBasicBlock &MBB : MF) {
+    auto MI = MBB.rbegin(), ME = MBB.rend();
+    while (MI != ME) {
+      // Skip frame setup/destroy instructions.
+      // Skip instructions handled by target.
+      if (MI->getFlag(MachineInstr::MIFlag::FrameSetup) ||
+          MI->getFlag(MachineInstr::MIFlag::FrameDestroy) ||
+          TFI.skipSpillFPBP(MF, MI)) {
+        ++MI;
+        continue;
+      }
+
+      bool AccessFP, AccessBP;
+      // Check if fp or bp is used in MI.
+      if (!accessFrameBasePointer(*MI, FP, BP, AccessFP, AccessBP, TRI)) {
+        ++MI;
+        continue;
+      }
+
+      // Look for the range [Start, Stop] in which fp or bp is defined and used.
+      bool FPLive = false, BPLive = false;
+      bool SpillFP = false, SpillBP = false;
+      auto Start = MI, Stop = MI;
+      do {
+        SpillFP |= AccessFP;
+        SpillBP |= AccessBP;
+
+        // Maintain FPLive and BPLive.
+        if (FPLive && MI->findRegisterDefOperandIdx(FP, false, true, TRI) != -1)
+          FPLive = false;
+        if (FP && MI->findRegisterUseOperandIdx(FP, false, TRI) != -1)
+          FPLive = true;
+        if (BPLive && MI->findRegisterDefOperandIdx(BP, false, true, TRI) != -1)
+          BPLive = false;
+        if (BP && MI->findRegisterUseOperandIdx(BP, false, TRI) != -1)
+          BPLive = true;
+
+        Start = MI++;
+      } while ((MI != ME) && (FPLive || BPLive ||
+                accessFrameBasePointer(*MI, FP, BP, AccessFP, AccessBP, TRI)));
+
+      // If the bp is clobbered by a call, we should save and restore outside of
+      // the frame setup instructions.
+      if (Stop->isCall()) {
+        const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
+        auto FrameSetup = std::next(Start);
+        while (FrameSetup != ME && !TII.isFrameSetup(*FrameSetup) &&
+               !FrameSetup->isCall())
+          ++FrameSetup;
+        if (FrameSetup != ME && TII.isFrameSetup(*FrameSetup)) {
+          while (!TII.isFrameInstr(*Stop))
+            --Stop;
+          Start = FrameSetup;
+          MI = Start;
+          ++MI;
+        }
+      }
+
+      // Call target functions to spill and restore FP and BP registers.
+      TFI.spillFPBPUsingSP(MF, &(*Start), SpillFP, SpillBP);
+      TFI.restoreFPBPUsingSP(MF, &(*Stop), SpillFP, SpillBP);
+    }
+  }
+}
diff --git a/llvm/lib/Target/X86/X86FrameLowering.cpp b/llvm/lib/Target/X86/X86FrameLowering.cpp
index be416fb0db069..1fe874623bede 100644
--- a/llvm/lib/Target/X86/X86FrameLowering.cpp
+++ b/llvm/lib/Target/X86/X86FrameLowering.cpp
@@ -4178,3 +4178,114 @@ void X86FrameLowering::restoreWinEHStackPointersInParent(
                                   /*RestoreSP=*/IsSEH);
   }
 }
+
+static int computeSPAdjust4SpillFPBP(MachineFunction &MF,
+                                     const TargetRegisterClass *RC,
+                                     unsigned SpillRegNum) {
+  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
+  unsigned AllocSize = TRI->getSpillSize(*RC) * SpillRegNum;
+  Align StackAlign = MF.getFrameInfo().getStackAlignment();
+  unsigned AlignedSize = alignTo(AllocSize, StackAlign);
+  return AlignedSize - AllocSize;
+}
+
+void X86FrameLowering::spillFPBPUsingSP(
+    MachineFunction &MF, MachineBasicBlock::iterator BeforeMI,
+    bool SpillFP, bool SpillBP) const {
+  const TargetRegisterClass *RC;
+  unsigned RegNum = 0;
+  MachineBasicBlock *MBB = BeforeMI->getParent();
+  DebugLoc DL = BeforeMI->getDebugLoc();
+
+  // Spill FP.
+  if (SpillFP) {
+    Register FP = TRI->getFrameRegister(MF);
+    if (STI.isTarget64BitILP32())
+      FP = Register(getX86SubSuperRegister(FP, 64));
+    RC = TRI->getMinimalPhysRegClass(FP);
+    ++RegNum;
+
+    BuildMI(*MBB, BeforeMI, DL,
+            TII.get(getPUSHOpcode(MF.getSubtarget<X86Subtarget>())))
+        .addReg(FP);
+  }
+
+  // Spill BP.
+  if (SpillBP) {
+    Register BP = TRI->getBaseRegister(MF);
+    if (STI.isTarget64BitILP32())
+      BP = Register(getX86SubSuperRegister(BP, 64));
+    RC = TRI->getMinimalPhysRegClass(BP);
+    ++RegNum;
+
+    BuildMI(*MBB, BeforeMI, DL,
+            TII.get(getPUSHOpcode(MF.getSubtarget<X86Subtarget>())))
+        .addReg(BP);
+  }
+
+  // Make sure SP is aligned.
+  int SPAdjust = computeSPAdjust4SpillFPBP(MF, RC, RegNum);
+  if (SPAdjust)
+    emitSPUpdate(*MBB, BeforeMI, DL, -SPAdjust, false);
+}
+
+void X86FrameLowering::restoreFPBPUsingSP(
+    MachineFunction &MF, MachineBasicBlock::iterator AfterMI,
+    bool SpillFP, bool SpillBP) const {
+  Register FP, BP;
+  const TargetRegisterClass *RC;
+  unsigned RegNum = 0;
+  if (SpillFP) {
+    FP = TRI->getFrameRegister(MF);
+    if (STI.isTarget64BitILP32())
+      FP = Register(getX86SubSuperRegister(FP, 64));
+    RC = TRI->getMinimalPhysRegClass(FP);
+    ++RegNum;
+  }
+  if (SpillBP) {
+    BP = TRI->getBaseRegister(MF);
+    if (STI.isTarget64BitILP32())
+      BP = Register(getX86SubSuperRegister(BP, 64));
+    RC = TRI->getMinimalPhysRegClass(BP);
+    ++RegNum;
+  }
+
+  // Adjust SP so it points to spilled FP or BP.
+  MachineBasicBlock *MBB = AfterMI->getParent();
+  MachineBasicBlock::iterator Pos = std::next(AfterMI);
+  DebugLoc DL = AfterMI->getDebugLoc();
+  int SPAdjust = computeSPAdjust4SpillFPBP(MF, RC, RegNum);
+  if (SPAdjust)
+    emitSPUpdate(*MBB, Pos, DL, SPAdjust, false);
+
+  // Restore BP.
+  if (SpillBP) {
+    BuildMI(*MBB, Pos, DL,
+            TII.get(getPOPOpcode(MF.getSubtarget<X86Subtarget>())), BP);
+  }
+
+  // Restore FP.
+  if (SpillFP) {
+    BuildMI(*MBB, Pos, DL,
+            TII.get(getPOPOpcode(MF.getSubtarget<X86Subtarget>())), FP);
+  }
+}
+
+bool X86FrameLowering::skipSpillFPBP(
+    MachineFunction &MF, MachineBasicBlock::reverse_iterator &MI) const {
+  if (MI->getOpcode() == X86::LCMPXCHG16B_SAVE_RBX) {
+    // The pseudo instruction LCMPXCHG16B_SAVE_RBX is generated in the form
+    //     SaveRbx = COPY RBX
+    //     SaveRbx = LCMPXCHG16B_SAVE_RBX ..., SaveRbx, implicit-def rbx
+    // And later LCMPXCHG16B_SAVE_RBX is expanded to restore RBX from SaveRbx.
+    // We should skip this instruction sequence.
+    int FI;
+    unsigned Reg;
+    while (!(MI->getOpcode() == TargetOpcode::COPY &&
+             MI->getOperand(1).getReg() == X86::RBX) &&
+           !((Reg = TII.isStoreToStackSlot(*MI, FI)) && Reg == X86::RBX))
+      ++MI;
+    return true;
+  }
+  return false;
+}
diff --git a/llvm/lib/Target/X86/X86FrameLowering.h b/llvm/lib/Target/X86/X86FrameLowering.h
index 2dc9ecc6109d7..0d77933363ce4 100644
--- a/llvm/lib/Target/X86/X86FrameLowering.h
+++ b/llvm/lib/Target/X86/X86FrameLowering.h
@@ -103,6 +103,20 @@ class X86FrameLowering : public TargetFrameLowering {
                               MutableArrayRef<CalleeSavedInfo> CSI,
                               const TargetRegisterInfo *TRI) const override;
 
+  void
+  spillFPBPUsingSP(MachineFunction &MF,
+                   const MachineBasicBlock::iterator BeforeMI,
+                   bool SpillFP, bool SpillBP) const override;
+
+  void
+  restoreFPBPUsingSP(MachineFunction &MF,
+                     const MachineBasicBlock::iterator AfterMI,
+                     bool SpillFP, bool SpillBP) const override;
+
+  bool
+  skipSpillFPBP(MachineFunction &MF,
+                MachineBasicBlock::reverse_iterator &MI) const override;
+
   bool hasFP(const MachineFunction &MF) const override;
   bool hasReservedCallFrame(const MachineFunction &MF) const override;
   bool canSimplifyCallFramePseudos(const MachineFunction &MF) const override;
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.h b/llvm/lib/Target/X86/X86RegisterInfo.h
index 7296a5f021e4a..cccb731d24354 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.h
+++ b/llvm/lib/Target/X86/X86RegisterInfo.h
@@ -133,6 +133,9 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
   void adjustStackMapLiveOutMask(uint32_t *Mask) const override;
 
   bool hasBasePointer(const MachineFunction &MF) const;
+  bool hasBaseRegister(const MachineFunction &MF) const override {
+    return hasBasePointer(MF);
+  }
 
   bool canRealignStack(const MachineFunction &MF) const override;
 
@@ -164,6 +167,9 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
   unsigned getPtrSizedStackRegister(const MachineFunction &MF) const;
   Register getStackRegister() const { return StackPtr; }
   Register getBaseRegister() const { return BasePtr; }
+  Register getBaseRegister(const MachineFunction &MF) const override {
+    return BasePtr;
+  }
   /// Returns physical register used as frame pointer.
   /// This will always returns the frame pointer register, contrary to
   /// getFrameRegister() which returns the "base pointer" in situations
diff --git a/llvm/test/CodeGen/X86/clobber_base_ptr.ll b/llvm/test/CodeGen/X86/clobber_base_ptr.ll
new file mode 100644
index 0000000000000..29894b6d38906
--- /dev/null
+++ b/llvm/test/CodeGen/X86/clobber_base_ptr.ll
@@ -0,0 +1,59 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc < %s | FileCheck %s
+
+target datalayout = "e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:32-n8:16:32-a:0:32-S32"
+target triple = "i386-pc-windows-gnu"
+
+; This function uses esi as base pointer, the inline asm clobbers esi, so we
+; should save esi using esp before the inline asm, and restore esi after the
+; inline asm.
+
+define i32 @foo() {
+; CHECK-LABEL: foo:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    pushl %ebp
+; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    .cfi_offset %ebp, -8
+; CHECK-NEXT:    movl %esp, %ebp
+; CHECK-NEXT:    .cfi_def_cfa_register %ebp
+; CHECK-NEXT:    pushl %edi
+; CHECK-NEXT:    pushl %esi
+; CHECK-NEXT:    andl $-16, %esp
+; CHECK-NEXT:    subl $16, %esp
+; CHECK-NEXT:    movl %esp, %esi
+; CHECK-NEXT:    .cfi_offset %esi, -16
+; CHECK-NEXT:    .cfi_offset %edi, -12
+; CHECK-NEXT:    movl $4, 12(%esi)
+; CHECK-NEXT:    movl 12(%esi), %eax
+; CHECK-NEXT:    addl $3, %eax
+; CHECK-NEXT:    andl $-4, %eax
+; CHECK-NEXT:    calll __alloca
+; CHECK-NEXT:    movl %esp, %eax
+; CHECK-NEXT:    andl $-16, %eax
+; CHECK-NEXT:    movl %eax, %esp
+; CHECK-NEXT:    movl $1, (%eax)
+; CHECK-NEXT:    leal 8(%esi), %edi
+; CHECK-NEXT:    movl $4, %ecx
+; CHECK-NEXT:    pushl %esi
+; CHECK-NEXT:    movl %eax, %esi
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    rep movsb (%esi), %es:(%edi)
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    popl %esi
+; CHECK-NEXT:    movl 8(%esi), %eax
+; CHECK-NEXT:    leal -8(%ebp), %esp
+; CHECK-NEXT:    popl %esi
+; CHECK-NEXT:    popl %edi
+; CHECK-NEXT:    popl %ebp
+; CHECK-NEXT:    retl
+entry:
+  %size = alloca i32, align 4
+  %g = alloca i32, align 4
+  store volatile i32 4, ptr %size, align 4
+  %len = load volatile i32, ptr %size, align 4
+  %var_array = alloca i8, i32 %len, align 16
+  store i32 1, ptr %var_array, align 16
+  %nil = call { ptr, ptr, i32 } asm "rep movsb", "={di},={si},={cx},0,1,2,~{memory},~{dirflag},~{fpsr},~{flags}"(ptr %g, ptr %var_array, i32 4)
+  %retval = load i32, ptr %g, align 4
+  ret i32 %retval
+}
diff --git a/llvm/test/CodeGen/X86/i386-baseptr.ll b/llvm/test/CodeGen/X86/i386-baseptr.ll
index 08e4bde7353a4..777eb838b84cc 100644
--- a/llvm/test/CodeGen/X86/i386-baseptr.ll
+++ b/llvm/test/CodeGen/X86/i386-baseptr.ll
@@ -109,10 +109,14 @@ define x86_regcallcc void @clobber_baseptr_argptr(i32 %param1, i32 %param2, i32
 ; CHECK-NEXT:    subl %eax, %edx
 ; CHECK-NEXT:    movl %edx, %esp
 ; CHECK-NEXT:    negl %eax
+; CHECK-NEXT:    pushl %esi
+; CHECK-NEXT:    subl $28, %esp
 ; CHECK-NEXT:    movl $405, %esi # imm = 0x195
 ; CHECK-NEXT:    #APP
 ; CHECK-NEXT:    nop
 ; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    addl $28, %esp
+; CHECK-NEXT:    popl %esi
 ; CHECK-NEXT:    movl $405, %ebx # imm = 0x195
 ; CHECK-NEXT:    #APP
 ; CHECK-NEXT:    nop
diff --git a/llvm/test/CodeGen/X86/x86-64-baseptr.ll b/llvm/test/CodeGen/X86/x86-64-baseptr.ll
index 8cda4ba2814bb..020004def6e7a 100644
--- a/llvm/test/CodeGen/X86/x86-64-baseptr.ll
+++ b/llvm/test/CodeGen/X86/x86-64-baseptr.ll
@@ -136,10 +136,14 @@ define void @clobber_base() #0 {
 ; X32ABI-NEXT:    subl %eax, %edx
 ; X32ABI-NEXT:    negl %eax
 ; X32ABI-NEXT:    movl %edx, %esp
+; X32ABI-NEXT:    pushq %rbx
+; X32ABI-NEXT:    subl $24, %esp
 ; X32ABI-NEXT:    movl $405, %ebx # imm = 0x195
 ; X32ABI-NEXT:    #APP
 ; X32ABI-NEXT:    nop
 ; X32ABI-NEXT:    #NO_APP
+; X32ABI-NEXT:    addl $24, %esp
+; X32ABI-NEXT:    popq %rbx
 ; X32ABI-NEXT:    movl $8, %edx
 ; X32ABI-NEXT:    #APP
 ; X32ABI-NEXT:    movl %edx, (%ebx)
@@ -268,6 +272,8 @@ define x86_regcallcc void @clobber_baseptr_argptr(i32 %param1, i32 %param2, i32
 ; X32ABI-NEXT:    subl %eax, %edx
 ; X32ABI-NEXT:    negl %eax
 ; X32ABI-NEXT:    movl %edx, %esp
+; X32ABI-NEXT:    pushq %rbx
+; X32ABI-NEXT:    subl $24, %esp
 ; X32ABI-NEXT:    movl $405, %ebx # imm = 0x195
 ; X32ABI-NEXT:    #APP
 ; X32ABI-NEXT:    nop
@@ -275,6 +281,8 @@ define x86_regcallcc void @clobber_baseptr_argptr(i32 %param1, i32 %param2, i32
 ; X32ABI-NEXT:    #APP
 ; X32ABI-NEXT:    nop
 ; X32ABI-NEXT:    #NO_APP
+; X32ABI-NEXT:    addl $24, %esp
+; X32ABI-NEXT:    popq %rbx
 ; X32ABI-NEXT:    movl...
[truncated]

Copy link

github-actions bot commented Feb 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@aengelke
Copy link
Contributor

aengelke commented Feb 13, 2024

This apparently doesn't catch all cases. Example: ghccc uses rbp as argument register, but rbp is used after the function call despite being clobbered before.

Edit: should've read the code more carefully before commenting, sorry. Why is this hidden behind a default-off option? This feels like we should do this always (or at least fail before generating invalid code?). Plus: at least on x86-64 the CFI info is wrong once the frame pointer is clobbered.

; RUN: llc -mtriple=x86_64-pc-linux -stackrealign -verify-machineinstrs < %s

declare ghccc i32 @external(i32)

define i32 @foo(i32 %0, i32 %1) {
    %x = call ghccc i32 @external(i32 %0, i32 %1)
    ret i32 %x
}

Output (w/o directives for brevity):

foo:                                    # @foo
# %bb.0:
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        andq    $-16, %rsp
        subq    $16, %rsp
                                        # kill: def $edi killed $edi def $rdi
        movl    %esi, %ebp
        movq    %rdi, %r13
        callq   external@PLT
        leaq    -40(%rbp), %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        retq

@weiguozhi
Copy link
Contributor Author

Thanks for the test case. I added it to this patch.

This is exactly what I had in my mind, see the comments for function spillFrameBasePointer. But I failed to find a real test case. So I disabled checking fp by default. It can be enabled with -spill-clobbered-fp=true, and then I got

        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        andq    $-16, %rsp
        subq    $16, %rsp
        pushq   %rbp                   // save rbp
        pushq   %rax
        movl    %esi, %ebp
        movq    %rdi, %r13
        callq   external@PLT
        addq    $8, %rsp
        popq    %rbp                   // restore rbp
        leaq    -40(%rbp), %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        retq

@weiguozhi weiguozhi requested a review from rnk February 21, 2024 03:18
@weiguozhi
Copy link
Contributor Author

ping

@aengelke
Copy link
Contributor

Still, the CFI info looks wrong once the frame pointer is clobbered (updates to rsp and rbp should be accompanied by matching CFI directives) and I have concerns that this will break unwinding (not just asynchronous unwinding but also exceptions thrown from the called function). Could you test that unwinding through such a function call works as expected?

(fwiw, I'm not a maintainer and I'd personally like to see support for this use case, but I have concerns that this will break other things.)

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! This issue is over a decade old.

@@ -963,6 +963,16 @@ class TargetRegisterInfo : public MCRegisterInfo {
return false;
}

/// Returns true if the target needs a base register.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that AArch64, PPC, ARM, and RISCV all use virtual base registers (requiresVirtualBaseRegisters returns true in most cases) instead of physical base registers. Should we look into that in addition to this change? Perhaps we can add an x86-specific cl::opt that uses a virtual base register, try that on some corpus of code, and see if that is a) correct and b) has acceptable performance.

Regardless, we still need support for spilling the FP around FP-clobbers, so I think we need your change, and we should go forward as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply implements requiresVirtualBaseRegisters and returns true for X86 doesn't work, it still generates wrong code for i386-baseptr.ll. It needs more changes in different places.

/// register as base pointer and let RA allocate a physical register to it.
/// 2. If there is no other instructions access stack with fp/bp from the
/// inline asm to the epilog, we can skip the save and restore operations.
void PEI::spillFrameBasePointer(MachineFunction &MF) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the physical base pointer concept is essentially specific to x86. Is there a hook we can use to move this entire pass into X86FrameLowering? That would avoid the need to add as many hooks to TargetFrameLowering and TargetRegisterInfo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also many targets don't implement requiresVirtualBaseRegisters, I don't know if they are also using physical base pointer. The frame pointer part should impact almost all targets.

@weiguozhi
Copy link
Contributor Author

Still, the CFI info looks wrong once the frame pointer is clobbered (updates to rsp and rbp should be accompanied by matching CFI directives) and I have concerns that this will break unwinding (not just asynchronous unwinding but also exceptions thrown from the called function). Could you test that unwinding through such a function call works as expected?

I'm very ignorant in DWARF. After some study, I think we need the following additional CFI instructions

foo:                                    # @foo
        .cfi_startproc
# %bb.0:
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        andq    $-16, %rsp
        subq    $16, %rsp
        .cfi_offset %rbx, -56
        .cfi_offset %r12, -48
        .cfi_offset %r13, -40
        .cfi_offset %r14, -32
        .cfi_offset %r15, -24
                                        # kill: def $edi killed $edi def $rdi
        pushq   %rbp
        pushq   %rax
+        .cfi_def_cfa_expression    [%rsp + 8]   + 16              //  %rbp is stored in memory [%rsp + 8]
        movl    %esi, %ebp
        movq    %rdi, %r13
        callq   external@PLT
        addq    $8, %rsp
        popq    %rbp
+        .cfi_def_cfa   %rbp, 16                                   // %rbp is restored
        leaq    -40(%rbp), %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq

And the cfi_def_cfa_expression should be constructed similar to function X86FrameLowering::emitCalleeSavedFrameMoves. Is my understanding right?

(fwiw, I'm not a maintainer and I'd personally like to see support for this use case, but I have concerns that this will break other things.)

Because the frame pointer is clobbered, so the unwinding information is already broken. The current patch doesn't make it worse.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding myself short on time, so maybe I can hand the review off to @zmodem , who has done various backend changes to switch lowering and conditional tail calls and so on.

The current behavior is pretty bad, so even if there are some edge cases that this solution doesn't handle such as inaccurate unwind info, this will be an improvement, and we should try to move forward.

@weiguozhi
Copy link
Contributor Author

I added unwinding information when FP is saved/restored.

@rnk rnk requested a review from zmodem May 16, 2024 21:56
@rnk
Copy link
Collaborator

rnk commented May 16, 2024

Hans, can you please review this?

Copy link
Collaborator

@zmodem zmodem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay. I'll do my best to review this.

Since this is currently only handling x86, should we keep it in the x86 target for now, and try to generalize later (maybe with some input from an aarch64 owner)?

@@ -599,6 +599,7 @@ class MachineFrameInfo {
/// Return the alignment in bytes that this function must be aligned to,
/// which is greater than the default stack alignment provided by the target.
Align getMaxAlign() const { return MaxAlignment; }
Align getStackAlignment() const { return StackAlignment; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's surprising that we didn't need to expose this before now. Do we really need it? And if we do, should it have its own comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can get it from TFL, so it is not needed now.

Comment on lines 470 to 471
/// Issue instructions to allocate stack space and spill FP and/or BP to stack
/// using SP register.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider spelling out the abbreviations (FP, BP, SP).

}

/// Returns the physical base register used to access stack object.
virtual Register getBaseRegister(const MachineFunction &MF) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to assert(hasBaseRegister(MF)) here?

Comment on lines 80 to 88
static cl::opt<bool>
SpillClobberedFP("spill-clobbered-fp",
cl::desc("Spill clobbered fp register to stack."),
cl::init(false), cl::Hidden);

static cl::opt<bool>
SpillClobberedBP("spill-clobbered-bp",
cl::desc("Spill clobbered bp register to stack."),
cl::init(true), cl::Hidden);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we'd need flags for this?

@@ -122,6 +131,7 @@ class PEI : public MachineFunctionPass {
void calculateCallFrameInfo(MachineFunction &MF);
void calculateSaveRestoreBlocks(MachineFunction &MF);
void spillCalleeSavedRegs(MachineFunction &MF);
void spillFrameBasePointer(MachineFunction &MF);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In TargetFrameLowering.h you use "FPBP" in the function names, here it's "FrameBase". Let's pick one convention for consistency.

I think "FPBP" makes it more clear that it concerns two registers, and not the "frame base pointer".

TII.get(getPUSHOpcode(MF.getSubtarget<X86Subtarget>())))
.addReg(BP);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function probably expects at least one of SpillFP and SpillBP to be set. Assert that at the top?

A compiler might reasonably warn that RC could be uninitialized here.

// Spill FP.
if (SpillFP) {
Register FP = TRI->getFrameRegister(MF);
if (STI.isTarget64BitILP32())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a test case covering X32?

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=x86_64-pc-linux -stackrealign -spill-clobbered-fp=true -verify-machineinstrs < %s | FileCheck %s

; Calling convention ghccc uses ebp to pass parameter, so calling a function
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have a test which covers the case when both the frame and base pointer need to be spilled/restored?

; CHECK-NEXT: pushq %rbp
; CHECK-NEXT: pushq %rax
; CHECK-NEXT: .cfi_remember_state
; CHECK-NEXT: .cfi_escape 0x0f, 0x06, 0x77, 0x08, 0x06, 0x11, 0x10, 0x22 #
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's any stack manipulation after this (e.g. to pass more arguments to the call), I think the CFI would be incorrect? I'm not sure how we'd handle that though?

BuildMI(*MBB, BeforeMI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);

// Setup new CFA value with DW_CFA_def_cfa_expression:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnk what about Windows unwinding. Do we need to do something for that here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're spilling the base pointer, you shouldn't need to touch the unwind: we normally restore the stack pointer from fp, anyway.

If you're spilling the frame pointer, you'd have to do something complicated to get correct unwind; probably represent it as a separate function (.seh_endproc/.seh_startproc/etc.). You can't specify unwind opcodes outside the prolog.

@weiguozhi
Copy link
Contributor Author

Since this is currently only handling x86, should we keep it in the x86 target for now, and try to generalize later (maybe with some input from an aarch64 owner)?

I will move it to x86 backend for this initial version. Then many of the callbacks can be removed.

This patch fixes llvm#17204.

If a base pointer is used in a function, and it is clobbered by an
instruction (typically an inline asm), current register allocator can't
handle this situation, so BP becomes garbage after those instructions.
It can also occur to FP in theory.

We can spill and reload FP/BP registers around those instructions. But
normal spill/reload instructions also use FP/BP, so we can't spill them
into normal spill slots, instead we spill them into the top of stack by
using SP register.
Add a new test case to demostrate save/restore of FP register.
@weiguozhi weiguozhi reopened this Jul 15, 2024
@weiguozhi
Copy link
Contributor Author

I have moved the implementation to x86 backend. Please review it again.

@david-xl
Copy link
Contributor

david-xl commented Aug 1, 2024

Any more review comments for this patch? The underlying problem is blocking the preserve_none CC from being widely used.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hans is out on summer break, but I went ahead and took a look

@@ -4231,3 +4231,325 @@ void X86FrameLowering::restoreWinEHStackPointersInParent(
/*RestoreSP=*/IsSEH);
}
}

static int computeSPAdjust4SpillFPBP(MachineFunction &MF,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a more descriptive name and add more comments about what this is computing. I came up with something like computeFPBPAlignmentGap, and then would describe the alignment gap as the difference between the current SP and the next properly aligned stack offset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

FP = TRI->getFrameRegister(MF);
if (STI.isTarget64BitILP32())
FP = Register(getX86SubSuperRegister(FP, 64));
RC = TRI->getMinimalPhysRegClass(FP);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This register class calculation is the same between spill and restore FPBP.

I suggest you can restructure this to have one method, saveAndRestoreFPBPUsingSP, which takes BeforeMI and AfterMI, calculates this shared state once, and then calls spill/restoreFPBPUsingSP, and maybe just pass in the FP/BP Register variables. They are nullable, so you can check for isValid before pushing or popping.

SPAdjust will also be the same and only needs to be calculated once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -228,6 +228,8 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) {
FrameIndexVirtualScavenging = TRI->requiresFrameIndexScavenging(MF);
ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE();

TFI->spillFPBP(MF);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PEI process is very complicated, please add comments elaborating on what this does and why it comes before call frame instruction elimination.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


// Invoke instruction has been lowered to normal function call. We try to figure
// out if MI comes from Invoke.
// Do we have any better method?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an incomplete solution. If we have an invoke that actually clobbers RBX and we need to spill around it, we ought to reliably fail with an error instead of silently doing nothing as we currently do. That's part of what I really want to get out of this. We can't solve all problems, but the compiler should emit an error if it gets into an impossible situation that it can't handle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will do detection work in a separate patch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can live with that.

; CHECK-NEXT: popq %rbp
; CHECK-NEXT: .cfi_def_cfa %rsp, 8
; CHECK-NEXT: retq
%x = call ghccc i32 @external(i32 %0, i32 %1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any other calling convention that:

  1. clobbers RBX
  2. accepts stack arguments

I'd like to exercise the test case where:

  • we need to pass stack arguments (num regparm args + 4)
  • we have to spill just FP (RBP, not RBX)

I'm worried that the stores or pushes to set up the stack arguments will be disturbed by the RBP push and pops.

Regardless of whether the problem exists or not, please add a test case that exercises spilling around a function that clobbers FP/BP with stack arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only calling convention that clobbers RBP but no RBX is HiPE. I added a new test case for it. And this patch does correct work.

But the generated code is still wrong because of a different bug.

  %6:gr64 = MOV64rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.1)
  %7:gr64 = MOV64rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.0)
  ADJCALLSTACKDOWN64 16, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
  %8:gr64 = COPY $rsp 
  MOV64mr %8:gr64, 1, $noreg, 8, $noreg, killed %7:gr64 :: (store (s64) into stack + 8) 
  MOV64mr %8:gr64, 1, $noreg, 0, $noreg, killed %6:gr64 :: (store (s64) into stack)
  $r15 = COPY %0:gr64
  $rbp = COPY %1:gr64
  $rsi = COPY %2:gr64
  $rdx = COPY %3:gr64
  $rcx = COPY %4:gr64
  $r8 = COPY %5:gr64
  CALL64pcrel32 target-flags(x86-plt) @external2, <regmask>, ...

x86-cf-opt transforms it to

  ADJCALLSTACKDOWN64 16, 0, 16, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp 
  $r15 = COPY %0:gr64
  $rbp = COPY %1:gr64
  $rsi = COPY %2:gr64
  $rdx = COPY %3:gr64
  $rcx = COPY %4:gr64
  $r8 = COPY %5:gr64
  PUSH64rmm %fixed-stack.0, 1, $noreg, 0, $noreg, implicit-def $rsp, implicit $rsp :: (load (s64) from %fixed-stack.0), (store (s64) into stack + 8)
  PUSH64rmm %fixed-stack.1, 1, $noreg, 0, $noreg, implicit-def $rsp, implicit $rsp :: (load (s64) from %fixed-stack.1), (store (s64) into stack)
  CALL64pcrel32 target-flags(x86-plt) @external2, <regmask>, ...

The access to %fixed-stack.0 needs rbp, but it is already assigned function argument.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's a good find. :) Thanks for adding the tests, we don't need to fix this issue before landing.

Move the duplicated computation of physical register, register class and
stack adjustment to new function saveAndRestoreFPBPUsingSP.
Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let's merge it!

; CHECK-NEXT: popq %rbp
; CHECK-NEXT: .cfi_def_cfa %rsp, 8
; CHECK-NEXT: retq
%x = call ghccc i32 @external(i32 %0, i32 %1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's a good find. :) Thanks for adding the tests, we don't need to fix this issue before landing.


// Invoke instruction has been lowered to normal function call. We try to figure
// out if MI comes from Invoke.
// Do we have any better method?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can live with that.

// the same BB, so it will not impact outgoing CFA.
++RememberState;
if (RememberState != 1)
report_fatal_error(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is existing code, but I generally try to recommend MCContext::reportError over report_fatal_error. We do have actual non-fatal error handling facilities during CodeGen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

@weiguozhi weiguozhi merged commit 0d471b3 into llvm:main Aug 6, 2024
7 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 6, 2024

LLVM Buildbot has detected a new failure on builder clang-cmake-x86_64-avx512-linux running on avx512-intel64 while building llvm at step 13 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/133/builds/2719

Here is the relevant piece of the build log for the reference:

Step 13 (test-suite) failure: test (failure)
******************** TEST 'test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-built-in-setjmp.test' FAILED ********************

/localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/test/sandbox/build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --redirect-input /dev/null --summary /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/test/sandbox/build/SingleSource/Regression/C/gcc-c-torture/execute/Output/GCC-C-execute-built-in-setjmp.test.time /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/test/sandbox/build/SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-built-in-setjmp

+ /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/test/sandbox/build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --redirect-input /dev/null --summary /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/test/sandbox/build/SingleSource/Regression/C/gcc-c-torture/execute/Output/GCC-C-execute-built-in-setjmp.test.time /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/test/sandbox/build/SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-built-in-setjmp
/localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/test/sandbox/build/tools/timeit-target: error: child terminated by signal 11

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 6, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-ppc64le-linux running on ppc64le-sanitizer while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/1966

Here is the relevant piece of the build log for the reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
PASS: ThreadSanitizer-powerpc64le :: race_on_barrier2.c (2349 of 2483)
PASS: ThreadSanitizer-powerpc64le :: suppressions_race.cpp (2350 of 2483)
PASS: ThreadSanitizer-powerpc64le :: custom_mutex2.cpp (2351 of 2483)
PASS: ThreadSanitizer-powerpc64le :: race_top_suppression1.cpp (2352 of 2483)
PASS: ThreadSanitizer-powerpc64le :: suppress_same_stacks.cpp (2353 of 2483)
PASS: ThreadSanitizer-powerpc64le :: race_on_heap.cpp (2354 of 2483)
PASS: ThreadSanitizer-powerpc64le :: race_on_write.cpp (2355 of 2483)
PASS: SanitizerCommon-msan-powerpc64le-Linux :: print-stack-trace.cpp (2356 of 2483)
PASS: ThreadSanitizer-powerpc64le :: signal_in_mutex_lock.cpp (2357 of 2483)
XFAIL: ThreadSanitizer-powerpc64le :: must_deadlock.cpp (2358 of 2483)
FAIL: ThreadSanitizer-powerpc64le :: signal_block.cpp (2359 of 2483)
******************** TEST 'ThreadSanitizer-powerpc64le :: signal_block.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/./bin/clang  -fsanitize=thread -Wall  -m64 -fno-function-sections   -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_block.cpp.tmp &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_block.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/./bin/clang -fsanitize=thread -Wall -m64 -fno-function-sections -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_block.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_block.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp:59:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: WARNING: ThreadSanitizer:
              ^
<stdin>:2:1: note: found here
WARNING: ThreadSanitizer: signal handler spoils errno (pid=4037441)
^~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        1: ================== 
        2: WARNING: ThreadSanitizer: signal handler spoils errno (pid=4037441) 
not:59     !~~~~~~~~~~~~~~~~~~~~~~~~                                            error: no match expected
        3:  Signal 10 handler invoked at: 
        4:  #0 handler(int) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp:13 (signal_block.cpp.tmp+0xff1a0) 
        5:  #1 thread(void*) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp:25:5 (signal_block.cpp.tmp+0xff2f0) 
        6:  
        7: SUMMARY: ThreadSanitizer: signal handler spoils errno /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp:13 in handler(int) 
        8: ================== 
        9: DONE 
       10: ThreadSanitizer: reported 1 warnings 
>>>>>>

--

Step 9 (test compiler-rt debug) failure: test compiler-rt debug (failure)
...
PASS: ThreadSanitizer-powerpc64le :: race_on_barrier2.c (2349 of 2483)
PASS: ThreadSanitizer-powerpc64le :: suppressions_race.cpp (2350 of 2483)
PASS: ThreadSanitizer-powerpc64le :: custom_mutex2.cpp (2351 of 2483)
PASS: ThreadSanitizer-powerpc64le :: race_top_suppression1.cpp (2352 of 2483)
PASS: ThreadSanitizer-powerpc64le :: suppress_same_stacks.cpp (2353 of 2483)
PASS: ThreadSanitizer-powerpc64le :: race_on_heap.cpp (2354 of 2483)
PASS: ThreadSanitizer-powerpc64le :: race_on_write.cpp (2355 of 2483)
PASS: SanitizerCommon-msan-powerpc64le-Linux :: print-stack-trace.cpp (2356 of 2483)
PASS: ThreadSanitizer-powerpc64le :: signal_in_mutex_lock.cpp (2357 of 2483)
XFAIL: ThreadSanitizer-powerpc64le :: must_deadlock.cpp (2358 of 2483)
FAIL: ThreadSanitizer-powerpc64le :: signal_block.cpp (2359 of 2483)
******************** TEST 'ThreadSanitizer-powerpc64le :: signal_block.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/./bin/clang  -fsanitize=thread -Wall  -m64 -fno-function-sections   -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_block.cpp.tmp &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_block.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/./bin/clang -fsanitize=thread -Wall -m64 -fno-function-sections -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_block.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_default/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_block.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp:59:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: WARNING: ThreadSanitizer:
              ^
<stdin>:2:1: note: found here
WARNING: ThreadSanitizer: signal handler spoils errno (pid=4037441)
^~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        1: ================== 
        2: WARNING: ThreadSanitizer: signal handler spoils errno (pid=4037441) 
not:59     !~~~~~~~~~~~~~~~~~~~~~~~~                                            error: no match expected
        3:  Signal 10 handler invoked at: 
        4:  #0 handler(int) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp:13 (signal_block.cpp.tmp+0xff1a0) 
        5:  #1 thread(void*) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp:25:5 (signal_block.cpp.tmp+0xff2f0) 
        6:  
        7: SUMMARY: ThreadSanitizer: signal handler spoils errno /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_block.cpp:13 in handler(int) 
        8: ================== 
        9: DONE 
       10: ThreadSanitizer: reported 1 warnings 
>>>>>>

--


@aengelke
Copy link
Contributor

aengelke commented Aug 7, 2024

This incurs a substantial compile-time regression at O0. http://llvm-compile-time-tracker.com/compare.php?from=ce2a3d9042c95630f12b790bf201c4daf8941afb&to=0d471b3f64d3116bd57c79d872f7384fff80daa5&stat=instructions:u

Can we come up with a no-overhead solution for this, e.g. check during call lowering whether there's a call that clobbers FP/BP and then selectively look at these calls instead of iterating over the entire IR? (IR iteration is expensive. And >99% of the code doesn't need this.) If this takes longer, we might want to temporarily revert this.

banach-space pushed a commit to banach-space/llvm-project that referenced this pull request Aug 7, 2024
…lvm#81048)

This patch fixes llvm#17204.

If a base pointer is used in a function, and it is clobbered by an
instruction (typically an inline asm), current register allocator can't
handle this situation, so BP becomes garbage after those instructions.
It can also occur to FP in theory.

We can spill and reload FP/BP registers around those instructions. But
normal spill/reload instructions also use FP/BP, so we can't spill them
into normal spill slots, instead we spill them into the top of stack by
using SP register.
@rnk
Copy link
Collaborator

rnk commented Aug 7, 2024

I think what you suggest is feasible. I would try to skip this pass under the following conditions:

  • does not use base pointer (BP, but may use FP)
  • has no inline asm (we already track this somewhere)
  • has no FP-clobbering function calls. This is a new flag we could add to X86MachineFunctionInfo and set in X86ISelLoweringCall.cpp somewhere.

The vast majority of functions do not do both realign the stack in the prologue and adjust SP (dynamic alloca) after the prologue, necessitating a base pointer. -O0 functions do tend to have FPs, so it's worth adding an extra flag to track FP clobbering so we can early-exit in those cases. Note that the non-O0 compile time are not affected, presumably they use fomit-frame-pointer.

@weiguozhi
Copy link
Contributor Author

@aengelke @rnk thanks for the good suggestion. I will implement it in a new patch.

Current implementation already checks if BP/FP is used in the function, and skip the pass if neither is used. So it seems there are still a lot of functions use FP with -O0.

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 8, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/3017

Here is the relevant piece of the build log for the reference:

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: TableGen/SubtargetFeatureUniqueNames.td' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: not /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llvm-tblgen -gen-subtarget -I /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/../../include /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td 2>&1 | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td -DFILE=/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td -DFILE=/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td
+ not /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llvm-tblgen -gen-subtarget -I /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/../../include /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td:10:11: error: CHECK: expected string not found in input
// CHECK: [[FILE]]:[[@LINE+2]]:5: error: Feature `NameA` already defined.
          ^
<stdin>:1:1: note: scanning from here
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td:8:5: error: Feature `NameA` already defined.
^
<stdin>:1:1: note: with "FILE" equal to "/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames\\.td"
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td:8:5: error: Feature `NameA` already defined.
^
<stdin>:1:1: note: with "@LINE+2" equal to "12"
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td:8:5: error: Feature `NameA` already defined.
^
<stdin>:1:110: note: possible intended match here
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td:8:5: error: Feature `NameA` already defined.
                                                                                                             ^

Input file: <stdin>
Check file: /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td:8:5: error: Feature `NameA` already defined. 
check:10'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
check:10'1                                                                                                                                                                 with "FILE" equal to "/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames\\.td"
check:10'2                                                                                                                                                                 with "@LINE+2" equal to "12"
check:10'3                                                                                                                  ?                                              possible intended match
            2: def FeatureA : SubtargetFeature<"NameA", "", "", "">; 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            3:  ^ 
check:10'0     ~~~
            4: /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/TableGen/SubtargetFeatureUniqueNames.td:12:5: note: Previous definition here. 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            5: def FeatureB : SubtargetFeature<"NameA", "", "", "">; 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            6:  ^ 
check:10'0     ~~~
>>>>>>

--

...

TIFitis pushed a commit that referenced this pull request Aug 8, 2024
…81048)

This patch fixes #17204.

If a base pointer is used in a function, and it is clobbered by an
instruction (typically an inline asm), current register allocator can't
handle this situation, so BP becomes garbage after those instructions.
It can also occur to FP in theory.

We can spill and reload FP/BP registers around those instructions. But
normal spill/reload instructions also use FP/BP, so we can't spill them
into normal spill slots, instead we spill them into the top of stack by
using SP register.
kstoimenov pushed a commit to kstoimenov/llvm-project that referenced this pull request Aug 15, 2024
…lvm#81048)

This patch fixes llvm#17204.

If a base pointer is used in a function, and it is clobbered by an
instruction (typically an inline asm), current register allocator can't
handle this situation, so BP becomes garbage after those instructions.
It can also occur to FP in theory.

We can spill and reload FP/BP registers around those instructions. But
normal spill/reload instructions also use FP/BP, so we can't spill them
into normal spill slots, instead we spill them into the top of stack by
using SP register.
@bylaws
Copy link
Contributor

bylaws commented Oct 1, 2024

Is this intended to impact attribute(naked)) functions? With msvc x86 this attr causes asm to be emitted exactly as specified and clang diverges here.

@efriedma-quic
Copy link
Collaborator

No, that's a bug; please open an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

X86: stack realignment, dynamic allocas, and inline assembly cause conflict over ebx / esi
9 participants