
[AArch64][GlobalISel] Legalize fp128 types as libcalls for G_FCMP #98452

Merged: 8 commits merged into llvm:main from tguan/gisel-fcmp-fp128 on Jul 25, 2024

Conversation

Member

@Him188 Him188 commented Jul 11, 2024

  • Generate libcall for supported predicates.
  • Generate unsupported predicates as combinations of supported predicates.
  • Vectors are scalarized; however, some cases such as v3f128_fp128 still fail because we are unable to legalize G_OR for these types.

GISel now generates the same code as SDAG; note, however, the difference in the `one` test case.

Collaborator

llvmbot commented Jul 11, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: Him188 (Him188)

Changes
  • Generate libcall for supported predicates.
  • Generate unsupported predicates as combinations of supported predicates.

GISel now generates the same code as SDAG; note, however, the difference in the `one` test case.


Patch is 51.43 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98452.diff

6 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h (+3)
  • (modified) llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (+151-2)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+3-2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-ccmp.ll (+46-24)
  • (added) llvm/test/CodeGen/AArch64/fcmp-fp128.ll (+560)
  • (modified) llvm/test/CodeGen/AArch64/fcmp.ll (+214-139)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
index 284f434fbb9b0..b8a0d94eeda2e 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -280,6 +280,9 @@ class LegalizerHelper {
   LegalizeResult createResetStateLibcall(MachineIRBuilder &MIRBuilder,
                                          MachineInstr &MI,
                                          LostDebugLocObserver &LocObserver);
+  LegalizeResult createFCMPLibcall(MachineIRBuilder &MIRBuilder,
+                                   MachineInstr &MI,
+                                   LostDebugLocObserver &LocObserver);
 
   MachineInstrBuilder
   getNeutralElementForVecReduce(unsigned Opcode, MachineIRBuilder &MIRBuilder,
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 86de1f3be9047..04a266daa9855 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -723,8 +723,7 @@ static RTLIB::Libcall getOutlineAtomicLibcall(MachineInstr &MI) {
   if (MemType.isVector())
     return RTLIB::UNKNOWN_LIBCALL;
 
-#define LCALLS(A, B)                                                           \
-  { A##B##_RELAX, A##B##_ACQ, A##B##_REL, A##B##_ACQ_REL }
+#define LCALLS(A, B) {A##B##_RELAX, A##B##_ACQ, A##B##_REL, A##B##_ACQ_REL}
 #define LCALL5(A)                                                              \
   LCALLS(A, 1), LCALLS(A, 2), LCALLS(A, 4), LCALLS(A, 8), LCALLS(A, 16)
   switch (Opc) {
@@ -980,6 +979,150 @@ LegalizerHelper::createSetStateLibcall(MachineIRBuilder &MIRBuilder,
                        LocObserver, nullptr);
 }
 
+/// Returns the corresponding libcall for the given Pred and
+/// the ICMP predicate that should be generated to compare with #0
+/// after the libcall.
+static std::pair<RTLIB::Libcall, CmpInst::Predicate>
+getFCMPLibcallDesc(const CmpInst::Predicate Pred) {
+
+  switch (Pred) {
+  case CmpInst::FCMP_OEQ:
+    return {RTLIB::OEQ_F128, CmpInst::ICMP_EQ};
+  case CmpInst::FCMP_UNE:
+    return {RTLIB::UNE_F128, CmpInst::ICMP_NE};
+  case CmpInst::FCMP_OGE:
+    return {RTLIB::OGE_F128, CmpInst::ICMP_SGE};
+  case CmpInst::FCMP_OLT:
+    return {RTLIB::OLT_F128, CmpInst::ICMP_SLT};
+  case CmpInst::FCMP_OLE:
+    return {RTLIB::OLE_F128, CmpInst::ICMP_SLE};
+  case CmpInst::FCMP_OGT:
+    return {RTLIB::OGT_F128, CmpInst::ICMP_SGT};
+  case CmpInst::FCMP_UNO:
+    return {RTLIB::UO_F128, CmpInst::ICMP_NE};
+  default:
+    return {RTLIB::UNKNOWN_LIBCALL, CmpInst::BAD_ICMP_PREDICATE};
+  }
+}
+
+LegalizerHelper::LegalizeResult
+LegalizerHelper::createFCMPLibcall(MachineIRBuilder &MIRBuilder,
+                                   MachineInstr &MI,
+                                   LostDebugLocObserver &LocObserver) {
+  auto &MF = MIRBuilder.getMF();
+  auto &Ctx = MF.getFunction().getContext();
+
+  LLT OpLLT = MRI.getType(MI.getOperand(2).getReg());
+  if (OpLLT != LLT::scalar(128) ||
+      OpLLT != MRI.getType(MI.getOperand(3).getReg()))
+    return UnableToLegalize;
+
+  Type *OpType = getFloatTypeForLLT(Ctx, OpLLT);
+
+  // Libcall always return i32
+  constexpr LLT I32LLT = LLT::scalar(32);
+  constexpr LLT PredTy = LLT::scalar(1);
+
+  const Register DstReg = MI.getOperand(0).getReg();
+  const Register Op1 = MI.getOperand(2).getReg();
+  const Register Op2 = MI.getOperand(3).getReg();
+  const auto Pred =
+      static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
+
+  // Generates a libcall followed by ICMP
+  const auto BuildLibcall = [&](const RTLIB::Libcall Libcall,
+                                const CmpInst::Predicate ICmpPred) -> Register {
+    Register Temp = MRI.createGenericVirtualRegister(I32LLT);
+    // Generate libcall, storing result into Temp
+    const auto Status =
+        createLibcall(MIRBuilder, Libcall, {Temp, Type::getInt32Ty(Ctx), 0},
+                      {{Op1, OpType, 0}, {Op2, OpType, 1}}, LocObserver, &MI);
+    if (!Status)
+      return MCRegister::NoRegister;
+
+    // FCMP libcall always returns an i32, we need to compare it with #0 to get
+    // the final result.
+    const Register Res = MRI.createGenericVirtualRegister(PredTy);
+    MIRBuilder.buildICmp(ICmpPred, Res, Temp,
+                         MIRBuilder.buildConstant(I32LLT, 0));
+    return Res;
+  };
+
+  // Simple case if we have a direct mapping from predicate to libcall
+  if (const auto [Libcall, ICmpPred] = getFCMPLibcallDesc(Pred);
+      Libcall != RTLIB::UNKNOWN_LIBCALL &&
+      ICmpPred != CmpInst::BAD_ICMP_PREDICATE) {
+    if (const auto Res = BuildLibcall(Libcall, ICmpPred)) {
+      MIRBuilder.buildCopy(DstReg, Res);
+      return Legalized;
+    }
+    return UnableToLegalize;
+  }
+
+  // No direct mapping found, should be generated as combination of libcalls.
+
+  switch (Pred) {
+  case CmpInst::FCMP_UEQ: {
+    // FCMP_UEQ: unordered or equal
+    // Convert into (FCMP_OEQ || FCMP_UNO).
+
+    const auto [OeqLibcall, OeqPred] = getFCMPLibcallDesc(CmpInst::FCMP_OEQ);
+    const auto Oeq = BuildLibcall(OeqLibcall, OeqPred);
+
+    const auto [UnoLibcall, UnoPred] = getFCMPLibcallDesc(CmpInst::FCMP_UNO);
+    const auto Uno = BuildLibcall(UnoLibcall, UnoPred);
+
+    MIRBuilder.buildCopy(DstReg, MIRBuilder.buildOr(PredTy, Oeq, Uno));
+    break;
+  }
+  case CmpInst::FCMP_ONE: {
+    // FCMP_ONE: ordered and operands are unequal
+    // Convert into (!FCMP_OEQ && !FCMP_UNO).
+
+    // We inverse the predicate instead of generating a NOT
+    // to save one instruciton.
+    // On AArch64 isel can even select two cmp into a single ccmp.
+    const auto [OeqLibcall, OeqPred] = getFCMPLibcallDesc(CmpInst::FCMP_OEQ);
+    const auto NotOeq =
+        BuildLibcall(OeqLibcall, CmpInst::getInversePredicate(OeqPred));
+
+    const auto [UnoLibcall, UnoPred] = getFCMPLibcallDesc(CmpInst::FCMP_UNO);
+    const auto NotUno =
+        BuildLibcall(UnoLibcall, CmpInst::getInversePredicate(UnoPred));
+
+    if (NotOeq && NotUno)
+      MIRBuilder.buildCopy(DstReg, MIRBuilder.buildAnd(PredTy, NotOeq, NotUno));
+    else
+      return UnableToLegalize;
+
+    break;
+  }
+  case CmpInst::FCMP_ULT:
+  case CmpInst::FCMP_UGE:
+  case CmpInst::FCMP_UGT:
+  case CmpInst::FCMP_ULE:
+  case CmpInst::FCMP_ORD: {
+    // Convert into: !(inverse(Pred))
+    // E.g. FCMP_ULT becomes !FCMP_OGE
+    // This is equivalent to the following, but saves some instructions.
+    //   MIRBuilder.buildNot(
+    //       PredTy,
+    //       MIRBuilder.buildFCmp(CmpInst::getInversePredicate(Pred), PredTy,
+    //                            Op1, Op2));
+    const auto [InversedLibcall, InversedPred] =
+        getFCMPLibcallDesc(CmpInst::getInversePredicate(Pred));
+    MIRBuilder.buildCopy(
+        DstReg, BuildLibcall(InversedLibcall,
+                             CmpInst::getInversePredicate(InversedPred)));
+    break;
+  }
+  default:
+    return UnableToLegalize;
+  }
+
+  return Legalized;
+}
+
 // The function is used to legalize operations that set default environment
 // state. In C library a call like `fesetmode(FE_DFL_MODE)` is used for that.
 // On most targets supported in glibc FE_DFL_MODE is defined as
@@ -1120,6 +1263,12 @@ LegalizerHelper::libcall(MachineInstr &MI, LostDebugLocObserver &LocObserver) {
       return Status;
     break;
   }
+  case TargetOpcode::G_FCMP: {
+    LegalizeResult Status = createFCMPLibcall(MIRBuilder, MI, LocObserver);
+    if (Status != Legalized)
+      return Status;
+    break;
+  }
   case TargetOpcode::G_FPTOSI:
   case TargetOpcode::G_FPTOUI: {
     // FIXME: Support other types
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index c6eb4d2b3ec78..511e47354cead 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -560,7 +560,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       })
       .widenScalarOrEltToNextPow2(1)
       .clampScalar(0, s32, s32)
-      .clampScalarOrElt(1, MinFPScalar, s64)
+      .clampScalarOrElt(1, MinFPScalar, s128)
       .minScalarEltSameAsIf(
           [=](const LegalityQuery &Query) {
             const LLT &Ty = Query.Types[0];
@@ -572,7 +572,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .clampNumElements(1, v4s16, v8s16)
       .clampNumElements(1, v2s32, v4s32)
       .clampMaxNumElements(1, s64, 2)
-      .moreElementsToNextPow2(1);
+      .moreElementsToNextPow2(1)
+      .libcallFor({{s32, s128}});
 
   // Extensions
   auto ExtLegalFunc = [=](const LegalityQuery &Query) {
diff --git a/llvm/test/CodeGen/AArch64/arm64-ccmp.ll b/llvm/test/CodeGen/AArch64/arm64-ccmp.ll
index b6702bba1598c..e951c80e30845 100644
--- a/llvm/test/CodeGen/AArch64/arm64-ccmp.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-ccmp.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -debugify-and-strip-all-safe -mcpu=cyclone -verify-machineinstrs -aarch64-enable-ccmp -aarch64-stress-ccmp | FileCheck %s --check-prefixes=CHECK,SDISEL
-; RUN: llc < %s -debugify-and-strip-all-safe -mcpu=cyclone -verify-machineinstrs -aarch64-enable-ccmp -aarch64-stress-ccmp -global-isel -global-isel-abort=2 | FileCheck %s --check-prefixes=CHECK,GISEL
+; RUN: llc < %s -debugify-and-strip-all-safe -mcpu=cyclone -verify-machineinstrs -aarch64-enable-ccmp -aarch64-stress-ccmp -global-isel -global-isel-abort=1 | FileCheck %s --check-prefixes=CHECK,GISEL
 target triple = "arm64-apple-ios"
 
 define i32 @single_same(i32 %a, i32 %b) nounwind ssp {
@@ -950,29 +950,51 @@ define i32 @half_select_and_olt_one(half %v0, half %v1, half %v2, half %v3, i32
 ; Also verify that we don't try to generate f128 FCCMPs, using RT calls instead.
 
 define i32 @f128_select_and_olt_oge(fp128 %v0, fp128 %v1, fp128 %v2, fp128 %v3, i32 %a, i32 %b) #0 {
-; CHECK-LABEL: f128_select_and_olt_oge:
-; CHECK:       ; %bb.0:
-; CHECK-NEXT:    sub sp, sp, #80
-; CHECK-NEXT:    stp x22, x21, [sp, #32] ; 16-byte Folded Spill
-; CHECK-NEXT:    stp x20, x19, [sp, #48] ; 16-byte Folded Spill
-; CHECK-NEXT:    stp x29, x30, [sp, #64] ; 16-byte Folded Spill
-; CHECK-NEXT:    mov x19, x1
-; CHECK-NEXT:    mov x20, x0
-; CHECK-NEXT:    stp q2, q3, [sp] ; 32-byte Folded Spill
-; CHECK-NEXT:    bl ___lttf2
-; CHECK-NEXT:    cmp w0, #0
-; CHECK-NEXT:    cset w21, lt
-; CHECK-NEXT:    ldp q0, q1, [sp] ; 32-byte Folded Reload
-; CHECK-NEXT:    bl ___getf2
-; CHECK-NEXT:    cmp w0, #0
-; CHECK-NEXT:    cset w8, ge
-; CHECK-NEXT:    tst w8, w21
-; CHECK-NEXT:    csel w0, w20, w19, ne
-; CHECK-NEXT:    ldp x29, x30, [sp, #64] ; 16-byte Folded Reload
-; CHECK-NEXT:    ldp x20, x19, [sp, #48] ; 16-byte Folded Reload
-; CHECK-NEXT:    ldp x22, x21, [sp, #32] ; 16-byte Folded Reload
-; CHECK-NEXT:    add sp, sp, #80
-; CHECK-NEXT:    ret
+; SDISEL-LABEL: f128_select_and_olt_oge:
+; SDISEL:       ; %bb.0:
+; SDISEL-NEXT:    sub sp, sp, #80
+; SDISEL-NEXT:    stp x22, x21, [sp, #32] ; 16-byte Folded Spill
+; SDISEL-NEXT:    stp x20, x19, [sp, #48] ; 16-byte Folded Spill
+; SDISEL-NEXT:    stp x29, x30, [sp, #64] ; 16-byte Folded Spill
+; SDISEL-NEXT:    mov x19, x1
+; SDISEL-NEXT:    mov x20, x0
+; SDISEL-NEXT:    stp q2, q3, [sp] ; 32-byte Folded Spill
+; SDISEL-NEXT:    bl ___lttf2
+; SDISEL-NEXT:    cmp w0, #0
+; SDISEL-NEXT:    cset w21, lt
+; SDISEL-NEXT:    ldp q0, q1, [sp] ; 32-byte Folded Reload
+; SDISEL-NEXT:    bl ___getf2
+; SDISEL-NEXT:    cmp w0, #0
+; SDISEL-NEXT:    cset w8, ge
+; SDISEL-NEXT:    tst w8, w21
+; SDISEL-NEXT:    csel w0, w20, w19, ne
+; SDISEL-NEXT:    ldp x29, x30, [sp, #64] ; 16-byte Folded Reload
+; SDISEL-NEXT:    ldp x20, x19, [sp, #48] ; 16-byte Folded Reload
+; SDISEL-NEXT:    ldp x22, x21, [sp, #32] ; 16-byte Folded Reload
+; SDISEL-NEXT:    add sp, sp, #80
+; SDISEL-NEXT:    ret
+;
+; GISEL-LABEL: f128_select_and_olt_oge:
+; GISEL:       ; %bb.0:
+; GISEL-NEXT:    sub sp, sp, #80
+; GISEL-NEXT:    stp x22, x21, [sp, #32] ; 16-byte Folded Spill
+; GISEL-NEXT:    stp x20, x19, [sp, #48] ; 16-byte Folded Spill
+; GISEL-NEXT:    stp x29, x30, [sp, #64] ; 16-byte Folded Spill
+; GISEL-NEXT:    stp q3, q2, [sp] ; 32-byte Folded Spill
+; GISEL-NEXT:    mov x19, x0
+; GISEL-NEXT:    mov x20, x1
+; GISEL-NEXT:    bl ___lttf2
+; GISEL-NEXT:    mov x21, x0
+; GISEL-NEXT:    ldp q1, q0, [sp] ; 32-byte Folded Reload
+; GISEL-NEXT:    bl ___getf2
+; GISEL-NEXT:    cmp w21, #0
+; GISEL-NEXT:    ccmp w0, #0, #8, lt
+; GISEL-NEXT:    csel w0, w19, w20, ge
+; GISEL-NEXT:    ldp x29, x30, [sp, #64] ; 16-byte Folded Reload
+; GISEL-NEXT:    ldp x20, x19, [sp, #48] ; 16-byte Folded Reload
+; GISEL-NEXT:    ldp x22, x21, [sp, #32] ; 16-byte Folded Reload
+; GISEL-NEXT:    add sp, sp, #80
+; GISEL-NEXT:    ret
   %c0 = fcmp olt fp128 %v0, %v1
   %c1 = fcmp oge fp128 %v2, %v3
   %cr = and i1 %c1, %c0
diff --git a/llvm/test/CodeGen/AArch64/fcmp-fp128.ll b/llvm/test/CodeGen/AArch64/fcmp-fp128.ll
new file mode 100644
index 0000000000000..503cb8c533bab
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fcmp-fp128.ll
@@ -0,0 +1,560 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK-SD
+; RUN: llc -mtriple=aarch64 -global-isel -global-isel-abort=1 -verify-machineinstrs %s -o - | FileCheck %s --check-prefixes=CHECK-GI
+
+; Checks generated libcalls for fp128 types
+
+define double @oeq(fp128 %a, fp128 %b, double %d, double %e) {
+; CHECK-SD-LABEL: oeq:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    stp d9, d8, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-SD-NEXT:    str x30, [sp, #16] // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    .cfi_offset b8, -24
+; CHECK-SD-NEXT:    .cfi_offset b9, -32
+; CHECK-SD-NEXT:    fmov d8, d3
+; CHECK-SD-NEXT:    fmov d9, d2
+; CHECK-SD-NEXT:    bl __eqtf2
+; CHECK-SD-NEXT:    cmp w0, #0
+; CHECK-SD-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
+; CHECK-SD-NEXT:    fcsel d0, d9, d8, eq
+; CHECK-SD-NEXT:    ldp d9, d8, [sp], #32 // 16-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: oeq:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    stp d9, d8, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-GI-NEXT:    str x30, [sp, #16] // 8-byte Folded Spill
+; CHECK-GI-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-GI-NEXT:    .cfi_offset w30, -16
+; CHECK-GI-NEXT:    .cfi_offset b8, -24
+; CHECK-GI-NEXT:    .cfi_offset b9, -32
+; CHECK-GI-NEXT:    fmov d8, d2
+; CHECK-GI-NEXT:    fmov d9, d3
+; CHECK-GI-NEXT:    bl __eqtf2
+; CHECK-GI-NEXT:    cmp w0, #0
+; CHECK-GI-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
+; CHECK-GI-NEXT:    fcsel d0, d8, d9, eq
+; CHECK-GI-NEXT:    ldp d9, d8, [sp], #32 // 16-byte Folded Reload
+; CHECK-GI-NEXT:    ret
+entry:
+  %c = fcmp oeq fp128 %a, %b
+  %s = select i1 %c, double %d, double %e
+  ret double %s
+}
+
+define double @ogt(fp128 %a, fp128 %b, double %d, double %e) {
+; CHECK-SD-LABEL: ogt:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    stp d9, d8, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-SD-NEXT:    str x30, [sp, #16] // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    .cfi_offset b8, -24
+; CHECK-SD-NEXT:    .cfi_offset b9, -32
+; CHECK-SD-NEXT:    fmov d8, d3
+; CHECK-SD-NEXT:    fmov d9, d2
+; CHECK-SD-NEXT:    bl __gttf2
+; CHECK-SD-NEXT:    cmp w0, #0
+; CHECK-SD-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
+; CHECK-SD-NEXT:    fcsel d0, d9, d8, gt
+; CHECK-SD-NEXT:    ldp d9, d8, [sp], #32 // 16-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: ogt:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    stp d9, d8, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-GI-NEXT:    str x30, [sp, #16] // 8-byte Folded Spill
+; CHECK-GI-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-GI-NEXT:    .cfi_offset w30, -16
+; CHECK-GI-NEXT:    .cfi_offset b8, -24
+; CHECK-GI-NEXT:    .cfi_offset b9, -32
+; CHECK-GI-NEXT:    fmov d8, d2
+; CHECK-GI-NEXT:    fmov d9, d3
+; CHECK-GI-NEXT:    bl __gttf2
+; CHECK-GI-NEXT:    cmp w0, #0
+; CHECK-GI-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
+; CHECK-GI-NEXT:    fcsel d0, d8, d9, gt
+; CHECK-GI-NEXT:    ldp d9, d8, [sp], #32 // 16-byte Folded Reload
+; CHECK-GI-NEXT:    ret
+entry:
+  %c = fcmp ogt fp128 %a, %b
+  %s = select i1 %c, double %d, double %e
+  ret double %s
+}
+
+define double @olt(fp128 %a, fp128 %b, double %d, double %e) {
+; CHECK-SD-LABEL: olt:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    stp d9, d8, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-SD-NEXT:    str x30, [sp, #16] // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    .cfi_offset b8, -24
+; CHECK-SD-NEXT:    .cfi_offset b9, -32
+; CHECK-SD-NEXT:    fmov d8, d3
+; CHECK-SD-NEXT:    fmov d9, d2
+; CHECK-SD-NEXT:    bl __lttf2
+; CHECK-SD-NEXT:    cmp w0, #0
+; CHECK-SD-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
+; CHECK-SD-NEXT:    fcsel d0, d9, d8, lt
+; CHECK-SD-NEXT:    ldp d9, d8, [sp], #32 // 16-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: olt:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    stp d9, d8, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-GI-NEXT:    str x30, [sp, #16] // 8-byte Folded Spill
+; CHECK-GI-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-GI-NEXT:    .cfi_offset w30, -16
+; CHECK-GI-NEXT:    .cfi_offset b8, -24
+; CHECK-GI-NEXT:    .cfi_offset b9, -32
+; CHECK-GI-NEXT:    fmov d8, d2
+; CHECK-GI-NEXT:    fmov d9, d3
+; CHECK-GI-NEXT:    bl __lttf2
+; CHECK-GI-NEXT:    cmp w0, #0
+; CHECK-GI-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
+; CHECK-GI-NEXT:    fcsel d0, d8, d9, lt
+; CHECK-GI-NEXT:    ldp d9, d8, [sp], #32 // 16-byte Folded Reload
+; CHECK-GI-NEXT:    ret
+entry:
+  %c = fcmp olt fp128 %a, %b
+  %s = select i1 %c, double %d, double %e
+  ret double %s
+}
+
+define double @ole(fp128 %a, fp128 %b, double %d, double %e) {
+; CHECK-SD-LABEL: ole:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    stp d9, d8, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-SD-NEXT:    str x30, [sp, #16] // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    .cfi_offset b8, -24
+; CHECK-SD-NEXT:    .cfi_offset b9, -32
+; CHECK-SD-NEXT:    fmov d8, d3
+; CHECK-SD-NEXT:    fmov d9, d2
+; CHECK-SD-NEXT:    bl __letf2
+; CHECK-SD-NEXT:    cmp w0, #0
+; CHECK-SD-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
+; CHECK-SD-NEXT:    fcsel d0, d9, d8, le
+; CHECK-SD-NEXT:    ldp d9, d8, [sp], #32 // 16-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: ole:
+; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    stp d9, d8, [sp, #-32]! // 16-byte Folded Spill
+; CHECK-GI-NEXT:    str x30, [sp, #16] // 8-byte Folded Spill
+; CHECK-GI-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-GI-NEXT:    .cfi_offset w30, -16
+; CHECK-GI-NEXT:    .cfi_offset b8, -24
+; CHECK-GI-NEXT:    .cfi_offset b9, -32
+; CHECK-GI-NEXT:    fmov d8, d2
+; CHECK-GI-NEXT:    fmov d9, d3
+; CHECK-GI-NEXT:    bl __letf2
+; CHECK-GI-NEXT:    cmp w0, #0
+; CHECK-GI-NEXT:    ldr x30, [sp, #16] // 8-byte Folded Reload
+; CHECK-GI-NEXT:    fcsel d0, d8, d9, le
+; CHECK-GI-NEXT:    ldp d9, d8, [sp], #32 // 16-byte Folded Reload
+; CHECK-GI-NEXT:    ret
+entry:
+  %c = fcmp ole fp128 %a, %b
+  %s = select i1 %c, double %d, double %e
+  ret double %s
+}
+
+define double @one(fp128 %a, fp128 %b, double %d, double %e) {
+; CHECK-SD-LABEL: one:
+; CHECK-SD:       // %bb.0: // %entry
+; CHECK-SD-NEXT:    sub sp, sp, #64
+; CHECK-SD-NEXT:    stp d9, d8, [sp, #32] // 16-byte Folded Spill
+; CHECK-SD-NEXT:    stp x30, x19, [sp, #48] // 16-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-SD-NEXT:    .cfi_offset w19, -8
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    .cfi_offset b8, -24
+; CHECK-SD-NEXT:    .cfi_offset b9, -32
+; CHECK-SD-NEXT:    fmov d8, d3
+; CHECK-SD-NEXT:    fmov d9, d2
+; CHECK-S...
[truncated]

LostDebugLocObserver &LocObserver) {
auto &MF = MIRBuilder.getMF();
auto &Ctx = MF.getFunction().getContext();

Member

@tschuett tschuett Jul 11, 2024


GFCmp *Cmp = cast<GFCmp>(&MI);

to get access to the parameters.
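(For reference, a minimal sketch of what that refactor could look like, using the GFCmp accessors getCond(), getLHSReg() and getRHSReg() from GenericMachineInstrs.h; this is illustrative, not the exact code that landed:)

// Illustrative only: read the G_FCMP operands through the GFCmp view
// instead of raw getOperand() indices inside createFCMPLibcall().
GFCmp *Cmp = cast<GFCmp>(&MI);                        // asserts MI is a G_FCMP
const CmpInst::Predicate Pred = Cmp->getCond();       // floating-point predicate
const Register DstReg = Cmp->getOperand(0).getReg();  // comparison result
const Register Op1 = Cmp->getLHSReg();                // first fp128 operand
const Register Op2 = Cmp->getRHSReg();                // second fp128 operand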

@tschuett
Member

As you are touching the legalizer, a MIR test would be nice.

createLibcall(MIRBuilder, Libcall, {Temp, Type::getInt32Ty(Ctx), 0},
{{Op1, OpType, 0}, {Op2, OpType, 1}}, LocObserver, &MI);
if (!Status)
return MCRegister::NoRegister;
Contributor


Just use Register()

Comment on lines 1045 to 1047
const Register Res = MRI.createGenericVirtualRegister(PredTy);
MIRBuilder.buildICmp(ICmpPred, Res, Temp,
MIRBuilder.buildConstant(I32LLT, 0));
Contributor


Suggested change
const Register Res = MRI.createGenericVirtualRegister(PredTy);
MIRBuilder.buildICmp(ICmpPred, Res, Temp,
MIRBuilder.buildConstant(I32LLT, 0));
auto Res = MIRBuilder.buildICmp(ICmpPred, PredTy, Temp,
MIRBuilder.buildConstant(I32LLT, 0));

Comment on lines 1268 to 1270
if (Status != Legalized)
return Status;
break;
Contributor


Directly handle everything and return here
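(Roughly, the suggestion is to finish the G_FCMP case inside the switch rather than falling through to the shared epilogue; a sketch assuming the usual erase-and-return pattern, not necessarily the exact code that landed:)

case TargetOpcode::G_FCMP: {
  LegalizeResult Status = createFCMPLibcall(MIRBuilder, MI, LocObserver);
  if (Status != Legalized)
    return Status;
  // Erase the G_FCMP and return directly instead of breaking out of the switch.
  MI.eraseFromParent();
  return Status;
}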

Contributor

arsenm commented Jul 12, 2024

As you are touching the legalizer, a MIR test would be nice.

For the most part I think we should stop doing that unless there's a really special edge case involved that would be hard to reproduce. They're harder to maintain and can miss cases where the system as a whole doesn't work.

@tschuett
Member

As you are touching the legalizer, a MIR test would be nice.

For the most part I think we should stop doing that unless there's a really special edge case involved that would be hard to reproduce. They're harder to maintain and can miss cases where the system as a whole doesn't work.

This PR is about legalization and not stressing the system. There is quite a bit of code added to the legalizer that is not well tested. That is why I asked for a MIR file.

Member Author

Him188 commented Jul 12, 2024

There is quite a bit of code added to the legalizer that is not well tested. That is why I asked for a MIR file.

@tschuett The llvm/test/CodeGen/AArch64/fcmp-fp128.ll covers all the conditions. Isn't this adequate?

Member Author

Him188 commented Jul 16, 2024

Ping @davemgreen and @aemerson ?

@@ -560,7 +560,7 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
})
.widenScalarOrEltToNextPow2(1)
.clampScalar(0, s32, s32)
.clampScalarOrElt(1, MinFPScalar, s64)
.clampScalarOrElt(1, MinFPScalar, s128)
Collaborator


Is this needed? If so, could it use minScalarOrElt?

Is it worth scalarizing vectors in this same patch?

Member Author


Is this needed? If so could it use minScalarOrElt?

Replaced with minScalarOrElt.

Is it worth scalarizing vectors in this same patch?

Yes, added .scalarizeIf(scalarOrEltWiderThan(1, 64), 1).

v3f128_fp128 is still failing because we failed to legalize `%21:_(<3 x s128>) = G_OR %29:_, %30:_`.
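(For reference, a rough sketch of how the tail of the G_FCMP ruleset reads with those revisions; the ordering relative to the surrounding clamp/moreElements clauses is an assumption, see AArch64LegalizerInfo.cpp in the final patch for the exact rules:)

getActionDefinitionsBuilder(G_FCMP)
    // ... existing legalFor/widen/clamp clauses unchanged ...
    .minScalarOrElt(1, MinFPScalar)              // replaces clampScalarOrElt(1, MinFPScalar, s128)
    .scalarizeIf(scalarOrEltWiderThan(1, 64), 1) // split fp128 vectors into scalar compares
    .libcallFor({{s32, s128}});                  // scalar fp128 compares become libcalls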

const auto BuildLibcall = [&](const RTLIB::Libcall Libcall,
const CmpInst::Predicate ICmpPred) -> Register {
Register Temp = MRI.createGenericVirtualRegister(I32LLT);
// Generate libcall, storing result into Temp
Collaborator


Maybe "storing result into Temp" -> "holding result in Temp"

Collaborator

@davemgreen davemgreen left a comment


Thanks, this LGTM. I just had a couple more comments about the details.

// Convert into (!FCMP_OEQ && !FCMP_UNO).

// We inverse the predicate instead of generating a NOT
// to save one instruciton.
Collaborator


instruciton->instruction

Member Author


fixed

const auto [UnoLibcall, UnoPred] = getFCMPLibcallDesc(CmpInst::FCMP_UNO);
const auto Uno = BuildLibcall(UnoLibcall, UnoPred);

MIRBuilder.buildCopy(DstReg, MIRBuilder.buildOr(PredTy, Oeq, Uno));
Collaborator


Is there a problem with generating MIRBuilder.buildOr(DstReg, Oeq, Uno) without the copy?

Member Author


There was, because BuildLibcall returned an s1 while DstReg is s32.

I changed buildICmp to define an s32, so BuildLibcall now returns an s32 and we need neither a copy nor an ext/trunc.
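(Roughly, the helper then takes this shape; a sketch under the assumption that the G_FCMP result has already been clamped to s32 and that the lambda gains a DstOp parameter — details may differ from the code that finally landed:)

// Sketch of the revised lambda: the ICmp result is produced directly as s32
// into Res, so no trailing copy or ext/trunc towards DstReg is needed.
const auto BuildLibcall = [&](const RTLIB::Libcall Libcall,
                              const CmpInst::Predicate ICmpPred,
                              const DstOp &Res) -> Register {
  Register Temp = MRI.createGenericVirtualRegister(I32LLT);
  // Generate the libcall, holding its i32 result in Temp.
  const auto Status =
      createLibcall(MIRBuilder, Libcall, {Temp, Type::getInt32Ty(Ctx), 0},
                    {{Op1, OpType, 0}, {Op2, OpType, 1}}, LocObserver, &MI);
  if (Status != Legalized)
    return Register();
  // Compare the libcall result against 0, writing straight into Res
  // (DstReg for the 1:1 predicates, an s32 temporary for combined ones).
  return MIRBuilder
      .buildICmp(ICmpPred, Res, Temp, MIRBuilder.buildConstant(I32LLT, 0))
      .getReg(0);
};

The combined cases can then also build straight into DstReg, e.g. MIRBuilder.buildOr(DstReg, Oeq, Uno), which addresses the question above.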

Libcall != RTLIB::UNKNOWN_LIBCALL &&
ICmpPred != CmpInst::BAD_ICMP_PREDICATE) {
if (const auto Res = BuildLibcall(Libcall, ICmpPred)) {
MIRBuilder.buildCopy(DstReg, Res);
Collaborator


If the type of the function is always an i32, might that need to be zext/sext/trunc to the predicate type?

BuildLibcall(UnoLibcall, CmpInst::getInversePredicate(UnoPred));

if (NotOeq && NotUno)
MIRBuilder.buildCopy(DstReg, MIRBuilder.buildAnd(PredTy, NotOeq, NotUno));
Collaborator


Some of this is inconsistent about whether BuildLibcall can fail. The ones below use the output unconditionally.

Member Author


Fixed. Now all the branches check for failures.

@@ -1,6 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -debugify-and-strip-all-safe -mcpu=cyclone -verify-machineinstrs -aarch64-enable-ccmp -aarch64-stress-ccmp | FileCheck %s --check-prefixes=CHECK,SDISEL
; RUN: llc < %s -debugify-and-strip-all-safe -mcpu=cyclone -verify-machineinstrs -aarch64-enable-ccmp -aarch64-stress-ccmp -global-isel -global-isel-abort=2 | FileCheck %s --check-prefixes=CHECK,GISEL
; RUN: llc < %s -debugify-and-strip-all-safe -mcpu=cyclone -verify-machineinstrs -aarch64-enable-ccmp -aarch64-stress-ccmp -global-isel -global-isel-abort=1 | FileCheck %s --check-prefixes=CHECK,GISEL
Collaborator


I think -global-isel-abort=1 is the default if -global-isel is specified.

Member Author


Removed -global-isel-abort=1

@Him188 Him188 changed the title [AArch64] Legalize fp128 scalars as libcalls for G_FCMP [AArch64] Legalize fp128 types as libcalls for G_FCMP Jul 22, 2024
@Him188 Him188 requested a review from davemgreen July 22, 2024 15:12
Collaborator

@davemgreen davemgreen left a comment


Thanks. LGTM

@Him188 Him188 changed the title [AArch64] Legalize fp128 types as libcalls for G_FCMP [AArch64][GISel] Legalize fp128 types as libcalls for G_FCMP Jul 25, 2024
@Him188 Him188 changed the title [AArch64][GISel] Legalize fp128 types as libcalls for G_FCMP [AArch64][GlobalISel] Legalize fp128 types as libcalls for G_FCMP Jul 25, 2024
@Him188 Him188 merged commit ba461f8 into llvm:main Jul 25, 2024
7 checks passed
@Him188 Him188 deleted the tguan/gisel-fcmp-fp128 branch July 25, 2024 10:07
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…8452)

Summary:
- Generate libcall for supported predicates.
- Generate unsupported predicates as combinations of supported
predicates.
- Vectors are scalarized, however some cases like `v3f128_fp128` are still failing, because we failed to legalize G_OR for these types.

GISel now generates the same code as SDAG; note, however, the difference in the `one` case.



Differential Revision: https://phabricator.intern.facebook.com/D60250518