[LV][EVL] Support in-loop reduction using tail folding with EVL. #90184

Mel-Chen · 2024-04-26T09:21:13Z

Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction.

Address one of TODOs from #76172.
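As a rough illustration of what a "vector predication reduction" computes: an `llvm.vp.reduce.*` call takes a scalar start value, a vector, a lane mask and an explicit vector length (EVL). It folds only the enabled lanes below EVL into the start value, and every other lane acts as the neutral element. The model below is plain C++, not LLVM code, and `vpReduceAddModel` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of
//   llvm.vp.reduce.add(start, vec, mask, evl)
// Lanes with index >= evl, or whose mask bit is false, are treated as the
// neutral value 0 and therefore do not change the result.
int32_t vpReduceAddModel(int32_t start, const std::vector<int32_t> &vec,
                         const std::vector<bool> &mask, std::size_t evl) {
  assert(mask.size() == vec.size() && evl <= vec.size());
  int32_t acc = start;
  for (std::size_t i = 0; i < evl; ++i)
    if (mask[i])
      acc += vec[i];
  return acc;
}
```

With an all-true mask (what the new IRBuilder overloads in this patch substitute when `Mask` is null), EVL alone limits the active lanes. For example, reducing `{1, 2, 3, 4}` from start value 10 with EVL 2 yields 13.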

llvmbot · 2024-04-26T09:21:46Z

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: Mel Chen (Mel-Chen)

Changes

Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction.
Note that the vector-predication intrinsics do not yet support fminimum and fmaximum reductions. Loops containing such reductions are therefore temporarily excluded from tail folding with EVL.

Address one of TODOs from #76172.
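The fmin/fmax kinds can be lowered because `llvm.vp.reduce.fmin`/`fmax` share `minnum`/`maxnum` comparison semantics, where a single quiet-NaN operand is ignored. `fminimum`/`fmaximum` differ: they must propagate NaN and order -0.0 below +0.0, so they need dedicated VP intrinsics. The scalar C++ sketch below contrasts the two combine steps. `minnumLike` and `minimumLike` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of llvm.minnum: std::fmin returns the other operand
// when exactly one operand is NaN.
double minnumLike(double a, double b) { return std::fmin(a, b); }

// Hypothetical model of llvm.minimum: any NaN operand yields NaN, and -0.0
// is treated as smaller than +0.0.
double minimumLike(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == 0.0 && b == 0.0)
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}
```

A reduction built on `minnumLike` can use NaN as an inert start value. A reduction built on `minimumLike` cannot, which is one way to see why the two families are not interchangeable.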

Patch is 148.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90184.diff

12 Files Affected:

(modified) llvm/include/llvm/IR/IRBuilder.h (+19)
(modified) llvm/include/llvm/Transforms/Utils/LoopUtils.h (+6)
(modified) llvm/lib/IR/IRBuilder.cpp (+122)
(modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+56)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+12-2)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+64)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+53)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+23-19)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+137-66)
(added) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll (+112)
(added) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction.ll (+1495)

diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index b6534a1962a2f5..4db1fe5ff93aef 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -746,49 +746,68 @@ class IRBuilderBase {
 private:
   CallInst *getReductionIntrinsic(Intrinsic::ID ID, Value *Src);
 
+  // Helper function for creating VP reduce intrinsic call.
+  CallInst *getReductionIntrinsic(Intrinsic::ID ID, Value *Acc, Value *Src,
+                                  Value *Mask, Value *EVL);
+
 public:
   /// Create a sequential vector fadd reduction intrinsic of the source vector.
   /// The first parameter is a scalar accumulator value. An unordered reduction
   /// can be created by adding the reassoc fast-math flag to the resulting
   /// sequential reduction.
   CallInst *CreateFAddReduce(Value *Acc, Value *Src);
+  CallInst *CreateFAddReduce(Value *Acc, Value *Src, Value *EVL,
+                             Value *Mask = nullptr);
 
   /// Create a sequential vector fmul reduction intrinsic of the source vector.
   /// The first parameter is a scalar accumulator value. An unordered reduction
   /// can be created by adding the reassoc fast-math flag to the resulting
   /// sequential reduction.
   CallInst *CreateFMulReduce(Value *Acc, Value *Src);
+  CallInst *CreateFMulReduce(Value *Acc, Value *Src, Value *EVL,
+                             Value *Mask = nullptr);
 
   /// Create a vector int add reduction intrinsic of the source vector.
   CallInst *CreateAddReduce(Value *Src);
+  CallInst *CreateAddReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector int mul reduction intrinsic of the source vector.
   CallInst *CreateMulReduce(Value *Src);
+  CallInst *CreateMulReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector int AND reduction intrinsic of the source vector.
   CallInst *CreateAndReduce(Value *Src);
+  CallInst *CreateAndReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector int OR reduction intrinsic of the source vector.
   CallInst *CreateOrReduce(Value *Src);
+  CallInst *CreateOrReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector int XOR reduction intrinsic of the source vector.
   CallInst *CreateXorReduce(Value *Src);
+  CallInst *CreateXorReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector integer max reduction intrinsic of the source
   /// vector.
   CallInst *CreateIntMaxReduce(Value *Src, bool IsSigned = false);
+  CallInst *CreateIntMaxReduce(Value *Src, Value *EVL, bool IsSigned = false,
+                               Value *Mask = nullptr);
 
   /// Create a vector integer min reduction intrinsic of the source
   /// vector.
   CallInst *CreateIntMinReduce(Value *Src, bool IsSigned = false);
+  CallInst *CreateIntMinReduce(Value *Src, Value *EVL, bool IsSigned = false,
+                               Value *Mask = nullptr);
 
   /// Create a vector float max reduction intrinsic of the source
   /// vector.
   CallInst *CreateFPMaxReduce(Value *Src);
+  CallInst *CreateFPMaxReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector float min reduction intrinsic of the source
   /// vector.
   CallInst *CreateFPMinReduce(Value *Src);
+  CallInst *CreateFPMinReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector float maximum reduction intrinsic of the source
   /// vector. This variant follows the NaN and signed zero semantic of
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 187ace3a0cbedf..5003fa66100b46 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -403,6 +403,9 @@ Value *getShuffleReduction(IRBuilderBase &Builder, Value *Src, unsigned Op,
 /// Fast-math-flags are propagated using the IRBuilder's setting.
 Value *createSimpleTargetReduction(IRBuilderBase &B, Value *Src,
                                    RecurKind RdxKind);
+Value *createSimpleTargetReduction(IRBuilderBase &B, Value *Src,
+                                   RecurKind RdxKind, Value *EVL,
+                                   Value *Mask = nullptr);
 
 /// Create a target reduction of the given vector \p Src for a reduction of the
 /// kind RecurKind::IAnyOf or RecurKind::FAnyOf. The reduction operation is
@@ -423,6 +426,9 @@ Value *createTargetReduction(IRBuilderBase &B, const RecurrenceDescriptor &Desc,
 Value *createOrderedReduction(IRBuilderBase &B,
                               const RecurrenceDescriptor &Desc, Value *Src,
                               Value *Start);
+Value *createOrderedReduction(IRBuilderBase &B,
+                              const RecurrenceDescriptor &Desc, Value *Src,
+                              Value *Start, Value *EVL, Value *Mask = nullptr);
 
 /// Get the intersection (logical and) of all of the potential IR flags
 /// of each scalar operation (VL) that will be converted into a vector (I).
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index d6746d1d438242..90f637940d00da 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -414,6 +414,20 @@ CallInst *IRBuilderBase::getReductionIntrinsic(Intrinsic::ID ID, Value *Src) {
   return CreateCall(Decl, Ops);
 }
 
+CallInst *IRBuilderBase::getReductionIntrinsic(Intrinsic::ID ID, Value *Acc,
+                                               Value *Src, Value *Mask,
+                                               Value *EVL) {
+  Module *M = GetInsertBlock()->getParent()->getParent();
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  EVL = CreateIntCast(EVL, getInt32Ty(), /*isSigned=*/false);
+  if (!Mask)
+    Mask = CreateVectorSplat(SrcTy->getElementCount(), getTrue());
+  Value *Ops[] = {Acc, Src, Mask, EVL};
+  Type *Tys[] = {SrcTy};
+  auto Decl = Intrinsic::getDeclaration(M, ID, Tys);
+  return CreateCall(Decl, Ops);
+}
+
 CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src) {
   Module *M = GetInsertBlock()->getParent()->getParent();
   Value *Ops[] = {Acc, Src};
@@ -422,6 +436,11 @@ CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src) {
   return CreateCall(Decl, Ops);
 }
 
+CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src, Value *EVL,
+                                          Value *Mask) {
+  return getReductionIntrinsic(Intrinsic::vp_reduce_fadd, Acc, Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src) {
   Module *M = GetInsertBlock()->getParent()->getParent();
   Value *Ops[] = {Acc, Src};
@@ -430,46 +449,149 @@ CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src) {
   return CreateCall(Decl, Ops);
 }
 
+CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src, Value *EVL,
+                                          Value *Mask) {
+  return getReductionIntrinsic(Intrinsic::vp_reduce_fmul, Acc, Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateAddReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_add, Src);
 }
 
+CallInst *IRBuilderBase::CreateAddReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_add,
+                               ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateMulReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_mul, Src);
 }
 
+CallInst *IRBuilderBase::CreateMulReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_mul,
+                               ConstantInt::get(EltTy, 1), Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateAndReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_and, Src);
 }
 
+CallInst *IRBuilderBase::CreateAndReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_and,
+                               Constant::getAllOnesValue(EltTy), Src, Mask,
+                               EVL);
+}
+
 CallInst *IRBuilderBase::CreateOrReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_or, Src);
 }
 
+CallInst *IRBuilderBase::CreateOrReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_or,
+                               ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateXorReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_xor, Src);
 }
 
+CallInst *IRBuilderBase::CreateXorReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_xor,
+                               ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateIntMaxReduce(Value *Src, bool IsSigned) {
   auto ID =
       IsSigned ? Intrinsic::vector_reduce_smax : Intrinsic::vector_reduce_umax;
   return getReductionIntrinsic(ID, Src);
 }
 
+CallInst *IRBuilderBase::CreateIntMaxReduce(Value *Src, Value *EVL,
+                                            bool IsSigned, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(
+      IsSigned ? Intrinsic::vp_reduce_smax : Intrinsic::vp_reduce_umax,
+      IsSigned ? ConstantInt::get(EltTy, APInt::getSignedMinValue(
+                                             EltTy->getIntegerBitWidth()))
+               : ConstantInt::get(EltTy, 0),
+      Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateIntMinReduce(Value *Src, bool IsSigned) {
   auto ID =
       IsSigned ? Intrinsic::vector_reduce_smin : Intrinsic::vector_reduce_umin;
   return getReductionIntrinsic(ID, Src);
 }
 
+CallInst *IRBuilderBase::CreateIntMinReduce(Value *Src, Value *EVL,
+                                            bool IsSigned, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(
+      IsSigned ? Intrinsic::vp_reduce_smin : Intrinsic::vp_reduce_umin,
+      IsSigned ? ConstantInt::get(EltTy, APInt::getSignedMaxValue(
+                                             EltTy->getIntegerBitWidth()))
+               : Constant::getAllOnesValue(EltTy),
+      Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateFPMaxReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_fmax, Src);
 }
 
+CallInst *IRBuilderBase::CreateFPMaxReduce(Value *Src, Value *EVL,
+                                           Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  FastMathFlags FMF = getFastMathFlags();
+  Value *Neutral;
+  if (FMF.noNaNs())
+    Neutral = FMF.noInfs()
+                  ? ConstantFP::get(
+                        EltTy, APFloat::getLargest(EltTy->getFltSemantics(),
+                                                   /*Negative=*/true))
+                  : ConstantFP::getInfinity(EltTy, true);
+  else
+    Neutral = ConstantFP::getQNaN(EltTy, /*Negative=*/true);
+
+  return getReductionIntrinsic(Intrinsic::vp_reduce_fmax, Neutral, Src, Mask,
+                               EVL);
+}
+
 CallInst *IRBuilderBase::CreateFPMinReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_fmin, Src);
 }
 
+CallInst *IRBuilderBase::CreateFPMinReduce(Value *Src, Value *EVL,
+                                           Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  FastMathFlags FMF = getFastMathFlags();
+  Value *Neutral;
+  if (FMF.noNaNs())
+    Neutral = FMF.noInfs()
+                  ? ConstantFP::get(
+                        EltTy, APFloat::getLargest(EltTy->getFltSemantics(),
+                                                   /*Negative=*/false))
+                  : ConstantFP::getInfinity(EltTy, false);
+  else
+    Neutral = ConstantFP::getQNaN(EltTy, /*Negative=*/false);
+
+  return getReductionIntrinsic(Intrinsic::vp_reduce_fmin, Neutral, Src, Mask,
+                               EVL);
+}
+
 CallInst *IRBuilderBase::CreateFPMaximumReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_fmaximum, Src);
 }
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 73c5d636782294..d0abcdfb1440ab 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -1204,6 +1204,48 @@ Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
   }
 }
 
+Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
+                                         RecurKind RdxKind, Value *EVL,
+                                         Value *Mask) {
+  auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
+  switch (RdxKind) {
+  case RecurKind::Add:
+    return Builder.CreateAddReduce(Src, EVL, Mask);
+  case RecurKind::Mul:
+    return Builder.CreateMulReduce(Src, EVL, Mask);
+  case RecurKind::And:
+    return Builder.CreateAndReduce(Src, EVL, Mask);
+  case RecurKind::Or:
+    return Builder.CreateOrReduce(Src, EVL, Mask);
+  case RecurKind::Xor:
+    return Builder.CreateXorReduce(Src, EVL, Mask);
+  case RecurKind::FMulAdd:
+  case RecurKind::FAdd:
+    return Builder.CreateFAddReduce(ConstantFP::getNegativeZero(SrcVecEltTy),
+                                    Src, EVL, Mask);
+  case RecurKind::FMul:
+    return Builder.CreateFMulReduce(ConstantFP::get(SrcVecEltTy, 1.0), Src, EVL,
+                                    Mask);
+  case RecurKind::SMax:
+    return Builder.CreateIntMaxReduce(Src, EVL, true, Mask);
+  case RecurKind::SMin:
+    return Builder.CreateIntMinReduce(Src, EVL, true, Mask);
+  case RecurKind::UMax:
+    return Builder.CreateIntMaxReduce(Src, EVL, false, Mask);
+  case RecurKind::UMin:
+    return Builder.CreateIntMinReduce(Src, EVL, false, Mask);
+  case RecurKind::FMax:
+    return Builder.CreateFPMaxReduce(Src, EVL, Mask);
+  case RecurKind::FMin:
+    return Builder.CreateFPMinReduce(Src, EVL, Mask);
+  case RecurKind::FMinimum:
+  case RecurKind::FMaximum:
+    assert(0 && "FMaximum/FMinimum reduction VP intrinsic is not supported.");
+  default:
+    llvm_unreachable("Unhandled opcode");
+  }
+}
+
 Value *llvm::createTargetReduction(IRBuilderBase &B,
                                    const RecurrenceDescriptor &Desc, Value *Src,
                                    PHINode *OrigPhi) {
@@ -1232,6 +1274,20 @@ Value *llvm::createOrderedReduction(IRBuilderBase &B,
   return B.CreateFAddReduce(Start, Src);
 }
 
+Value *llvm::createOrderedReduction(IRBuilderBase &B,
+                                    const RecurrenceDescriptor &Desc,
+                                    Value *Src, Value *Start, Value *EVL,
+                                    Value *Mask) {
+  assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||
+          Desc.getRecurrenceKind() == RecurKind::FMulAdd) &&
+         "Unexpected reduction kind");
+  assert(Src->getType()->isVectorTy() && "Expected a vector type");
+  assert(!Start->getType()->isVectorTy() && "Expected a scalar type");
+  assert(EVL->getType()->isIntegerTy() && "Expected an integer type");
+
+  return B.CreateFAddReduce(Start, Src, EVL, Mask);
+}
+
 void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,
                             bool IncludeWrapFlags) {
   auto *VecOp = dyn_cast<Instruction>(I);
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 33c4decd58a6c2..1db531e170a4bf 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1526,6 +1526,17 @@ class LoopVectorizationCostModel {
                                             ForceTailFoldingStyle.getValue());
     if (ForceTailFoldingStyle != TailFoldingStyle::DataWithEVL)
       return;
+
+    // Block folding with EVL since vector-predication intrinsics do not yet
+    // support FMinimum and FMaximum reductions.
+    // FIXME: remove this check once llvm.vp.reduce.fminimum/fmaximum are
+    // supported
+    bool ContainsFMinimumOrFMaximumReduction =
+        any_of(Legal->getReductionVars(), [&](auto &Reduction) {
+          const RecurrenceDescriptor &RdxDesc = Reduction.second;
+          RecurKind Kind = RdxDesc.getRecurrenceKind();
+          return Kind == RecurKind::FMinimum || Kind == RecurKind::FMaximum;
+        });
     // Override forced styles if needed.
     // FIXME: use actual opcode/data type for analysis here.
     // FIXME: Investigate opportunity for fixed vector factor.
@@ -1535,8 +1546,7 @@ class LoopVectorizationCostModel {
         !EnableVPlanNativePath &&
         // FIXME: implement support for max safe dependency distance.
         Legal->isSafeForAnyVectorWidth() &&
-        // FIXME: remove this once reductions are supported.
-        Legal->getReductionVars().empty();
+        !ContainsFMinimumOrFMaximumReduction;
     if (!EVLIsLegal) {
       // If for some reason EVL mode is unsupported, fallback to
       // DataWithoutLaneMask to try to vectorize the loop with folded tail
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c74329a0bcc4ac..a444064dab692a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -843,6 +843,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPDerivedIVSC:
     case VPRecipeBase::VPExpandSCEVSC:
     case VPRecipeBase::VPInstructionSC:
+    case VPRecipeBase::VPReductionEVLSC:
     case VPRecipeBase::VPReductionSC:
     case VPRecipeBase::VPReplicateSC:
     case VPRecipeBase::VPScalarIVStepsSC:
@@ -2110,6 +2111,12 @@ class VPReductionRecipe : public VPSingleDefRecipe {
              VPSlotTracker &SlotTracker) const override;
 #endif
 
+  /// Return the recurrence descriptor for the in-loop reduction.
+  const RecurrenceDescriptor &getRecurrenceDescriptor() const {
+    return RdxDesc;
+  }
+  /// Return true if the in-loop reduction is ordered.
+  bool isOrdered() const { return IsOrdered; }
   /// The VPValue of the scalar Chain being accumulated.
   VPValue *getChainOp() const { return getOperand(0); }
   /// The VPValue of the vector value to be reduced.
@@ -2120,6 +2127,63 @@ class VPReductionRecipe : public VPSingleDefRecipe {
   }
 };
 
+/// A recipe to represent in-loop reduction operations with vector-predication
+/// intrinsics, performing a reduction on a vector operand with the explicit
+/// vector length (EVL) into a scalar value, and adding the result to a chain.
+/// The Operands are {ChainOp, VecOp, EVL, [Condition]}.
+class VPReductionEVLRecipe : public VPSingleDefRecipe {
+  /// The recurrence descriptor for the reduction in question.
+  const RecurrenceDescriptor &RdxDesc;
+  bool IsOrdered;
+
+public:
+  VPReductionEVLRecipe(VPReductionRecipe *R, VPValue *EVL)
+      : VPSingleDefRecipe(
+            VPDef::VPReductionEVLSC,
+            ArrayRef<VPValue *>({R->getChainOp(), R->getVecOp(), EVL}),
+            R->getUnderlyingInstr()),
+        RdxDesc(R->getRecurrenceDescriptor()), IsOrdered(R->isOrdered()) {
+    VPValue *CondOp = R->getCondOp();
+    if (CondOp)
+      addOperand(CondOp);
+  };
+
+  ~VPReductionEVLRecipe() override = default;
+
+  VPReductionEVLRecipe *clone() override {
+    llvm_unreachable("cloning not implemented yet");
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPReductionEVLSC)
+
+  /// Generate the reduction in the loop
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  /// The VPValue of the scalar Chain being accumulated.
+  VPValue *getChainOp() const { return g...
[truncated]
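The start values that the new EVL overloads in IRBuilder.cpp pass to the VP intrinsics are chosen to be neutral, i.e. `op(neutral, x) == x`, so they never perturb the result. The standalone C++ sketch below restates those choices for 32-bit integers and `double` so they can be checked. The constant and function names are hypothetical and only mirror the diff; they are not part of LLVM. For the floating-point case it relies on `std::fmax`/`std::fmin` ignoring a single quiet-NaN operand, as `maxnum`/`minnum` do.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Neutral start values mirroring the EVL overloads in the IRBuilder.cpp
// hunk, restated for 32-bit integers and double. Each value n satisfies
// op(n, x) == x, so the result is determined by the enabled lanes alone.
constexpr int32_t kAddNeutral = 0;
constexpr int32_t kMulNeutral = 1;
constexpr uint32_t kAndNeutral = ~0u; // all ones
constexpr uint32_t kOrXorNeutral = 0;
constexpr int32_t kSMaxNeutral = std::numeric_limits<int32_t>::min();
constexpr uint32_t kUMaxNeutral = 0;
constexpr int32_t kSMinNeutral = std::numeric_limits<int32_t>::max();
constexpr uint32_t kUMinNeutral = std::numeric_limits<uint32_t>::max();

// FP max neutral, selected from the no-NaNs (nnan) and no-Infs (ninf)
// fast-math flags: -largest finite, -infinity, or a negative quiet NaN.
double fmaxNeutral(bool noNaNs, bool noInfs) {
  if (noNaNs)
    return noInfs ? -std::numeric_limits<double>::max()
                  : -std::numeric_limits<double>::infinity();
  return -std::numeric_limits<double>::quiet_NaN();
}

// FP min neutral: the same choices with the sign flipped.
double fminNeutral(bool noNaNs, bool noInfs) {
  if (noNaNs)
    return noInfs ? std::numeric_limits<double>::max()
                  : std::numeric_limits<double>::infinity();
  return std::numeric_limits<double>::quiet_NaN();
}
```

The FP case is the only one that depends on the builder's fast-math flags. Without `nnan`, a NaN is the only value guaranteed not to win a `maxnum`/`minnum` comparison against every possible input, including infinities.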

github-actions · 2024-04-26T09:24:03Z

✅ With the latest revision this PR passed the C/C++ code formatter.


fhahn

The title mentions this adds support for in-loop reductions, but I wasn't able to find a check to make sure we only vectorize in-loop reductions?

All tests seem to pass flags guiding towards the use of in-loop/ordered reductions, so the case where the regular reduction strategy is chosen may not be tested well
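For reference, an in-loop reduction reduces the vector of loaded values to a scalar in every iteration and adds it to the scalar chain. That is the shape `VPReductionEVLRecipe` describes with its operands {ChainOp, VecOp, EVL, [Condition]}. The C++ model below is a hypothetical sketch, not vectorizer output. It assumes EVL = min(VF, remaining), which is one permissible choice; a target may return a smaller EVL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an in-loop reduction under EVL tail folding.
// Each iteration handles EVL elements and folds them straight into the
// scalar chain, so lanes at or beyond EVL are never read.
int64_t inloopEVLSum(const std::vector<int32_t> &a, std::size_t vf) {
  assert(vf > 0);
  int64_t chain = 0; // scalar reduction phi, start value 0
  for (std::size_t i = 0; i < a.size();) {
    std::size_t evl = std::min(vf, a.size() - i); // assumed EVL choice
    int64_t partial = 0;                          // reduce the active lanes
    for (std::size_t lane = 0; lane < evl; ++lane)
      partial += a[i + lane];
    chain += partial; // add the reduced value to the chain
    i += evl;
  }
  return chain;
}
```

Because the accumulator is scalar, a short final iteration needs no special handling. This is why the in-loop strategy fits EVL tail folding most naturally.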


fhahn · 2024-05-03T13:33:11Z

llvm/include/llvm/IR/VectorBuilder.h

@@ -57,6 +58,11 @@ class VectorBuilder {
    return RetType();
  }

+  // Helper function for creating VP intrinsic call.


independent of this change, but if VectorBuilder only supports generation of vector-predication intrinsics, then it would be better to call it VectorPredicationBuilder to avoid confusion

Indeed, this can be adjusted later. (But I would prefer a shorter name, perhaps just VectorPredBuilder would be good enough?)

or VPBuilder, like there's IRBuilder


That's unfortunate, as there is already a class named VPBuilder in LoopVectorizationPlanner.h.

/// VPlan-based builder utility analogous to IRBuilder.
class VPBuilder {
  VPBasicBlock *BB = nullptr;

I think it's worth being more explicit for the name in the utility in llvm/IR, VectorPredBuilder would sound good to me.

nit: use /// for doc-comment

Mel-Chen · 2024-05-06T14:05:16Z

The title mentions this adds support for in-loop reductions, but I wasn't able to find a check to make sure we only vectorize in-loop reductions?

All tests seem to pass flags guiding towards the use of in-loop/ordered reductions, so the case where the regular reduction strategy is chosen may not be tested well

Indeed, we might need to change the title.
Originally, I intended to support only in-loop reductions first. The bad news is that the legality check for folding with EVL runs before in-loop reductions are collected. This means we may need to either postpone the legality check for folding with EVL or move the collection of in-loop reductions earlier.
The good news is that I have checked the IR generation in inloop-reduction.ll for IF-EVL-OUTLOOP, and it seems to be correct. This means we can directly open up both out-loop and in-loop reduction.
In conclusion, my suggestion is to change the title to "[LV][EVL] Support reduction idioms using tail folding with EVL." and directly add more RUN commands in the test cases to test the results of out-loop reduction.
@fhahn What do you think?
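For contrast, here is a rough model of the regular (out-of-loop) strategy. The accumulator stays a vector across iterations and is reduced only once, after the loop. Under EVL tail folding, lanes at or beyond EVL in the last iteration must keep their previous accumulator value. The lane-wise select below plays the role a merge such as `llvm.vp.merge` could play; #101641, linked further down this page, proposes emitting `vp.merge` for out-loop reductions. The function name is hypothetical, and EVL = min(VF, remaining) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an out-of-loop reduction under EVL tail folding.
// A VF-wide vector accumulator is updated lane-wise, and the horizontal
// reduction happens once after the loop.
int64_t outloopEVLSum(const std::vector<int32_t> &a, std::size_t vf) {
  assert(vf > 0);
  std::vector<int64_t> acc(vf, 0); // vector reduction phi, neutral start
  for (std::size_t i = 0; i < a.size();) {
    std::size_t evl = std::min(vf, a.size() - i); // assumed EVL choice
    for (std::size_t lane = 0; lane < vf; ++lane) {
      // Enabled lanes accumulate; lanes >= EVL keep their old value
      // (the "merge"), so the short final iteration cannot corrupt them.
      acc[lane] = lane < evl ? acc[lane] + a[i + lane] : acc[lane];
    }
    i += evl;
  }
  int64_t result = 0; // final reduction after the loop
  for (int64_t v : acc)
    result += v;
  return result;
}
```

Dropping the select and letting inactive lanes take arbitrary values would make the post-loop reduction pick up garbage from the final iteration. That is the hazard the out-loop case has to rule out.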

Mel-Chen · 2024-05-14T08:09:52Z

@fhahn ping

fhahn · 2024-05-15T09:56:45Z

The good news is that I have checked the IR generation in inloop-reduction.ll for IF-EVL-OUTLOOP, and it seems to be correct. This means we can directly open up both out-loop and in-loop reduction.
In conclusion, my suggestion is to change the title to "[LV][EVL] Support reduction idioms using tail folding with EVL." and directly add more RUN commands in the test cases to test the results of out-loop reduction.
@fhahn What do you think?

Sounds good to me, but it would be good to have an upstream buildbot that builds code with the various options, to provide some runtime testing.

fhahn · 2024-05-15T09:20:53Z

llvm/include/llvm/IR/VectorBuilder.h

@@ -57,6 +58,11 @@ class VectorBuilder {
    return RetType();
  }

+  // Helper function for creating VP intrinsic call.


I think it's worth being more explicit for the name in the utility in llvm/IR, VectorPredBuilder would sound good to me.


chapuni

llvm/IR should not depend on llvm/Analysis.

chapuni · 2024-07-16T11:35:34Z

llvm/include/llvm/IR/VectorBuilder.h

@@ -15,6 +15,7 @@
 #ifndef LLVM_IR_VECTORBUILDER_H
 #define LLVM_IR_VECTORBUILDER_H

+#include <llvm/Analysis/IVDescriptors.h>


This is a layering violation.

Thanks for pointing out this issue. I opened #99276 to fix it. Please take a look, thanks a lot.


Mel-Chen requested review from fhahn, alexey-bataev, ayalz and aniragil April 26, 2024 09:21

llvmbot added vectorizers llvm:ir llvm:transforms labels Apr 26, 2024

Mel-Chen force-pushed the evl-reduction branch from ba0a5eb to dc1695a Compare April 26, 2024 09:39

fhahn reviewed Apr 26, 2024

llvm/lib/IR/IRBuilder.cpp (outdated, resolved)

nikolaypanchenko reviewed Apr 29, 2024

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (resolved)

alexey-bataev reviewed May 2, 2024

llvm/lib/IR/VectorBuilder.cpp (outdated, resolved)

llvm/include/llvm/IR/VectorBuilder.h (outdated, resolved)

Mel-Chen force-pushed the evl-reduction branch 2 times, most recently from df1c995 to af3e8a5 Compare May 3, 2024 08:38

Mel-Chen requested review from fhahn, nikolaypanchenko and alexey-bataev May 3, 2024 08:46

Mel-Chen force-pushed the evl-reduction branch from 72e455a to afdb103 Compare May 3, 2024 09:42

fhahn reviewed May 3, 2024

llvm/lib/IR/VectorBuilder.cpp (outdated, resolved)

llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (resolved)

llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction.ll (resolved)

fhahn reviewed May 3, 2024


Mel-Chen force-pushed the evl-reduction branch from afdb103 to 2881f92 Compare May 6, 2024 14:07

Mel-Chen requested review from fhahn and simoll and removed request for nikolaypanchenko May 6, 2024 14:09

Mel-Chen force-pushed the evl-reduction branch from 2881f92 to 71fe035 Compare May 13, 2024 07:57

fhahn reviewed May 15, 2024


Mel-Chen added 23 commits July 14, 2024 23:09

mis-vectorized after llvm#92092.

73f8bfa

Refine comments.

ea080d9

Refine the code of recipe replacement.

72c079d

Support type inference for VPReductionRecipe and VPReductionEVLRecipe

95387e6

Introduce ExplicitVectorLengthMask recipe for out-loop reduction.

84fe7ba

Revert "Introduce ExplicitVectorLengthMask recipe for out-loop reduction."

This reverts commit 8488520.

cb4e661

Drop the EVL style vplan because of outloop reduction.

7ff54a2

Inherit VPReductionRecipe.

fdaf774

Add virtual and override for getCondOp()

1ad4f50

Private RecurrenceDescriptor and ordered bool.

0b3e344

Refine WidenInduction check.

192fcbb

Rename IncludeWidenInduction to ContainsWidenInductions

9abd2e1

Rename IncludeOutloopReduction to ContainsOutLoopReductions

ea173fb

Rebase and update test cases.

7329881

Remove the check for the number of definitions

3b32a4e

Replace the #definition check with isa<>.

b0b5009

Fix the bug from unit tests.

0f28142

******************** Failed Tests (2):
  LLVM-Unit :: Transforms/Vectorize/./VectorizeTests/21/51
  LLVM-Unit :: Transforms/Vectorize/./VectorizeTests/26/51

Refine vectorBuilder

f98d740

Rebase and update test cases.

e3960ac

Remove lambda

c85020f

Manually implement classof of VPReductionRecipe.

20f6383

Add isConditional for VPReductionRecipe.

35bf9d0

Unify the format of file include.

16eb2ab

Mel-Chen force-pushed the evl-reduction branch from 62c8979 to 16eb2ab Compare July 15, 2024 07:13

Mel-Chen merged commit 4eb30cf into llvm:main Jul 16, 2024
7 checks passed

chapuni reviewed Jul 16, 2024


chapuni mentioned this pull request Jul 24, 2024

[VP] Refactor VectorBuilder to avoid layering violation. NFC #99276

Merged

Mel-Chen mentioned this pull request Aug 2, 2024

[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. #101641

Open
