Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LV][EVL] Support in-loop reduction using tail folding with EVL. #90184

Merged
merged 52 commits into from
Jul 16, 2024

Conversation

Mel-Chen
Copy link
Contributor

@Mel-Chen Mel-Chen commented Apr 26, 2024

Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction.

Address one of TODOs from #76172.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 26, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: Mel Chen (Mel-Chen)

Changes

Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction.
It's worth noting that, since vector predication intrinsics do not yet support fminimum and fmaximum reduction, fminimum and fmaximum reductions are temporarily blocked.

Address one of TODOs from #76172.


Patch is 148.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90184.diff

12 Files Affected:

  • (modified) llvm/include/llvm/IR/IRBuilder.h (+19)
  • (modified) llvm/include/llvm/Transforms/Utils/LoopUtils.h (+6)
  • (modified) llvm/lib/IR/IRBuilder.cpp (+122)
  • (modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+56)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+12-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+64)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+53)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+23-19)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+137-66)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll (+112)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction.ll (+1495)
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index b6534a1962a2f5..4db1fe5ff93aef 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -746,49 +746,68 @@ class IRBuilderBase {
 private:
   CallInst *getReductionIntrinsic(Intrinsic::ID ID, Value *Src);
 
+  // Helper function for creating VP reduce intrinsic call.
+  CallInst *getReductionIntrinsic(Intrinsic::ID ID, Value *Acc, Value *Src,
+                                  Value *Mask, Value *EVL);
+
 public:
   /// Create a sequential vector fadd reduction intrinsic of the source vector.
   /// The first parameter is a scalar accumulator value. An unordered reduction
   /// can be created by adding the reassoc fast-math flag to the resulting
   /// sequential reduction.
   CallInst *CreateFAddReduce(Value *Acc, Value *Src);
+  CallInst *CreateFAddReduce(Value *Acc, Value *Src, Value *EVL,
+                             Value *Mask = nullptr);
 
   /// Create a sequential vector fmul reduction intrinsic of the source vector.
   /// The first parameter is a scalar accumulator value. An unordered reduction
   /// can be created by adding the reassoc fast-math flag to the resulting
   /// sequential reduction.
   CallInst *CreateFMulReduce(Value *Acc, Value *Src);
+  CallInst *CreateFMulReduce(Value *Acc, Value *Src, Value *EVL,
+                             Value *Mask = nullptr);
 
   /// Create a vector int add reduction intrinsic of the source vector.
   CallInst *CreateAddReduce(Value *Src);
+  CallInst *CreateAddReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector int mul reduction intrinsic of the source vector.
   CallInst *CreateMulReduce(Value *Src);
+  CallInst *CreateMulReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector int AND reduction intrinsic of the source vector.
   CallInst *CreateAndReduce(Value *Src);
+  CallInst *CreateAndReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector int OR reduction intrinsic of the source vector.
   CallInst *CreateOrReduce(Value *Src);
+  CallInst *CreateOrReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector int XOR reduction intrinsic of the source vector.
   CallInst *CreateXorReduce(Value *Src);
+  CallInst *CreateXorReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector integer max reduction intrinsic of the source
   /// vector.
   CallInst *CreateIntMaxReduce(Value *Src, bool IsSigned = false);
+  CallInst *CreateIntMaxReduce(Value *Src, Value *EVL, bool IsSigned = false,
+                               Value *Mask = nullptr);
 
   /// Create a vector integer min reduction intrinsic of the source
   /// vector.
   CallInst *CreateIntMinReduce(Value *Src, bool IsSigned = false);
+  CallInst *CreateIntMinReduce(Value *Src, Value *EVL, bool IsSigned = false,
+                               Value *Mask = nullptr);
 
   /// Create a vector float max reduction intrinsic of the source
   /// vector.
   CallInst *CreateFPMaxReduce(Value *Src);
+  CallInst *CreateFPMaxReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector float min reduction intrinsic of the source
   /// vector.
   CallInst *CreateFPMinReduce(Value *Src);
+  CallInst *CreateFPMinReduce(Value *Src, Value *EVL, Value *Mask = nullptr);
 
   /// Create a vector float maximum reduction intrinsic of the source
   /// vector. This variant follows the NaN and signed zero semantic of
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 187ace3a0cbedf..5003fa66100b46 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -403,6 +403,9 @@ Value *getShuffleReduction(IRBuilderBase &Builder, Value *Src, unsigned Op,
 /// Fast-math-flags are propagated using the IRBuilder's setting.
 Value *createSimpleTargetReduction(IRBuilderBase &B, Value *Src,
                                    RecurKind RdxKind);
+Value *createSimpleTargetReduction(IRBuilderBase &B, Value *Src,
+                                   RecurKind RdxKind, Value *EVL,
+                                   Value *Mask = nullptr);
 
 /// Create a target reduction of the given vector \p Src for a reduction of the
 /// kind RecurKind::IAnyOf or RecurKind::FAnyOf. The reduction operation is
@@ -423,6 +426,9 @@ Value *createTargetReduction(IRBuilderBase &B, const RecurrenceDescriptor &Desc,
 Value *createOrderedReduction(IRBuilderBase &B,
                               const RecurrenceDescriptor &Desc, Value *Src,
                               Value *Start);
+Value *createOrderedReduction(IRBuilderBase &B,
+                              const RecurrenceDescriptor &Desc, Value *Src,
+                              Value *Start, Value *EVL, Value *Mask = nullptr);
 
 /// Get the intersection (logical and) of all of the potential IR flags
 /// of each scalar operation (VL) that will be converted into a vector (I).
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index d6746d1d438242..90f637940d00da 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -414,6 +414,20 @@ CallInst *IRBuilderBase::getReductionIntrinsic(Intrinsic::ID ID, Value *Src) {
   return CreateCall(Decl, Ops);
 }
 
+CallInst *IRBuilderBase::getReductionIntrinsic(Intrinsic::ID ID, Value *Acc,
+                                               Value *Src, Value *Mask,
+                                               Value *EVL) {
+  Module *M = GetInsertBlock()->getParent()->getParent();
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  EVL = CreateIntCast(EVL, getInt32Ty(), /*isSigned=*/false);
+  if (!Mask)
+    Mask = CreateVectorSplat(SrcTy->getElementCount(), getTrue());
+  Value *Ops[] = {Acc, Src, Mask, EVL};
+  Type *Tys[] = {SrcTy};
+  auto Decl = Intrinsic::getDeclaration(M, ID, Tys);
+  return CreateCall(Decl, Ops);
+}
+
 CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src) {
   Module *M = GetInsertBlock()->getParent()->getParent();
   Value *Ops[] = {Acc, Src};
@@ -422,6 +436,11 @@ CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src) {
   return CreateCall(Decl, Ops);
 }
 
+CallInst *IRBuilderBase::CreateFAddReduce(Value *Acc, Value *Src, Value *EVL,
+                                          Value *Mask) {
+  return getReductionIntrinsic(Intrinsic::vp_reduce_fadd, Acc, Src, Mask ,EVL);
+}
+
 CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src) {
   Module *M = GetInsertBlock()->getParent()->getParent();
   Value *Ops[] = {Acc, Src};
@@ -430,46 +449,149 @@ CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src) {
   return CreateCall(Decl, Ops);
 }
 
+CallInst *IRBuilderBase::CreateFMulReduce(Value *Acc, Value *Src, Value *EVL,
+                                          Value *Mask) {
+  return getReductionIntrinsic(Intrinsic::vp_reduce_fmul, Acc, Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateAddReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_add, Src);
 }
 
+CallInst *IRBuilderBase::CreateAddReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_add,
+                               ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateMulReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_mul, Src);
 }
 
+CallInst *IRBuilderBase::CreateMulReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_mul,
+                               ConstantInt::get(EltTy, 1), Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateAndReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_and, Src);
 }
 
+CallInst *IRBuilderBase::CreateAndReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_and,
+                               Constant::getAllOnesValue(EltTy), Src, Mask,
+                               EVL);
+}
+
 CallInst *IRBuilderBase::CreateOrReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_or, Src);
 }
 
+CallInst *IRBuilderBase::CreateOrReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_or,
+                               ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateXorReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_xor, Src);
 }
 
+CallInst *IRBuilderBase::CreateXorReduce(Value *Src, Value *EVL, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(Intrinsic::vp_reduce_xor,
+                               ConstantInt::get(EltTy, 0), Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateIntMaxReduce(Value *Src, bool IsSigned) {
   auto ID =
       IsSigned ? Intrinsic::vector_reduce_smax : Intrinsic::vector_reduce_umax;
   return getReductionIntrinsic(ID, Src);
 }
 
+CallInst *IRBuilderBase::CreateIntMaxReduce(Value *Src, Value *EVL,
+                                            bool IsSigned, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(
+      IsSigned ? Intrinsic::vp_reduce_smax : Intrinsic::vp_reduce_umax,
+      IsSigned ? ConstantInt::get(EltTy, APInt::getSignedMinValue(
+                                             EltTy->getIntegerBitWidth()))
+               : ConstantInt::get(EltTy, 0),
+      Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateIntMinReduce(Value *Src, bool IsSigned) {
   auto ID =
       IsSigned ? Intrinsic::vector_reduce_smin : Intrinsic::vector_reduce_umin;
   return getReductionIntrinsic(ID, Src);
 }
 
+CallInst *IRBuilderBase::CreateIntMinReduce(Value *Src, Value *EVL,
+                                            bool IsSigned, Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  return getReductionIntrinsic(
+      IsSigned ? Intrinsic::vp_reduce_smin : Intrinsic::vp_reduce_umin,
+      IsSigned ? ConstantInt::get(EltTy, APInt::getSignedMaxValue(
+                                             EltTy->getIntegerBitWidth()))
+               : Constant::getAllOnesValue(EltTy),
+      Src, Mask, EVL);
+}
+
 CallInst *IRBuilderBase::CreateFPMaxReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_fmax, Src);
 }
 
+CallInst *IRBuilderBase::CreateFPMaxReduce(Value *Src, Value *EVL,
+                                           Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  FastMathFlags FMF = getFastMathFlags();
+  Value *Neutral;
+  if (FMF.noNaNs())
+    Neutral = FMF.noInfs()
+                  ? ConstantFP::get(
+                        EltTy, APFloat::getLargest(EltTy->getFltSemantics(),
+                                                   /*Negative=*/true))
+                  : ConstantFP::getInfinity(EltTy, true);
+  else
+    Neutral = ConstantFP::getQNaN(EltTy, /*Negative=*/true);
+
+  return getReductionIntrinsic(Intrinsic::vp_reduce_fmax, Neutral, Src, Mask,
+                               EVL);
+}
+
 CallInst *IRBuilderBase::CreateFPMinReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_fmin, Src);
 }
 
+CallInst *IRBuilderBase::CreateFPMinReduce(Value *Src, Value *EVL,
+                                           Value *Mask) {
+  auto *SrcTy = cast<VectorType>(Src->getType());
+  auto *EltTy = SrcTy->getElementType();
+  FastMathFlags FMF = getFastMathFlags();
+  Value *Neutral;
+  if (FMF.noNaNs())
+    Neutral = FMF.noInfs()
+                  ? ConstantFP::get(
+                        EltTy, APFloat::getLargest(EltTy->getFltSemantics(),
+                                                   /*Negative=*/false))
+                  : ConstantFP::getInfinity(EltTy, false);
+  else
+    Neutral = ConstantFP::getQNaN(EltTy, /*Negative=*/false);
+
+  return getReductionIntrinsic(Intrinsic::vp_reduce_fmin, Neutral, Src, Mask,
+                               EVL);
+}
+
 CallInst *IRBuilderBase::CreateFPMaximumReduce(Value *Src) {
   return getReductionIntrinsic(Intrinsic::vector_reduce_fmaximum, Src);
 }
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 73c5d636782294..d0abcdfb1440ab 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -1204,6 +1204,48 @@ Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
   }
 }
 
+Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
+                                         RecurKind RdxKind, Value *EVL,
+                                         Value *Mask) {
+  auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
+  switch (RdxKind) {
+  case RecurKind::Add:
+    return Builder.CreateAddReduce(Src, EVL, Mask);
+  case RecurKind::Mul:
+    return Builder.CreateMulReduce(Src, EVL, Mask);
+  case RecurKind::And:
+    return Builder.CreateAndReduce(Src, EVL, Mask);
+  case RecurKind::Or:
+    return Builder.CreateOrReduce(Src, EVL, Mask);
+  case RecurKind::Xor:
+    return Builder.CreateXorReduce(Src, EVL, Mask);
+  case RecurKind::FMulAdd:
+  case RecurKind::FAdd:
+    return Builder.CreateFAddReduce(ConstantFP::getNegativeZero(SrcVecEltTy),
+                                    Src, EVL, Mask);
+  case RecurKind::FMul:
+    return Builder.CreateFMulReduce(ConstantFP::get(SrcVecEltTy, 1.0), Src, EVL,
+                                    Mask);
+  case RecurKind::SMax:
+    return Builder.CreateIntMaxReduce(Src, EVL, true, Mask);
+  case RecurKind::SMin:
+    return Builder.CreateIntMinReduce(Src, EVL, true, Mask);
+  case RecurKind::UMax:
+    return Builder.CreateIntMaxReduce(Src, EVL, false, Mask);
+  case RecurKind::UMin:
+    return Builder.CreateIntMinReduce(Src, EVL, false, Mask);
+  case RecurKind::FMax:
+    return Builder.CreateFPMaxReduce(Src, EVL, Mask);
+  case RecurKind::FMin:
+    return Builder.CreateFPMinReduce(Src, EVL, Mask);
+  case RecurKind::FMinimum:
+  case RecurKind::FMaximum:
+    assert(0 && "FMaximum/FMinimum reduction VP intrinsic is not supported.");
+  default:
+    llvm_unreachable("Unhandled opcode");
+  }
+}
+
 Value *llvm::createTargetReduction(IRBuilderBase &B,
                                    const RecurrenceDescriptor &Desc, Value *Src,
                                    PHINode *OrigPhi) {
@@ -1232,6 +1274,20 @@ Value *llvm::createOrderedReduction(IRBuilderBase &B,
   return B.CreateFAddReduce(Start, Src);
 }
 
+Value *llvm::createOrderedReduction(IRBuilderBase &B,
+                                    const RecurrenceDescriptor &Desc,
+                                    Value *Src, Value *Start, Value *EVL,
+                                    Value *Mask) {
+  assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||
+          Desc.getRecurrenceKind() == RecurKind::FMulAdd) &&
+         "Unexpected reduction kind");
+  assert(Src->getType()->isVectorTy() && "Expected a vector type");
+  assert(!Start->getType()->isVectorTy() && "Expected a scalar type");
+  assert(EVL->getType()->isIntegerTy() && "Expected a integer type");
+
+  return B.CreateFAddReduce(Start, Src, EVL, Mask);
+}
+
 void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,
                             bool IncludeWrapFlags) {
   auto *VecOp = dyn_cast<Instruction>(I);
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 33c4decd58a6c2..1db531e170a4bf 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1526,6 +1526,17 @@ class LoopVectorizationCostModel {
                                             ForceTailFoldingStyle.getValue());
     if (ForceTailFoldingStyle != TailFoldingStyle::DataWithEVL)
       return;
+
+    // Block folding with EVL since vector-predication intrinsics have not
+    // support FMinimum and FMaximum reduction.
+    // FIXME: remove this check once llvm.vp.reduce.fminimum/fmaximum are
+    // supported
+    bool ContainsFMinimumOrFMaximumReduction =
+        any_of(Legal->getReductionVars(), [&](auto &Reduction) {
+          const RecurrenceDescriptor &RdxDesc = Reduction.second;
+          RecurKind Kind = RdxDesc.getRecurrenceKind();
+          return Kind == RecurKind::FMinimum || Kind == RecurKind::FMaximum;
+        });
     // Override forced styles if needed.
     // FIXME: use actual opcode/data type for analysis here.
     // FIXME: Investigate opportunity for fixed vector factor.
@@ -1535,8 +1546,7 @@ class LoopVectorizationCostModel {
         !EnableVPlanNativePath &&
         // FIXME: implement support for max safe dependency distance.
         Legal->isSafeForAnyVectorWidth() &&
-        // FIXME: remove this once reductions are supported.
-        Legal->getReductionVars().empty();
+        !ContainsFMinimumOrFMaximumReduction;
     if (!EVLIsLegal) {
       // If for some reason EVL mode is unsupported, fallback to
       // DataWithoutLaneMask to try to vectorize the loop with folded tail
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c74329a0bcc4ac..a444064dab692a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -843,6 +843,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPDerivedIVSC:
     case VPRecipeBase::VPExpandSCEVSC:
     case VPRecipeBase::VPInstructionSC:
+    case VPRecipeBase::VPReductionEVLSC:
     case VPRecipeBase::VPReductionSC:
     case VPRecipeBase::VPReplicateSC:
     case VPRecipeBase::VPScalarIVStepsSC:
@@ -2110,6 +2111,12 @@ class VPReductionRecipe : public VPSingleDefRecipe {
              VPSlotTracker &SlotTracker) const override;
 #endif
 
+  /// Return the recurrence decriptor for the in-loop reduction.
+  const RecurrenceDescriptor &getRecurrenceDescriptor() const {
+    return RdxDesc;
+  }
+  /// Return true if the in-loop reduction is ordered.
+  bool isOrdered() const { return IsOrdered; };
   /// The VPValue of the scalar Chain being accumulated.
   VPValue *getChainOp() const { return getOperand(0); }
   /// The VPValue of the vector value to be reduced.
@@ -2120,6 +2127,63 @@ class VPReductionRecipe : public VPSingleDefRecipe {
   }
 };
 
+/// A recipe to represent inloop reduction operations with vector-predication
+/// intrinsics, performing a reduction on a vector operand with the explicit
+/// vector length (EVL) into a scalar value, and adding the result to a chain.
+/// The Operands are {ChainOp, VecOp, EVL, [Condition]}.
+class VPReductionEVLRecipe : public VPSingleDefRecipe {
+  /// The recurrence decriptor for the reduction in question.
+  const RecurrenceDescriptor &RdxDesc;
+  bool IsOrdered;
+
+public:
+  VPReductionEVLRecipe(VPReductionRecipe *R, VPValue *EVL)
+      : VPSingleDefRecipe(
+            VPDef::VPReductionEVLSC,
+            ArrayRef<VPValue *>({R->getChainOp(), R->getVecOp(), EVL}),
+            R->getUnderlyingInstr()),
+        RdxDesc(R->getRecurrenceDescriptor()), IsOrdered(R->isOrdered()) {
+    VPValue *CondOp = R->getCondOp();
+    if (CondOp)
+      addOperand(CondOp);
+  };
+
+  ~VPReductionEVLRecipe() override = default;
+
+  VPReductionEVLRecipe *clone() override {
+    llvm_unreachable("cloning not implemented yet");
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPReductionEVLSC)
+
+  /// Generate the reduction in the loop
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  /// The VPValue of the scalar Chain being accumulated.
+  VPValue *getChainOp() const { return g...
[truncated]

Copy link

github-actions bot commented Apr 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/IR/IRBuilder.cpp Outdated Show resolved Hide resolved
llvm/lib/IR/VectorBuilder.cpp Outdated Show resolved Hide resolved
llvm/include/llvm/IR/VectorBuilder.h Outdated Show resolved Hide resolved
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The title mentions this adds support for in-loop reductions, but I wasn't able to find a check to make sure we only vectorize in-loop reductions?

All tests seem to pass flags guiding towards the use of in-loop/ordered reductions, so the case where the regular reduction strategy is chosen may not be tested well

@@ -57,6 +58,11 @@ class VectorBuilder {
return RetType();
}

// Helper function for creating VP intrinsic call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

independent of this change, but if VectorBuilder only supports generation of vector-predication intrinsics, then it would be better to call it VectorPredicationBuilder to avoid confusion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this can be adjusted later. (But I would prefer a shorter name, perhaps just VectorPredBuilder would be good enough?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or VPBuilder, like there's IRBuilder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or VPBuilder, like there's IRBuilder

That's unfortunate, as there is already a class named VPBuilder in LoopVectorizationPlanner.h.

/// VPlan-based builder utility analogous to IRBuilder.
class VPBuilder {
  VPBasicBlock *BB = nullptr;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth being more explicit for the name in the utility in llvm/IR, VectorPredBuilder would sound good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use /// for doc-comment

@Mel-Chen
Copy link
Contributor Author

Mel-Chen commented May 6, 2024

The title mentions this adds support for in-loop reductions, but I wasn't able to find a check to make sure we only vectorize in-loop reductions?

All tests seem to pass flags guiding towards the use of in-loop/ordered reductions, so the case where the regular reduction strategy is chosen may not be tested well

Indeed, we might need to change the title.
Originally, I only intended to support in-loop reduction first, but the bad news is that the legality check for folding with EVL occurs before the collection of in-loop reduction. This means we may need to postpone the legality check for folding with EVL or advance the collection of in-loop reduction.
The good news is that I have checked the IR generation in inloop-reduction.ll for IF-EVL-OUTLOOP, and it seems to be correct. This means we can directly open up both out-loop and in-loop reduction.
In conclusion, my suggestion is to change the title to "[LV][EVL] Support reduction idioms using tail folding with EVL." and directly add more RUN commands in the test cases to test the results of out-loop reduction.
@fhahn What do you think?

@Mel-Chen
Copy link
Contributor Author

@fhahn ping

@fhahn
Copy link
Contributor

fhahn commented May 15, 2024

The good news is that I have checked the IR generation in inloop-reduction.ll for IF-EVL-OUTLOOP, and it seems to be correct. This means we can directly open up both out-loop and in-loop reduction.
In conclusion, my suggestion is to change the title to "[LV][EVL] Support reduction idioms using tail folding with EVL." and directly add more RUN commands in the test cases to test the results of out-loop reduction.
@fhahn What do you think?

Sounds good to me, but it would be good to have some upstream buildbot that builds some code with the various options to have some runtime testing.

@@ -57,6 +58,11 @@ class VectorBuilder {
return RetType();
}

// Helper function for creating VP intrinsic call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth being more explicit for the name in the utility in llvm/IR, VectorPredBuilder would sound good to me.

llvm/lib/IR/VectorBuilder.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Utils/LoopUtils.cpp Show resolved Hide resolved
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated Show resolved Hide resolved
llvm/lib/IR/IRBuilder.cpp Outdated Show resolved Hide resolved
@Mel-Chen Mel-Chen merged commit 4eb30cf into llvm:main Jul 16, 2024
7 checks passed
Copy link
Contributor

@chapuni chapuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm/IR should not depend on llvm/Analysis.

@@ -15,6 +15,7 @@
#ifndef LLVM_IR_VECTORBUILDER_H
#define LLVM_IR_VECTORBUILDER_H

#include <llvm/Analysis/IVDescriptors.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a layering violation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out this issue. I opened #99276 to fix it. Please take a look, thanks a lot.

sayhaan pushed a commit to sayhaan/llvm-project that referenced this pull request Jul 16, 2024
…m#90184)

Summary:
Following from llvm#87816, add VPReductionEVLRecipe to describe vector
predication reduction.

Address one of TODOs from llvm#76172.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D59822470
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
)

Summary:
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.

Address one of TODOs from #76172.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251485
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants