[LV][EVL] Support in-loop reduction using tail folding with EVL. #90184

Merged
merged 52 commits into from
Jul 16, 2024
52 commits
c392aad Utils support (Mel-Chen, Apr 23, 2024)
cf0b2c1 Enable lit test case (Mel-Chen, Apr 25, 2024)
1b929c5 Remove the constraint for reductions (Mel-Chen, Apr 25, 2024)
a7ba2d5 update test case (Mel-Chen, Apr 25, 2024)
370831f The initial implementation modeled after VPWidenLoadEVLRecipe. (Mel-Chen, Apr 25, 2024)
be76dc5 Add test case for ordered reduction (Mel-Chen, Apr 25, 2024)
7e6d46d Add test case for all reduction kinds (Mel-Chen, Apr 25, 2024)
749f221 Block FMinimum and FMaximum (Mel-Chen, Apr 25, 2024)
b412a6f remove clone function (Mel-Chen, Apr 26, 2024)
83d5d2e Add VPlan test (Mel-Chen, May 2, 2024)
d1cb428 Move utils to VectorBuilder (Mel-Chen, May 2, 2024)
7a4324a Revert "Utils support" (Mel-Chen, May 2, 2024)
716903e Fix comments format (Mel-Chen, May 3, 2024)
2fd9862 Allow null underlyInstr (Mel-Chen, May 3, 2024)
f8522df Implement mayWriteToMemory, mayReadFromMemory and mayHaveSideEffects (Mel-Chen, May 3, 2024)
ba308c0 Update llvm/lib/IR/VectorBuilder.cpp (Mel-Chen, May 3, 2024)
7206394 Enable vp fminimum and fmaximum (Mel-Chen, May 13, 2024)
3b041bf Revert "Block FMinimum and FMaximum" (Mel-Chen, May 13, 2024)
69e5fa2 Add assertion for AnyOf (Mel-Chen, May 13, 2024)
1c1f8de Add test case for anyof (Mel-Chen, May 16, 2024)
fe2a2c6 Add test case for conditional reduction (Mel-Chen, May 16, 2024)
b051f98 Add test case for recipe dump (Mel-Chen, May 16, 2024)
906981b Add test case for intermediate store (Mel-Chen, May 16, 2024)
dfd6f84 Remove redundant code (Mel-Chen, May 16, 2024)
a2d5f3f Split test cases (Mel-Chen, May 16, 2024)
7b14c80 Remove mask(CondOp) if the mask equals header mask. (Mel-Chen, May 17, 2024)
996d0c7 Refine recipe print (Mel-Chen, May 17, 2024)
44931f5 Replace fatal error with assert. (Mel-Chen, May 20, 2024)
57f8486 Add conditional reduction test cases with widenInduction. (Mel-Chen, May 27, 2024)
73f8bfa mis-vectctorized after #92092. (Mel-Chen, May 27, 2024)
ea080d9 Refine comments. (Mel-Chen, Jun 4, 2024)
72c079d Refine the code of recipe replacement. (Mel-Chen, Jun 5, 2024)
95387e6 Support type inference for VPReductionRecipe and VPReductionEVLRecipe (Mel-Chen, Jun 5, 2024)
84fe7ba Introduce ExplicitVectorLengthMask recipe for out-loop reduction. (Mel-Chen, Jun 8, 2024)
cb4e661 Revert "Introduce ExplicitVectorLengthMask recipe for out-loop reduct… (Mel-Chen, Jun 14, 2024)
7ff54a2 Drop the EVL style vplan because of outloop reduction. (Mel-Chen, Jun 14, 2024)
fdaf774 Inherit VPReductionRecipe. (Mel-Chen, Jun 19, 2024)
1ad4f50 Add virtual and override for getCondOp() (Mel-Chen, Jun 20, 2024)
0b3e344 Private RecurrenceDescriptor and ordered bool. (Mel-Chen, Jun 20, 2024)
192fcbb Refine WidenInduction check. (Mel-Chen, Jun 20, 2024)
9abd2e1 Rename IncludeWidenInduction to ContainsWidenInductions (Mel-Chen, Jun 20, 2024)
ea173fb Rename IncludeOutloopReduction to ContainsOutLoopReductions (Mel-Chen, Jun 20, 2024)
7329881 Rebase and update test cases. (Mel-Chen, Jun 24, 2024)
3b32a4e Remove the check for the number of definations (Mel-Chen, Jun 24, 2024)
b0b5009 Replace the #definition check with isa<>. (Mel-Chen, Jun 24, 2024)
0f28142 Fix the bug from unit tests. (Mel-Chen, Jun 24, 2024)
f98d740 Refine vectorBuilder (Mel-Chen, Jun 28, 2024)
e3960ac Rebase and update test cases. (Mel-Chen, Jul 8, 2024)
c85020f Remove lambda (Mel-Chen, Jul 8, 2024)
20f6383 Manually implement classof of VPReductionRecipe. (Mel-Chen, Jul 8, 2024)
35bf9d0 Add isConditional for VPreductionRecipe. (Mel-Chen, Jul 8, 2024)
16eb2ab Unify the format of file include. (Mel-Chen, Jul 11, 2024)
15 changes: 15 additions & 0 deletions llvm/include/llvm/IR/VectorBuilder.h
@@ -15,6 +15,7 @@
#ifndef LLVM_IR_VECTORBUILDER_H
#define LLVM_IR_VECTORBUILDER_H

#include <llvm/Analysis/IVDescriptors.h>
Review comment (Contributor): This is a layering violation.

Reply (Contributor Author): Thanks for pointing out this issue. I opened #99276 to fix it. Please take a look, thanks a lot.

#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/InstrTypes.h>
#include <llvm/IR/Instruction.h>
@@ -57,6 +58,11 @@ class VectorBuilder {
return RetType();
}

/// Helper function for creating VP intrinsic call.
Value *createVectorInstructionImpl(Intrinsic::ID VPID, Type *ReturnTy,
ArrayRef<Value *> VecOpArray,
const Twine &Name = Twine());

public:
VectorBuilder(IRBuilderBase &Builder,
Behavior ErrorHandling = Behavior::ReportAndAbort)
@@ -92,6 +98,15 @@ class VectorBuilder {
Value *createVectorInstruction(unsigned Opcode, Type *ReturnTy,
ArrayRef<Value *> VecOpArray,
const Twine &Name = Twine());

/// Emit a VP reduction intrinsic call for the given recurrence kind.
/// \param Kind The kind of recurrence.
/// \param ValTy The type of the operand on which the reduction operation is
/// performed.
/// \param VecOpArray The operand list.
Value *createSimpleTargetReduction(RecurKind Kind, Type *ValTy,
ArrayRef<Value *> VecOpArray,
const Twine &Name = Twine());
};

} // namespace llvm
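`VectorBuilder::createSimpleTargetReduction` emits `llvm.vp.reduce.*` calls. As a rough guide to what such a call computes, here is a minimal scalar model of `llvm.vp.reduce.add`, assuming the LangRef semantics in which only lanes below the explicit vector length that are also enabled by the mask contribute, so a call with no active lanes returns the start value. `vpReduceAdd` is a hypothetical helper for illustration, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified scalar model of llvm.vp.reduce.add(start, vec, mask, evl).
// Lanes at index >= evl, and lanes whose mask bit is false, do not take
// part, so a call with no active lanes simply returns `start`.
int64_t vpReduceAdd(int64_t start, const std::vector<int64_t> &vec,
                    const std::vector<bool> &mask, std::size_t evl) {
  assert(mask.size() == vec.size() && evl <= vec.size());
  int64_t acc = start;
  for (std::size_t i = 0; i < evl; ++i)
    if (mask[i])
      acc += vec[i];
  return acc;
}
```

For example, with an all-true mask over {1, ..., 8}, a start value of 10 and an EVL of 5, only the first five lanes are summed, giving 25.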
10 changes: 10 additions & 0 deletions llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -15,6 +15,7 @@

#include "llvm/Analysis/IVDescriptors.h"
#include "llvm/Analysis/LoopAccessAnalysis.h"
#include "llvm/IR/VectorBuilder.h"
#include "llvm/Transforms/Utils/ValueMapper.h"

namespace llvm {
@@ -394,6 +395,10 @@ Value *getShuffleReduction(IRBuilderBase &Builder, Value *Src, unsigned Op,
/// Fast-math-flags are propagated using the IRBuilder's setting.
Value *createSimpleTargetReduction(IRBuilderBase &B, Value *Src,
RecurKind RdxKind);
/// Overloaded function to generate vector-predication intrinsics for target
/// reduction.
Value *createSimpleTargetReduction(VectorBuilder &VB, Value *Src,
const RecurrenceDescriptor &Desc);

/// Create a target reduction of the given vector \p Src for a reduction of the
/// kind RecurKind::IAnyOf or RecurKind::FAnyOf. The reduction operation is
@@ -414,6 +419,11 @@ Value *createTargetReduction(IRBuilderBase &B, const RecurrenceDescriptor &Desc,
Value *createOrderedReduction(IRBuilderBase &B,
const RecurrenceDescriptor &Desc, Value *Src,
Value *Start);
/// Overloaded function to generate vector-predication intrinsics for ordered
/// reduction.
Value *createOrderedReduction(VectorBuilder &VB,
const RecurrenceDescriptor &Desc, Value *Src,
Value *Start);

/// Get the intersection (logical and) of all of the potential IR flags
/// of each scalar operation (VL) that will be converted into a vector (I).
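In the LoopUtils.cpp part of this PR, the VectorBuilder overload of `createSimpleTargetReduction` passes `Desc.getRecurrenceIdentity(...)` as the start operand of the VP reduction rather than the running value. A minimal sketch of why that works, restricted to integer kinds: seeding the accumulator with the identity makes the result depend only on the active lanes, and that partial result is then combined with the chain, as the `VPReductionEVLRecipe` comment describes. `Kind`, `identity`, `combine` and `reduceActiveLanes` are hypothetical names, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical subset of integer recurrence kinds; the names echo RecurKind
// but this enum is not the LLVM type.
enum class Kind { Add, Mul, And, Or, Xor, SMax, SMin };

// Neutral element of each kind: combine(K, identity(K), X) == X.
int32_t identity(Kind K) {
  switch (K) {
  case Kind::Add:
  case Kind::Or:
  case Kind::Xor:
    return 0;
  case Kind::Mul:
    return 1;
  case Kind::And:
    return -1; // all bits set
  case Kind::SMax:
    return std::numeric_limits<int32_t>::min();
  case Kind::SMin:
    return std::numeric_limits<int32_t>::max();
  }
  return 0; // not reached for valid kinds
}

// Scalar combining operation of each kind.
int32_t combine(Kind K, int32_t A, int32_t B) {
  switch (K) {
  case Kind::Add:
    return A + B;
  case Kind::Mul:
    return A * B;
  case Kind::And:
    return A & B;
  case Kind::Or:
    return A | B;
  case Kind::Xor:
    return A ^ B;
  case Kind::SMax:
    return std::max(A, B);
  case Kind::SMin:
    return std::min(A, B);
  }
  return A; // not reached for valid kinds
}

// Reduce the first EVL lanes, seeding the accumulator with the identity so
// the result depends only on the active lanes (no lanes -> identity).
int32_t reduceActiveLanes(Kind K, const std::vector<int32_t> &Vec,
                          std::size_t EVL) {
  assert(EVL <= Vec.size() && "EVL must not exceed the vector width");
  int32_t Acc = identity(K);
  for (std::size_t I = 0; I < EVL; ++I)
    Acc = combine(K, Acc, Vec[I]);
  return Acc;
}
```

Keeping the running value out of the VP call means the same reduction helper serves every iteration; only the final `combine` with the chain differs per kind.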
63 changes: 63 additions & 0 deletions llvm/lib/IR/VectorBuilder.cpp
@@ -57,7 +57,70 @@ Value *VectorBuilder::createVectorInstruction(unsigned Opcode, Type *ReturnTy,
auto VPID = VPIntrinsic::getForOpcode(Opcode);
if (VPID == Intrinsic::not_intrinsic)
return returnWithError<Value *>("No VPIntrinsic for this opcode");
return createVectorInstructionImpl(VPID, ReturnTy, InstOpArray, Name);
}

Value *VectorBuilder::createSimpleTargetReduction(RecurKind Kind, Type *ValTy,
ArrayRef<Value *> InstOpArray,
const Twine &Name) {
Intrinsic::ID VPID;
switch (Kind) {
case RecurKind::Add:
VPID = Intrinsic::vp_reduce_add;
break;
case RecurKind::Mul:
VPID = Intrinsic::vp_reduce_mul;
break;
case RecurKind::And:
VPID = Intrinsic::vp_reduce_and;
break;
case RecurKind::Or:
VPID = Intrinsic::vp_reduce_or;
break;
case RecurKind::Xor:
VPID = Intrinsic::vp_reduce_xor;
break;
case RecurKind::FMulAdd:
case RecurKind::FAdd:
VPID = Intrinsic::vp_reduce_fadd;
break;
case RecurKind::FMul:
VPID = Intrinsic::vp_reduce_fmul;
break;
case RecurKind::SMax:
VPID = Intrinsic::vp_reduce_smax;
break;
case RecurKind::SMin:
VPID = Intrinsic::vp_reduce_smin;
break;
case RecurKind::UMax:
VPID = Intrinsic::vp_reduce_umax;
break;
case RecurKind::UMin:
VPID = Intrinsic::vp_reduce_umin;
break;
case RecurKind::FMax:
VPID = Intrinsic::vp_reduce_fmax;
break;
case RecurKind::FMin:
VPID = Intrinsic::vp_reduce_fmin;
break;
case RecurKind::FMaximum:
VPID = Intrinsic::vp_reduce_fmaximum;
break;
case RecurKind::FMinimum:
VPID = Intrinsic::vp_reduce_fminimum;
break;
default:
llvm_unreachable("No VPIntrinsic for this reduction");
}
return createVectorInstructionImpl(VPID, ValTy, InstOpArray, Name);
}

Value *VectorBuilder::createVectorInstructionImpl(Intrinsic::ID VPID,
Type *ReturnTy,
ArrayRef<Value *> InstOpArray,
const Twine &Name) {
auto MaskPosOpt = VPIntrinsic::getMaskParamPos(VPID);
auto VLenPosOpt = VPIntrinsic::getVectorLengthParamPos(VPID);
size_t NumInstParams = InstOpArray.size();
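The `createVectorInstructionImpl` body starts by asking `VPIntrinsic::getMaskParamPos` and `VPIntrinsic::getVectorLengthParamPos` where the mask and EVL arguments belong. The sketch below is a simplified model of that step, assuming values can be represented as strings: instruction operands fill every slot except the mask and EVL slots, which are filled last. For `llvm.vp.reduce.*` the argument order is (start, vector, mask, evl), so an operand list like `{Start, Src}` becomes a four-argument call. `buildVPParams` is hypothetical and is not the LLVM implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Simplified model of assembling a VP intrinsic's argument list: the
// instruction operands fill every slot except the mask and EVL slots, which
// are filled afterwards. Values are represented as strings.
std::vector<std::string>
buildVPParams(const std::vector<std::string> &InstOps,
              std::optional<std::size_t> MaskPos,
              std::optional<std::size_t> EVLPos, const std::string &Mask,
              const std::string &EVL) {
  std::size_t NumParams =
      InstOps.size() + (MaskPos ? 1 : 0) + (EVLPos ? 1 : 0);
  std::vector<std::string> Params(NumParams);
  std::size_t Next = 0;
  for (std::size_t I = 0; I < NumParams; ++I) {
    if ((MaskPos && *MaskPos == I) || (EVLPos && *EVLPos == I))
      continue; // slot reserved for the mask or the EVL
    assert(Next < InstOps.size() && "too few instruction operands");
    Params[I] = InstOps[Next++];
  }
  if (MaskPos)
    Params[*MaskPos] = Mask;
  if (EVLPos)
    Params[*EVLPos] = EVL;
  return Params;
}
```

Separating "where do mask and EVL go" from "which operands does the caller supply" is what lets the new reduction entry point reuse the same helper as `createVectorInstruction`.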
27 changes: 27 additions & 0 deletions llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -1192,6 +1192,19 @@ Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
}
}

Value *llvm::createSimpleTargetReduction(VectorBuilder &VBuilder, Value *Src,
const RecurrenceDescriptor &Desc) {
RecurKind Kind = Desc.getRecurrenceKind();
assert(!RecurrenceDescriptor::isAnyOfRecurrenceKind(Kind) &&
"AnyOf reduction is not supported.");
auto *SrcTy = cast<VectorType>(Src->getType());
Type *SrcEltTy = SrcTy->getElementType();
Value *Iden =
Desc.getRecurrenceIdentity(Kind, SrcEltTy, Desc.getFastMathFlags());
Value *Ops[] = {Iden, Src};
return VBuilder.createSimpleTargetReduction(Kind, SrcTy, Ops);
}

Value *llvm::createTargetReduction(IRBuilderBase &B,
const RecurrenceDescriptor &Desc, Value *Src,
PHINode *OrigPhi) {
@@ -1220,6 +1233,20 @@ Value *llvm::createOrderedReduction(IRBuilderBase &B,
return B.CreateFAddReduce(Start, Src);
}

Value *llvm::createOrderedReduction(VectorBuilder &VBuilder,
const RecurrenceDescriptor &Desc,
Value *Src, Value *Start) {
assert((Desc.getRecurrenceKind() == RecurKind::FAdd ||
Desc.getRecurrenceKind() == RecurKind::FMulAdd) &&
"Unexpected reduction kind");
assert(Src->getType()->isVectorTy() && "Expected a vector type");
assert(!Start->getType()->isVectorTy() && "Expected a scalar type");

auto *SrcTy = cast<VectorType>(Src->getType());
Value *Ops[] = {Start, Src};
return VBuilder.createSimpleTargetReduction(RecurKind::FAdd, SrcTy, Ops);
}

void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,
bool IncludeWrapFlags) {
auto *VecOp = dyn_cast<Instruction>(I);
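The VectorBuilder overload of `createOrderedReduction` forwards the incoming `Start` value as the start operand of the VP reduction instead of an identity. Per the LangRef, `llvm.vp.reduce.fadd` without the `reassoc` flag is a sequential reduction, which an ordered (strict) FP reduction needs because floating-point addition is not associative. The sketch below models that in-order accumulation over the first EVL lanes with a hypothetical helper, `orderedFAddReduce`, assuming IEEE-754 single precision and no fast-math.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered (strict) floating-point add reduction over the first EVL lanes:
// the accumulator starts at Start and lanes are added strictly left to
// right, matching the evaluation order of the original scalar loop.
float orderedFAddReduce(float Start, const std::vector<float> &Vec,
                        std::size_t EVL) {
  assert(EVL <= Vec.size() && "EVL must not exceed the vector width");
  float Acc = Start;
  for (std::size_t I = 0; I < EVL; ++I)
    Acc += Vec[I];
  return Acc;
}
```

Summing {1, 1e8, -1e8} in order yields 0, because 1e8f + 1.0f rounds back to 1e8f (adjacent floats near 1e8 are 8 apart), whereas the reassociated form 1.0f + (1e8f + -1e8f) yields 1. That difference is why the ordered path must not reorder lanes.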
4 changes: 1 addition & 3 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1516,9 +1516,7 @@ class LoopVectorizationCostModel {
TTI.hasActiveVectorLength(0, nullptr, Align()) &&
!EnableVPlanNativePath &&
// FIXME: implement support for max safe dependency distance.
-Legal->isSafeForAnyVectorWidth() &&
-// FIXME: remove this once reductions are supported.
-Legal->getReductionVars().empty();
+Legal->isSafeForAnyVectorWidth();
if (!EVLIsLegal) {
// If for some reason EVL mode is unsupported, fallback to
// DataWithoutLaneMask to try to vectorize the loop with folded tail
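With `Legal->getReductionVars().empty()` dropped from the EVL legality check, loops containing reductions can now use EVL-based tail folding. A minimal scalar emulation of the resulting loop shape follows. It assumes the target always grants `evl = min(VF, remaining)`; real targets may pick other values (RISC-V's `vsetvli`, for example, is permitted to), and `sumWithEVL` is a hypothetical helper. Each iteration reduces only the active lanes in-loop and adds the result to a scalar chain, so no scalar epilogue is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates an EVL tail-folded loop computing sum(A) with an in-loop
// reduction. VF is the maximum number of lanes per iteration. If EVLs is
// non-null, the EVL chosen for each iteration is recorded there.
int64_t sumWithEVL(const std::vector<int64_t> &A, std::size_t VF,
                   std::vector<std::size_t> *EVLs = nullptr) {
  assert(VF > 0 && "VF must be positive");
  int64_t Chain = 0; // scalar reduction value carried across iterations
  std::size_t I = 0;
  while (I < A.size()) {
    // Assumption: the target grants min(VF, remaining elements).
    std::size_t EVL = std::min(VF, A.size() - I);
    if (EVLs)
      EVLs->push_back(EVL);
    // Reduce only the EVL active lanes, starting from the add identity (0);
    // this stands in for a vp.reduce.add call.
    int64_t Partial = 0;
    for (std::size_t Lane = 0; Lane < EVL; ++Lane)
      Partial += A[I + Lane];
    Chain += Partial; // fold the partial result into the chain
    I += EVL;         // in this emulation the index advances by EVL, not VF
  }
  return Chain;
}
```

For ten elements and VF = 4 the emulated loop runs three iterations with EVLs 4, 4 and 2, and still produces the full sum of 55.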
84 changes: 76 additions & 8 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -909,6 +909,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPEVLBasedIVPHISC:
case VPRecipeBase::VPExpandSCEVSC:
case VPRecipeBase::VPInstructionSC:
case VPRecipeBase::VPReductionEVLSC:
case VPRecipeBase::VPReductionSC:
case VPRecipeBase::VPReplicateSC:
case VPRecipeBase::VPScalarIVStepsSC:
@@ -2171,17 +2172,27 @@ class VPReductionRecipe : public VPSingleDefRecipe {
/// The recurrence descriptor for the reduction in question.
const RecurrenceDescriptor &RdxDesc;
bool IsOrdered;
/// Whether the reduction is conditional.
bool IsConditional = false;

protected:
VPReductionRecipe(const unsigned char SC, const RecurrenceDescriptor &R,
Instruction *I, ArrayRef<VPValue *> Operands,
VPValue *CondOp, bool IsOrdered)
: VPSingleDefRecipe(SC, Operands, I), RdxDesc(R), IsOrdered(IsOrdered) {
if (CondOp) {
IsConditional = true;
addOperand(CondOp);
}
}

public:
VPReductionRecipe(const RecurrenceDescriptor &R, Instruction *I,
VPValue *ChainOp, VPValue *VecOp, VPValue *CondOp,
bool IsOrdered)
: VPSingleDefRecipe(VPDef::VPReductionSC,
ArrayRef<VPValue *>({ChainOp, VecOp}), I),
RdxDesc(R), IsOrdered(IsOrdered) {
if (CondOp)
addOperand(CondOp);
}
: VPReductionRecipe(VPDef::VPReductionSC, R, I,
ArrayRef<VPValue *>({ChainOp, VecOp}), CondOp,
IsOrdered) {}

~VPReductionRecipe() override = default;

@@ -2190,7 +2201,15 @@ class VPReductionRecipe : public VPSingleDefRecipe {
getVecOp(), getCondOp(), IsOrdered);
}

-VP_CLASSOF_IMPL(VPDef::VPReductionSC)
+static inline bool classof(const VPRecipeBase *R) {
+return R->getVPDefID() == VPRecipeBase::VPReductionSC ||
+R->getVPDefID() == VPRecipeBase::VPReductionEVLSC;
+}
+
+static inline bool classof(const VPUser *U) {
+auto *R = dyn_cast<VPRecipeBase>(U);
+return R && classof(R);
+}

/// Generate the reduction in the loop
void execute(VPTransformState &State) override;
@@ -2201,13 +2220,62 @@ class VPReductionRecipe : public VPSingleDefRecipe {
VPSlotTracker &SlotTracker) const override;
#endif

/// Return the recurrence descriptor for the in-loop reduction.
const RecurrenceDescriptor &getRecurrenceDescriptor() const {
return RdxDesc;
}
/// Return true if the in-loop reduction is ordered.
bool isOrdered() const { return IsOrdered; }
/// Return true if the in-loop reduction is conditional.
bool isConditional() const { return IsConditional; }
/// The VPValue of the scalar Chain being accumulated.
VPValue *getChainOp() const { return getOperand(0); }
/// The VPValue of the vector value to be reduced.
VPValue *getVecOp() const { return getOperand(1); }
/// The VPValue of the condition for the block.
VPValue *getCondOp() const {
return getNumOperands() > 2 ? getOperand(2) : nullptr;
return isConditional() ? getOperand(getNumOperands() - 1) : nullptr;
}
};

/// A recipe to represent in-loop reduction operations with vector-predication
/// intrinsics, performing a reduction on a vector operand with the explicit
/// vector length (EVL) into a scalar value, and adding the result to a chain.
/// The Operands are {ChainOp, VecOp, EVL, [Condition]}.
class VPReductionEVLRecipe : public VPReductionRecipe {
public:
VPReductionEVLRecipe(VPReductionRecipe *R, VPValue *EVL, VPValue *CondOp)
: VPReductionRecipe(
VPDef::VPReductionEVLSC, R->getRecurrenceDescriptor(),
cast_or_null<Instruction>(R->getUnderlyingValue()),
ArrayRef<VPValue *>({R->getChainOp(), R->getVecOp(), EVL}), CondOp,
R->isOrdered()) {}

~VPReductionEVLRecipe() override = default;

VPReductionEVLRecipe *clone() override {
llvm_unreachable("cloning not implemented yet");
}

VP_CLASSOF_IMPL(VPDef::VPReductionEVLSC)

/// Generate the reduction in the loop
void execute(VPTransformState &State) override;

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;
#endif

/// The VPValue of the explicit vector length.
VPValue *getEVL() const { return getOperand(2); }

/// Returns true if the recipe only uses the first lane of operand \p Op.
bool onlyFirstLaneUsed(const VPValue *Op) const override {
assert(is_contained(operands(), Op) &&
"Op must be an operand of the recipe");
return Op == getEVL();
}
};

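This PR replaces `VP_CLASSOF_IMPL(VPDef::VPReductionSC)` with a hand-written `classof` that accepts both `VPReductionSC` and `VPReductionEVLSC`, and makes `getCondOp` read the last operand when the recipe is conditional. The self-contained model below illustrates that pattern. `Recipe`, `Reduction`, `ReductionEVL`, `isA` and `dynCast` are hypothetical names rather than VPlan or LLVM APIs, and strings stand in for `VPValue` operands.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for VPlan recipes. Strings play
// the role of VPValue operands.
class Recipe {
public:
  enum class ID { Reduction, ReductionEVL, Other };
  explicit Recipe(ID Id) : Id(Id) {}
  virtual ~Recipe() = default;
  ID getID() const { return Id; }

private:
  ID Id;
};

class Reduction : public Recipe {
public:
  // Operands: {Chain, Vec, [Cond]}.
  Reduction(std::string Chain, std::string Vec, const std::string *Cond)
      : Reduction(ID::Reduction, {std::move(Chain), std::move(Vec)}, Cond) {}

  // Accept both IDs so that an EVL reduction also "is a" Reduction.
  static bool classof(const Recipe *R) {
    return R->getID() == ID::Reduction || R->getID() == ID::ReductionEVL;
  }

  bool isConditional() const { return IsConditional; }
  const std::string &getChainOp() const { return Ops[0]; }
  const std::string &getVecOp() const { return Ops[1]; }
  // The condition, when present, is always the last operand.
  const std::string *getCondOp() const {
    return IsConditional ? &Ops.back() : nullptr;
  }

protected:
  Reduction(ID Id, std::vector<std::string> Operands, const std::string *Cond)
      : Recipe(Id), Ops(std::move(Operands)) {
    if (Cond) {
      IsConditional = true;
      Ops.push_back(*Cond);
    }
  }

  std::vector<std::string> Ops;

private:
  bool IsConditional = false;
};

class ReductionEVL : public Reduction {
public:
  // Operands: {Chain, Vec, EVL, [Cond]}, taken from an existing Reduction.
  ReductionEVL(const Reduction &R, std::string EVL, const std::string *Cond)
      : Reduction(ID::ReductionEVL,
                  {R.getChainOp(), R.getVecOp(), std::move(EVL)}, Cond) {}

  static bool classof(const Recipe *R) {
    return R->getID() == ID::ReductionEVL;
  }

  const std::string &getEVL() const { return Ops[2]; }
};

// Minimal isa/dyn_cast equivalents driven by classof.
template <typename To> bool isA(const Recipe *R) { return To::classof(R); }

template <typename To> const To *dynCast(const Recipe *R) {
  return isA<To>(R) ? static_cast<const To *>(R) : nullptr;
}
```

Because the base `classof` accepts both IDs, code that matches on the base class, such as the `.Case<VPReductionRecipe>` added to `VPTypeAnalysis::inferScalarType`, also covers the EVL variant. The EVL operand sits between `VecOp` and the optional condition without disturbing `getChainOp`, `getVecOp` or `getCondOp`.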
3 changes: 3 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -274,6 +274,9 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
[](const VPScalarCastRecipe *R) { return R->getResultType(); })
.Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {
return R->getSCEV()->getType();
})
.Case<VPReductionRecipe>([this](const auto *R) {
return inferScalarType(R->getChainOp());
});

assert(ResultTy && "could not infer type for the given VPValue");