[LV] Move check if any vector insts will be generated to VPlan. #96622
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-systemz

Author: Florian Hahn (fhahn)

Changes

This patch moves the check for whether any vector instructions will be generated from getInstructionCost into a VPlan-based helper. This simplifies getInstructionCost, is more accurate since we check the final result, and allows us to exit early once we visit a recipe that generates vector instructions. The helper can then be re-used by the VPlan-based cost model to match the legacy selectVectorizationFactor behavior, thus fixing a crash and paving the way to recommit #92555.

Full diff: https://github.com/llvm/llvm-project/pull/96622.diff

6 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7835836d21ef1..516709acbe704 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1648,12 +1648,7 @@ class LoopVectorizationCostModel {
/// Returns the execution time cost of an instruction for a given vector
/// width. Vector width of one means scalar.
- VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);
-
- /// The cost-computation logic from getInstructionCost which provides
- /// the vector type as an output parameter.
- InstructionCost getInstructionCost(Instruction *I, ElementCount VF,
- Type *&VectorTy);
+ InstructionCost getInstructionCost(Instruction *I, ElementCount VF);
/// Return the cost of instructions in an inloop reduction pattern, if I is
/// part of that pattern.
@@ -4879,6 +4874,52 @@ static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
} while (!Tail.empty());
}
+static bool willGenerateVectorInstructions(VPlan &Plan, ElementCount VF,
+ const TargetTransformInfo &TTI) {
+ VPTypeAnalysis TypeInfo(Plan.getCanonicalIV()->getScalarType(),
+ Plan.getCanonicalIV()->getScalarType()->getContext());
+ for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+ vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
+ for (VPRecipeBase &R : *VPBB) {
+ if (isa<VPDerivedIVRecipe, VPScalarIVStepsRecipe, VPScalarCastRecipe,
+ VPReplicateRecipe, VPInstruction, VPActiveLaneMaskPHIRecipe,
+ VPCanonicalIVPHIRecipe, VPEVLBasedIVPHIRecipe,
+ VPVectorPointerRecipe>(&R))
+ continue;
+
+ auto WillWiden = [&TypeInfo, &TTI, VF](VPValue *VPV) {
+ Type *ScalarTy = TypeInfo.inferScalarType(VPV);
+ Type *VectorTy = ToVectorTy(ScalarTy, VF);
+ unsigned NumParts = TTI.getNumberOfParts(VectorTy);
+ if (!NumParts)
+ return false;
+ if (VF.isScalable())
+ // <vscale x 1 x iN> is assumed to be profitable over iN because
+ // scalable registers are a distinct register class from scalar ones.
+ // If we ever find a target which wants to lower scalable vectors
+ // back to scalars, we'll need to update this code to explicitly
+ // ask TTI about the register class uses for each part.
+ return NumParts <= VF.getKnownMinValue();
+ else
+ return NumParts < VF.getKnownMinValue();
+ };
+ SmallVector<VPValue *> VPValuesToCheck;
+ if (auto *WidenStore = dyn_cast<VPWidenStoreRecipe>(&R)) {
+ VPValuesToCheck.push_back(WidenStore->getOperand(1));
+ } else if (auto *IG = dyn_cast<VPInterleaveRecipe>(&R)) {
+ append_range(VPValuesToCheck, IG->getStoredValues());
+ } else {
+ append_range(VPValuesToCheck, R.definedValues());
+ }
+ if (any_of(VPValuesToCheck,
+ [&WillWiden](VPValue *VPV) { return WillWiden(VPV); }))
+ return true;
+ }
+ }
+
+ return false;
+}
+
VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
InstructionCost ExpectedCost =
CM.expectedCost(ElementCount::getFixed(1)).first;
@@ -4929,7 +4970,7 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
LLVM_DEBUG(dbgs() << ".\n");
#endif
- if (!C.second && !ForceVectorization) {
+ if (!willGenerateVectorInstructions(*P, VF, TTI) && !ForceVectorization) {
LLVM_DEBUG(
dbgs()
<< "LV: Not considering vector loop of width " << VF
@@ -5801,15 +5842,14 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
// Compute the cost of the vector instruction. Note that this cost already
// includes the scalarization overhead of the predicated instruction.
- InstructionCost VectorCost = getInstructionCost(I, VF).first;
+ InstructionCost VectorCost = getInstructionCost(I, VF);
// Compute the cost of the scalarized instruction. This cost is the cost of
// the instruction as if it wasn't if-converted and instead remained in the
// predicated block. We will scale this cost by block probability after
// computing the scalarization overhead.
InstructionCost ScalarCost =
- VF.getFixedValue() *
- getInstructionCost(I, ElementCount::getFixed(1)).first;
+ VF.getFixedValue() * getInstructionCost(I, ElementCount::getFixed(1));
// Compute the scalarization overhead of needed insertelement instructions
// and phi nodes.
@@ -5869,22 +5909,19 @@ LoopVectorizationCostModel::expectedCost(
(VF.isVector() && VecValuesToIgnore.count(&I)))
continue;
- VectorizationCostTy C = getInstructionCost(&I, VF);
+ InstructionCost C = getInstructionCost(&I, VF);
// Check if we should override the cost.
- if (C.first.isValid() &&
- ForceTargetInstructionCost.getNumOccurrences() > 0)
- C.first = InstructionCost(ForceTargetInstructionCost);
+ if (C.isValid() && ForceTargetInstructionCost.getNumOccurrences() > 0)
+ C = InstructionCost(ForceTargetInstructionCost);
// Keep a list of instructions with invalid costs.
- if (Invalid && !C.first.isValid())
+ if (Invalid && !C.isValid())
Invalid->emplace_back(&I, VF);
- BlockCost.first += C.first;
- BlockCost.second |= C.second;
- LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << C.first
- << " for VF " << VF << " For instruction: " << I
- << '\n');
+ BlockCost.first += C;
+ LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << C << " for VF "
+ << VF << " For instruction: " << I << '\n');
}
// If we are vectorizing a predicated block, it will have been
@@ -6297,49 +6334,6 @@ LoopVectorizationCostModel::getMemoryInstructionCost(Instruction *I,
return getWideningCost(I, VF);
}
-LoopVectorizationCostModel::VectorizationCostTy
-LoopVectorizationCostModel::getInstructionCost(Instruction *I,
- ElementCount VF) {
- // If we know that this instruction will remain uniform, check the cost of
- // the scalar version.
- if (isUniformAfterVectorization(I, VF))
- VF = ElementCount::getFixed(1);
-
- if (VF.isVector() && isProfitableToScalarize(I, VF))
- return VectorizationCostTy(InstsToScalarize[VF][I], false);
-
- // Forced scalars do not have any scalarization overhead.
- auto ForcedScalar = ForcedScalars.find(VF);
- if (VF.isVector() && ForcedScalar != ForcedScalars.end()) {
- auto InstSet = ForcedScalar->second;
- if (InstSet.count(I))
- return VectorizationCostTy(
- (getInstructionCost(I, ElementCount::getFixed(1)).first *
- VF.getKnownMinValue()),
- false);
- }
-
- Type *VectorTy;
- InstructionCost C = getInstructionCost(I, VF, VectorTy);
-
- bool TypeNotScalarized = false;
- if (VF.isVector() && VectorTy->isVectorTy()) {
- if (unsigned NumParts = TTI.getNumberOfParts(VectorTy)) {
- if (VF.isScalable())
- // <vscale x 1 x iN> is assumed to be profitable over iN because
- // scalable registers are a distinct register class from scalar ones.
- // If we ever find a target which wants to lower scalable vectors
- // back to scalars, we'll need to update this code to explicitly
- // ask TTI about the register class uses for each part.
- TypeNotScalarized = NumParts <= VF.getKnownMinValue();
- else
- TypeNotScalarized = NumParts < VF.getKnownMinValue();
- } else
- C = InstructionCost::getInvalid();
- }
- return VectorizationCostTy(C, TypeNotScalarized);
-}
-
InstructionCost LoopVectorizationCostModel::getScalarizationOverhead(
Instruction *I, ElementCount VF, TTI::TargetCostKind CostKind) const {
@@ -6730,8 +6724,25 @@ void LoopVectorizationCostModel::setVectorizedCallDecision(ElementCount VF) {
}
InstructionCost
-LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
- Type *&VectorTy) {
+LoopVectorizationCostModel::getInstructionCost(Instruction *I,
+ ElementCount VF) {
+ // If we know that this instruction will remain uniform, check the cost of
+ // the scalar version.
+ if (isUniformAfterVectorization(I, VF))
+ VF = ElementCount::getFixed(1);
+
+ if (VF.isVector() && isProfitableToScalarize(I, VF))
+ return InstsToScalarize[VF][I];
+
+ // Forced scalars do not have any scalarization overhead.
+ auto ForcedScalar = ForcedScalars.find(VF);
+ if (VF.isVector() && ForcedScalar != ForcedScalars.end()) {
+ auto InstSet = ForcedScalar->second;
+ if (InstSet.count(I))
+ return getInstructionCost(I, ElementCount::getFixed(1)) *
+ VF.getKnownMinValue();
+ }
+
Type *RetTy = I->getType();
if (canTruncateToMinimalBitwidth(I, VF))
RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);
@@ -6754,6 +6765,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
};
(void) hasSingleCopyAfterVectorization;
+ Type *VectorTy;
if (isScalarAfterVectorization(I, VF)) {
// With the exception of GEPs and PHIs, after scalarization there should
// only be one copy of the instruction generated in the loop. This is
@@ -6769,6 +6781,10 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
} else
VectorTy = ToVectorTy(RetTy, VF);
+ if (VF.isVector() && VectorTy->isVectorTy() &&
+ !TTI.getNumberOfParts(VectorTy))
+ return InstructionCost::getInvalid();
+
// TODO: We need to estimate the cost of intrinsic calls.
switch (I->getOpcode()) {
case Instruction::GetElementPtr:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index eca5d1d4c5e1d..20dc53e094a74 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -68,6 +68,9 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
case VPInstruction::PtrAdd:
// Return the type based on the pointer argument (i.e. first operand).
return inferScalarType(R->getOperand(0));
+ case VPInstruction::BranchOnCond:
+ case VPInstruction::BranchOnCount:
+ return Type::getVoidTy(Ctx);
default:
break;
}
@@ -248,8 +251,9 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
})
.Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe>(
[](const auto *R) { return R->getScalarType(); })
- .Case<VPPredInstPHIRecipe, VPWidenPHIRecipe, VPScalarIVStepsRecipe,
- VPWidenGEPRecipe>([this](const VPRecipeBase *R) {
+ .Case<VPReductionRecipe, VPPredInstPHIRecipe, VPWidenPHIRecipe,
+ VPScalarIVStepsRecipe, VPWidenGEPRecipe, VPVectorPointerRecipe,
+ VPWidenCanonicalIVRecipe>([this](const VPRecipeBase *R) {
return inferScalarType(R->getOperand(0));
})
.Case<VPBlendRecipe, VPInstruction, VPWidenRecipe, VPReplicateRecipe,
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/zero_unroll.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/zero_unroll.ll
index d8535d510ca03..27efd9fade741 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/zero_unroll.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/zero_unroll.ll
@@ -1,7 +1,7 @@
; RUN: opt -S -passes=loop-vectorize -mtriple=s390x-linux-gnu -vectorizer-min-trip-count=8 < %s | FileCheck %s
define i32 @main(i32 %arg, ptr nocapture readnone %arg1) #0 {
-;CHECK: vector.body:
+; CHECK-NOT: vector.body:
entry:
%0 = alloca i8, align 1
br label %loop
diff --git a/llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll b/llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll
index d22ccf6671e84..92b2410d7853a 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll
@@ -622,38 +622,15 @@ define void @wide_iv_trunc_reuse(ptr %dst) {
; CHECK-LABEL: define void @wide_iv_trunc_reuse(
; CHECK-SAME: ptr [[DST:%.*]]) {
; CHECK-NEXT: entry:
-; CHECK-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; CHECK: vector.ph:
-; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
-; CHECK: vector.body:
-; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[OFFSET_IDX:%.*]] = trunc i64 [[INDEX]] to i32
-; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[OFFSET_IDX]], 0
-; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[OFFSET_IDX]], 1
-; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[OFFSET_IDX]], 2
-; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[OFFSET_IDX]], 3
-; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[OFFSET_IDX]], 4
-; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[OFFSET_IDX]], 5
-; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[OFFSET_IDX]], 6
-; CHECK-NEXT: [[TMP7:%.*]] = add i32 [[OFFSET_IDX]], 7
-; CHECK-NEXT: store i32 [[TMP7]], ptr [[DST]], align 4
-; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
-; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
-; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]
-; CHECK: middle.block:
-; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
-; CHECK: scalar.ph:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1, [[MIDDLE_BLOCK]] ], [ 1, [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i32 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT: [[IV_2:%.*]] = phi i32 [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ], [ [[IV_TRUNC:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[IV_2:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_TRUNC:%.*]], [[LOOP]] ]
; CHECK-NEXT: store i32 [[IV_2]], ptr [[DST]], align 4
; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV]], 0
; CHECK-NEXT: [[IV_TRUNC]] = trunc i64 [[IV]] to i32
-; CHECK-NEXT: br i1 [[EC]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP27:![0-9]+]]
+; CHECK-NEXT: br i1 [[EC]], label [[EXIT:%.*]], label [[LOOP]]
; CHECK: exit:
; CHECK-NEXT: ret void
;
@@ -701,6 +678,4 @@ attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
; CHECK: [[LOOP23]] = distinct !{[[LOOP23]], [[META2]], [[META1]]}
; CHECK: [[LOOP24]] = distinct !{[[LOOP24]], [[META1]], [[META2]]}
; CHECK: [[LOOP25]] = distinct !{[[LOOP25]], [[META2]], [[META1]]}
-; CHECK: [[LOOP26]] = distinct !{[[LOOP26]], [[META1]], [[META2]]}
-; CHECK: [[LOOP27]] = distinct !{[[LOOP27]], [[META2]], [[META1]]}
;.
diff --git a/llvm/test/Transforms/LoopVectorize/pr32859.ll b/llvm/test/Transforms/LoopVectorize/pr32859.ll
index 24e713a7f2cff..a29a6bd735feb 100644
--- a/llvm/test/Transforms/LoopVectorize/pr32859.ll
+++ b/llvm/test/Transforms/LoopVectorize/pr32859.ll
@@ -1,4 +1,4 @@
-; RUN: opt < %s -passes=loop-vectorize -S | FileCheck %s
+; RUN: opt < %s -passes=loop-vectorize -force-vector-width=4 -S | FileCheck %s
; Out of the LCSSA form we could have 'phi i32 [ loop-invariant, %for.inc.2.i ]'
; but the IR Verifier requires for PHI one entry for each predecessor of
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-incomplete-cases.ll b/llvm/test/Transforms/LoopVectorize/vplan-incomplete-cases.ll
index 2007155fe5485..ef3b98d73a87f 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-incomplete-cases.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-incomplete-cases.ll
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
-; RUN: opt -passes=loop-vectorize -S %s | FileCheck %s
+; RUN: opt -passes=loop-vectorize -force-vector-width=2 -S %s | FileCheck %s
; This test used to crash due to missing Or/Not cases in inferScalarTypeForRecipe.
define void @vplan_incomplete_cases_tc2(i8 %x, i8 %y) {
@@ -65,8 +65,9 @@ define void @vplan_incomplete_cases_tc3(i8 %x, i8 %y) {
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
-; CHECK-NEXT: br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
+; CHECK-NEXT: [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4
+; CHECK-NEXT: br i1 [[TMP0]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
; CHECK: [[SCALAR_PH]]:
ping :)
  /// the vector type as an output parameter.
  InstructionCost getInstructionCost(Instruction *I, ElementCount VF,
                                     Type *&VectorTy);
  InstructionCost getInstructionCost(Instruction *I, ElementCount VF);
Could VectorizationCostTy be retired?
Removed, thanks!
@@ -4929,7 +4970,7 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
     LLVM_DEBUG(dbgs() << ".\n");
 #endif

-    if (!C.second && !ForceVectorization) {
+    if (!willGenerateVectorInstructions(*P, VF, TTI) && !ForceVectorization) {
-    if (!willGenerateVectorInstructions(*P, VF, TTI) && !ForceVectorization) {
+    if (!ForceVectorization && !willGenerateVectorInstructions(*P, VF, TTI)) {
Reordered, thanks!
  VPTypeAnalysis TypeInfo(Plan.getCanonicalIV()->getScalarType(),
                          Plan.getCanonicalIV()->getScalarType()->getContext());
  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
Does it suffice to traverse shallowly (inner regions replicate scalar values only), stopping at the vector loop region's exit block?
For now yes I think, as replicate regions won't contain any widen recipes.
        continue;

      auto WillWiden = [&TypeInfo, &TTI, VF](VPValue *VPV) {
        Type *ScalarTy = TypeInfo.inferScalarType(VPV);
Time to introduce inferVectorType()?
Added a switch to make sure all recipes are handled; as mentioned above, we need to skip some widen recipes if we want to preserve the existing behavior initially.
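For reference, a minimal sketch of what such a helper might look like (hypothetical; the patch keeps calling inferScalarType and ToVectorTy directly):

// Hypothetical convenience wrapper: infer a VPValue's scalar type and widen
// it to the given VF. ToVectorTy is the existing VectorUtils helper that
// returns VectorType::get(ScalarTy, VF).
static Type *inferVectorType(VPTypeAnalysis &TypeInfo, VPValue *VPV,
                             ElementCount VF) {
  return ToVectorTy(TypeInfo.inferScalarType(VPV), VF);
}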
      auto WillWiden = [&TypeInfo, &TTI, VF](VPValue *VPV) {
        Type *ScalarTy = TypeInfo.inferScalarType(VPV);
        Type *VectorTy = ToVectorTy(ScalarTy, VF);
        unsigned NumParts = TTI.getNumberOfParts(VectorTy);
Rename to NumLegalParts or NumRegisters, to avoid confusion with unroll parts?
Updated, thanks!
      LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << C.first
                        << " for VF " << VF << " For instruction: " << I
                        << '\n');
      BlockCost.first += C;
Should BlockCost also become InstructionCost rather than VectorizationCostTy, as its second seems to become undefined and/or unused?
Yes, updated throughout, thanks!
@@ -1,4 +1,4 @@
-; RUN: opt < %s -passes=loop-vectorize -S | FileCheck %s
+; RUN: opt < %s -passes=loop-vectorize -force-vector-width=4 -S | FileCheck %s
Why is this forcing of VF needed, here and below? Does the default cost-based decision change?
This won't generate any vector instructions due to dead instructions. I think in this case it makes sense to retain the vectorization behavior by forcing the VF, as from the comment the goal is to guard against creating invalid phis outside the vector loop.
Indeed good to retain behavior. Is the not-generating-any-vectors-due-to-dead-instructions intentional?
I think it makes sense, as there won't be any code at all generated for dead instructions that have been cleaned up by DCE.
-; CHECK-NEXT:    br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4
+; CHECK-NEXT:    br i1 [[TMP0]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
Is this change desirable?
I've undone the change; this now won't get vectorized due to the vector loop body being empty (except for the canonical IV, increment, compare, and branch). It still serves its original purpose (checking type inference for the logical-AND VPInstruction).
  auto ForcedScalar = ForcedScalars.find(VF);
  if (VF.isVector() && ForcedScalar != ForcedScalars.end()) {
    auto InstSet = ForcedScalar->second;
    if (InstSet.count(I))
      return getInstructionCost(I, ElementCount::getFixed(1)) *
             VF.getKnownMinValue();
  }
Independent of this patch: if ForcedScalars recorded its cost, as InstsToScalarize and WideningDecisions do, then ForcedScalars[VF][I] could be returned if it exists, similar to InstsToScalarize, rather than invoking a recursive call. Could ForcedScalars be folded into InstsToScalarize?
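A rough sketch of the suggested caching, with hypothetical names (ForcedScalars currently records only the instruction set, not costs):

// Hypothetical: if forced-scalar costs were precomputed per VF (as
// InstsToScalarize does), a lookup would replace the recursive call.
auto It = ForcedScalarCosts.find(VF); // ForcedScalarCosts: hypothetical map
if (VF.isVector() && It != ForcedScalarCosts.end()) {
  auto CostIt = It->second.find(I);
  if (CostIt != It->second.end())
    return CostIt->second; // already scaled by VF.getKnownMinValue()
}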
@@ -1,7 +1,7 @@
 ; RUN: opt -S -passes=loop-vectorize -mtriple=s390x-linux-gnu -vectorizer-min-trip-count=8 < %s | FileCheck %s

 define i32 @main(i32 %arg, ptr nocapture readnone %arg1) #0 {
-;CHECK: vector.body:
+; CHECK-NOT: vector.body:
So the more accurate observation of whether any instruction will be vectorized now causes LV to avoid (pretending to be) vectorizing this loop?
Yep, generated full check lines for the file in 959ff45; there are no vector instructions generated, because the induction is scalarized, which isn't modeled in the legacy cost model (a similar case is in llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll).
Force-pushed from 14d39c0 to 0789f5d.
Regenerate test checks to better show impact of #96622.
Comments should be addressed now, thanks!
@@ -4879,6 +4874,52 @@ static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
 } while (!Tail.empty());
 }

+static bool willGenerateVectorInstructions(VPlan &Plan, ElementCount VF,
+                                           const TargetTransformInfo &TTI) {
Added, thanks!
      if (isa<VPDerivedIVRecipe, VPScalarIVStepsRecipe, VPScalarCastRecipe,
              VPReplicateRecipe, VPInstruction, VPActiveLaneMaskPHIRecipe,
              VPCanonicalIVPHIRecipe, VPEVLBasedIVPHIRecipe,
              VPVectorPointerRecipe>(&R))
Removed VPActiveLaneMaskPHIRecipe, retained VPInstruction to retain existing behavior, and added a comment. (VPVectorPointerRecipe is admittedly named a bit confusingly: it won't generate a vector of pointers, but a scalar pointer to be used for a wide memory access.)
Regenerate test checks to better show impact of llvm#96622.
@@ -4795,9 +4783,92 @@ static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
 } while (!Tail.empty());
 }

+static bool willGenerateVectorInstructions(VPlan &Plan, ElementCount VF,
-static bool willGenerateVectorInstructions(VPlan &Plan, ElementCount VF,
+/// Check if any recipe of \p Plan will generate a vector value, which will be
+/// assigned a vector register.
+static bool willGenerateVectorRegisters(VPlan &Plan, ElementCount VF,

Or willGenerateVectors: this not only looks for recipes that will widen, producing vector IR instructions, but also checks that the vector values they produce will end up utilizing vector registers.
Adjusted, thanks!
      }

      auto WillWiden = [&TypeInfo, &TTI, VF](VPValue *VPV) {
        Type *ScalarTy = TypeInfo.inferScalarType(VPV);
Worth caching the ScalarTys already checked? WillWiden is essentially a function of ScalarTy rather than of VPV; feed it VPV's inferred scalar type instead of the two ingredients needed to produce it?
Updated, thanks!
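A sketch of the resulting shape, reconstructed from the snippets quoted later in this review (assuming a DenseSet-style cache of scalar types on the caller side):

// WillWiden keyed on the inferred scalar type; callers cache visited types.
auto WillWiden = [&TTI, VF](Type *ScalarTy) {
  Type *VectorTy = ToVectorTy(ScalarTy, VF);
  unsigned NumLegalParts = TTI.getNumberOfParts(VectorTy);
  if (!NumLegalParts)
    return false;
  if (VF.isScalable())
    // A single legal part of a scalable type still occupies a vector
    // register, as scalable registers are a distinct register class.
    return NumLegalParts <= VF.getKnownMinValue();
  // Two or more parts that share a register - are vectorized.
  return NumLegalParts < VF.getKnownMinValue();
};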
      if (auto *WidenStore = dyn_cast<VPWidenStoreRecipe>(&R)) {
        VPValuesToCheck.push_back(WidenStore->getOperand(1));
      } else if (auto *IG = dyn_cast<VPInterleaveRecipe>(&R)) {
        append_range(VPValuesToCheck, IG->getStoredValues());
This misses values defined by interleaved loads, for which getStoredValues() returns none. That is currently the only case having multiple defs, which are all of the same type, so it suffices to check only one.
Another reminder to split VPInterleaveRecipe into interleaved loads and interleaved stores.
Another observation is that VPBranchOnMask should be a VPInstruction producing no value, similar to the BranchOnCond and BranchOnCount VPInstructions.
Fixed by always adding defined values and only adding the first stored value, thanks!
        return NumLegalParts < VF.getKnownMinValue();
      };
      SmallVector<VPValue *> VPValuesToCheck;
      if (auto *WidenStore = dyn_cast<VPWidenStoreRecipe>(&R)) {
or VPWidenStoreEVLRecipe?
Ah yes, it is not yet a subclass of VPStoreRecipe.... Updated, thanks!
      };
      SmallVector<VPValue *> VPValuesToCheck;
      if (auto *WidenStore = dyn_cast<VPWidenStoreRecipe>(&R)) {
        VPValuesToCheck.push_back(WidenStore->getOperand(1));
-        VPValuesToCheck.push_back(WidenStore->getOperand(1));
+        VPValuesToCheck.push_back(WidenStore->getStoredValue());
Updated, thanks!
        // Two or more parts that share a register - are vectorized.
        return NumLegalParts < VF.getKnownMinValue();
      };
      SmallVector<VPValue *> VPValuesToCheck;
Another way, say, could be:
-      SmallVector<VPValue *> VPValuesToCheck;
+      // For multi-def recipes, currently only interleaved loads, suffice to
+      // check first def only.
+      if (R.getNumDefinedValues() >= 1 && WillWiden(R.getVPValue(0)))
+        return true;
+      // For stores check their stored value; for interleaved stores, suffice
+      // the check first stored value only. In all cases this is the second
+      // operand.
+      if (isa<VPWidenStoreRecipe, VPWidenStoreEVLRecipe, VPInterleaveRecipe>(&R) &&
+          WillWiden(R.getOperand(1)))
+        return true;
Simplified, thanks!
Comments should be addressed, thanks!
-      // result. Note that this includes VPInstruction, where some opcodes may
-      // produce a vector to preserve existing behavior originally as
+      // result. Note that this includes VPInstruction where some opcodes may
+      // produce a vector, to preserve existing behavior as
Adjusted, thanks!
  assert(VF.isVector() && "Checking a scalar VF?");
  VPTypeAnalysis TypeInfo(Plan.getCanonicalIV()->getScalarType(),
                          Plan.getCanonicalIV()->getScalarType()->getContext());
  DenseMap<Type *, bool> GeneratesVector;
Would it suffice to hold a set of all visited scalar types instead? Only scalars are cached, possibly ending with the first vector found, whose insertion is redundant as it early-exits the search.
Updated, thanks!
    auto WillWiden = [&TypeInfo, &TTI, &GeneratesVector, VF](VPValue *VPV) {
      Type *ScalarTy = TypeInfo.inferScalarType(VPV);
      const auto &[Iter, Ins] = GeneratesVector.insert({ScalarTy, false});
      if (Ins) {
Early return if in cache?
Updated, thanks!
      if (R.getNumDefinedValues() >= 1) {
        // For multi-def recipes, currently only interleaved loads, suffice to
        // check first def only.
        if (WillWiden(R.getVPValue(0)))
          return true;
      } else if (isa<VPWidenStoreRecipe, VPWidenStoreEVLRecipe,
                     VPInterleaveRecipe>(&R) &&
                 WillWiden(R.getOperand(1))) {
        // For stores check their stored value; for interleaved stores, suffice
        // the check first stored value only. In all cases this is the second
        // operand.
        return true;
      }
Trying to see if WillWiden() can be simplified to only check for widening. How about the following (may be too compact; choose which style you prefer):
-      if (R.getNumDefinedValues() >= 1) {
-        // For multi-def recipes, currently only interleaved loads, suffice to
-        // check first def only.
-        if (WillWiden(R.getVPValue(0)))
-          return true;
-      } else if (isa<VPWidenStoreRecipe, VPWidenStoreEVLRecipe,
-                     VPInterleaveRecipe>(&R) &&
-                 WillWiden(R.getOperand(1))) {
-        // For stores check their stored value; for interleaved stores, suffice
-        // the check first stored value only. In all cases this is the second
-        // operand.
-        return true;
-      }
+      // If no def nor is a store, e.g., branches, continue - no value to check.
+      if (R.getNumDefinedValues() == 0 &&
+          !isa<VPWidenStoreRecipe, VPWidenStoreEVLRecipe, VPInterleaveRecipe>(&R))
+        continue;
+      // For multi-def recipes, currently only interleaved loads, suffice to
+      // check first def only.
+      // For stores check their stored value; for interleaved stores suffice
+      // the check first stored value only. In all cases this is the second
+      // operand.
+      VPValue *ToCheck =
+          R.getNumDefinedValues() >= 1 ? R.getVPValue(0) : R.getOperand(1);
+      Type *ScalarTy = TypeInfo.inferScalarType(ToCheck);
+      if (WillNotWiden.contains(ScalarTy))
+        continue;
+      if (WillWiden(ScalarTy))
+        return true;
+      WillNotWiden.insert(ScalarTy);
Adjusted, thanks!
      if (Ins) {
        Type *VectorTy = ToVectorTy(ScalarTy, VF);
        unsigned NumLegalParts = TTI.getNumberOfParts(VectorTy);
        if (!NumLegalParts)
This could also be cached.
It could be cached, but in the latest version, we only visit each scalar type at most once. Left as is for now.
This LGTM, thanks for accommodating! Adding a last couple of minor comments.
  // Set of types known to not generate vector values.
  DenseSet<Type *> WillNotWiden;
-  // Set of types known to not generate vector values.
-  DenseSet<Type *> WillNotWiden;
+  // Set of types already visited.
+  DenseSet<Type *> Visited;
Renamed, thanks!
      VPValue *ToCheck =
          R.getNumDefinedValues() >= 1 ? R.getVPValue(0) : R.getOperand(1);
      Type *ScalarTy = TypeInfo.inferScalarType(ToCheck);
      if (!WillNotWiden.insert({ScalarTy}).second)
Inserting before checking will include will-not-widen types along with the first will-widen type (if one exists), so it is slightly more accurate to call the cache Visited rather than WillNotWiden:
-      if (!WillNotWiden.insert({ScalarTy}).second)
+      if (!Visited.insert({ScalarTy}).second)
Renamed, thanks!
      // check first def only.
      // For stores check their stored value; for interleaved stores suffice
      // the check first stored value only. In all cases this is the second
      // operand.
Ah, the stored widened values are presumably defined earlier, inside the vector loop, so would it suffice to check the types of widened values defined by widening recipes only, and ignore checking those used by stores?
I think that would potentially miss cases where we have wide stores of loop-invariant values only; left as is for now.
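To illustrate the concern (a sketch, assuming the helper shape discussed above): a widened store of a loop-invariant value defines no VPValue inside the loop, so a defs-only scan could miss the vector register its stored operand needs.

// Sketch: the stored value may be a live-in splat defined outside the vector
// loop region, hence never visited as a def; check the operand explicitly.
if (auto *Store = dyn_cast<VPWidenStoreRecipe>(&R))
  if (WillWiden(TypeInfo.inferScalarType(Store->getStoredValue())))
    return true;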
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/42/builds/280. Here is the relevant piece of the build log for reference:
While testing the VPlan cost model changes, I found a case where we vectorized with this change but not without it. This was due to the VPlan-based analysis not skipping ephemeral values (such as values only used by asserts), leading to vectorization when the only widened values are used by asserts. Should be fixed by ef89e3e.
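For context, ephemeral values are typically gathered via CodeMetrics; a sketch of the kind of filtering involved (the actual fix in ef89e3e may differ):

// Collect values that only feed llvm.assume and the like, so the
// VPlan-based check can skip them like the legacy cost model does.
SmallPtrSet<const Value *, 4> EphValues;
CodeMetrics::collectEphemeralValues(TheLoop, AC, EphValues);
// ...later, skip a recipe whose underlying instruction is ephemeral:
if (auto *UI = dyn_cast_or_null<Instruction>(VPV->getUnderlyingValue()))
  if (EphValues.contains(UI))
    continue;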
This reverts commit 6f538f6. A number of crashes have been fixed by separate fixes, including https://github.com/llvm/llvm-project/pull/96622. This version of the PR also pre-computes the costs for branches (except the latch) instead of computing their costs as part of costing of replicate regions, as there may not be a direct correspondence between original branches and number of replicate regions. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds a getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555