-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VPlan] Port invalid cost remarks to VPlan. #99322
Conversation
This patch moves the logic to create remarks for instructions with invalid costs to work on recipes and decoupling it from selectVectorizationFactor. This is needed to replace the remaining uses of selectVectorizationFactor with getBestPlan using the VPlan-based cost model. The current implementation iterates over all VPlans and their recipes again, to find recipes with invalid costs, which is more work but will only be done when remarks for LV are enabled. Once the remaining uses of selectVectorizationFactor are retired, we can collect VPlans with invalid costs as part of getBestPlan if we want to optimize the remarks case a bit, at the cost of adding additional complexity.
@llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesThis patch moves the logic to create remarks for instructions with invalid costs to work on recipes and decoupling it from selectVectorizationFactor. This is needed to replace the remaining uses of selectVectorizationFactor with getBestPlan using the VPlan-based cost model. The current implementation iterates over all VPlans and their recipes again, to find recipes with invalid costs, which is more work but will only be done when remarks for LV are enabled. Once the remaining uses of selectVectorizationFactor are retired, we can collect VPlans with invalid costs as part of getBestPlan if we want to optimize the remarks case a bit, at the cost of adding additional complexity. Full diff: https://github.com/llvm/llvm-project/pull/99322.diff 4 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index c63cf0c37f2f9..ac34fa4ba2777 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -411,6 +411,9 @@ class LoopVectorizationPlanner {
VectorizationFactor
selectEpilogueVectorizationFactor(const ElementCount MaxVF, unsigned IC);
+ /// Emit remarks for recipes with invalid costs in the available VPlans.
+ void emitInvalidCostRemarks(OptimizationRemarkEmitter *ORE);
+
protected:
/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
/// according to the information gathered by Legal when it checked if it is
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5fc365f77efbb..a55e06594a394 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -75,6 +75,7 @@
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
+#include "llvm/ADT/TypeSwitch.h"
#include "llvm/ADT/iterator_range.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
@@ -936,10 +937,12 @@ static void debugVectorizationMessage(const StringRef Prefix,
/// instruction that prevents vectorization. Otherwise \p TheLoop is used for
/// the location of the remark. \return the remark object that can be
/// streamed to.
-static OptimizationRemarkAnalysis createLVAnalysis(const char *PassName,
- StringRef RemarkName, Loop *TheLoop, Instruction *I) {
+static OptimizationRemarkAnalysis
+createLVAnalysis(const char *PassName, StringRef RemarkName, Loop *TheLoop,
+ Instruction *I, DebugLoc DL = {}) {
Value *CodeRegion = TheLoop->getHeader();
- DebugLoc DL = TheLoop->getStartLoc();
+ if (!DL)
+ DL = TheLoop->getStartLoc();
if (I) {
CodeRegion = I->getParent();
@@ -990,13 +993,14 @@ void reportVectorizationFailure(const StringRef DebugMsg,
/// as an optimization remark. Uses either \p I as location of the remark, or
/// otherwise \p TheLoop.
static void reportVectorizationInfo(const StringRef Msg, const StringRef ORETag,
- OptimizationRemarkEmitter *ORE, Loop *TheLoop,
- Instruction *I = nullptr) {
+ OptimizationRemarkEmitter *ORE,
+ Loop *TheLoop, Instruction *I = nullptr,
+ DebugLoc DL = {}) {
LLVM_DEBUG(debugVectorizationMessage("", Msg, I));
LoopVectorizeHints Hints(TheLoop, true /* doesn't matter */, *ORE);
- ORE->emit(
- createLVAnalysis(Hints.vectorizeAnalysisPassName(), ORETag, TheLoop, I)
- << Msg);
+ ORE->emit(createLVAnalysis(Hints.vectorizeAnalysisPassName(), ORETag, TheLoop,
+ I, DL)
+ << Msg);
}
/// Report successful vectorization of the loop. In case an outer loop is
@@ -1586,9 +1590,7 @@ class LoopVectorizationCostModel {
/// the factor width. If \p Invalid is not nullptr, this function
/// will add a pair(Instruction*, ElementCount) to \p Invalid for
/// each instruction that has an Invalid cost for the given VF.
- InstructionCost
- expectedCost(ElementCount VF,
- SmallVectorImpl<InstructionVFPair> *Invalid = nullptr);
+ InstructionCost expectedCost(ElementCount VF);
bool hasPredStores() const { return NumPredStores > 0; }
@@ -4694,9 +4696,28 @@ bool LoopVectorizationPlanner::isMoreProfitable(
return CmpFn(RTCostA, RTCostB);
}
-static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
- OptimizationRemarkEmitter *ORE,
- Loop *TheLoop) {
+void LoopVectorizationPlanner::emitInvalidCostRemarks(
+ OptimizationRemarkEmitter *ORE) {
+ if (VPlans.empty())
+ return;
+
+ using RecipeVFPair = std::pair<VPRecipeBase *, ElementCount>;
+ SmallVector<RecipeVFPair> InvalidCosts;
+ for (const auto &Plan : VPlans) {
+ for (ElementCount VF : Plan->vectorFactors()) {
+ LLVMContext &LLVMCtx = OrigLoop->getHeader()->getContext();
+ VPCostContext CostCtx(CM.TTI, Legal->getWidestInductionType(), LLVMCtx,
+ CM);
+ auto Iter = vp_depth_first_deep(Plan->getVectorLoopRegion()->getEntry());
+ for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
+ for (auto &R : *VPBB) {
+ if (R.cost(VF, CostCtx).isValid())
+ continue;
+ InvalidCosts.emplace_back(&R, VF);
+ }
+ }
+ }
+ }
if (InvalidCosts.empty())
return;
@@ -4704,14 +4725,14 @@ static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
// Group the remarks per instruction, keeping the instruction order from
// InvalidCosts.
- std::map<Instruction *, unsigned> Numbering;
+ std::map<VPRecipeBase *, unsigned> Numbering;
unsigned I = 0;
for (auto &Pair : InvalidCosts)
if (!Numbering.count(Pair.first))
Numbering[Pair.first] = I++;
// Sort the list, first on instruction(number) then on VF.
- sort(InvalidCosts, [&Numbering](InstructionVFPair &A, InstructionVFPair &B) {
+ sort(InvalidCosts, [&Numbering](RecipeVFPair &A, RecipeVFPair &B) {
if (Numbering[A.first] != Numbering[B.first])
return Numbering[A.first] < Numbering[B.first];
const auto &LHS = A.second;
@@ -4725,33 +4746,57 @@ static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
// Group the instructions together to emit separate remarks for:
// load (vf1, vf2)
// store (vf1)
- auto Tail = ArrayRef<InstructionVFPair>(InvalidCosts);
- auto Subset = ArrayRef<InstructionVFPair>();
+ auto Tail = ArrayRef<RecipeVFPair>(InvalidCosts);
+ auto Subset = ArrayRef<RecipeVFPair>();
do {
if (Subset.empty())
Subset = Tail.take_front(1);
- Instruction *I = Subset.front().first;
+ VPRecipeBase *R = Subset.front().first;
+
+ unsigned Opcode =
+ TypeSwitch<const VPRecipeBase *, unsigned>(R)
+ .Case<VPHeaderPHIRecipe>(
+ [](const auto *R) { return Instruction::PHI; })
+ .Case<VPWidenSelectRecipe>(
+ [](const auto *R) { return Instruction::Select; })
+ .Case<VPWidenStoreRecipe>(
+ [](const auto *R) { return Instruction::Store; })
+ .Case<VPWidenLoadRecipe>(
+ [](const auto *R) { return Instruction::Load; })
+ .Case<VPWidenCallRecipe>(
+ [](const auto *R) { return Instruction::Call; })
+ .Case<VPInstruction, VPWidenRecipe, VPReplicateRecipe,
+ VPWidenCastRecipe>(
+ [](const auto *R) { return R->getOpcode(); })
+ .Case<VPInterleaveRecipe>([](const VPInterleaveRecipe *R) {
+ return R->getStoredValues().empty() ? Instruction::Load
+ : Instruction::Store;
+ });
// If the next instruction is different, or if there are no other pairs,
// emit a remark for the collated subset. e.g.
// [(load, vf1), (load, vf2))]
// to emit:
// remark: invalid costs for 'load' at VF=(vf, vf2)
- if (Subset == Tail || Tail[Subset.size()].first != I) {
+ if (Subset == Tail || Tail[Subset.size()].first != R) {
std::string OutString;
raw_string_ostream OS(OutString);
assert(!Subset.empty() && "Unexpected empty range");
- OS << "Instruction with invalid costs prevented vectorization at VF=(";
+ OS << "Recipe with invalid costs prevented vectorization at VF=(";
for (const auto &Pair : Subset)
OS << (Pair.second == Subset.front().second ? "" : ", ") << Pair.second;
OS << "):";
- if (auto *CI = dyn_cast<CallInst>(I))
- OS << " call to " << CI->getCalledFunction()->getName();
+ if (Opcode == Instruction::Call)
+ OS << " call to "
+ << R->getOperand(R->getNumOperands() - 1)
+ ->getLiveInIRValue()
+ ->getName();
else
- OS << " " << I->getOpcodeName();
+ OS << " " << Instruction::getOpcodeName(Opcode);
OS.flush();
- reportVectorizationInfo(OutString, "InvalidCost", ORE, TheLoop, I);
+ reportVectorizationInfo(OutString, "InvalidCost", ORE, OrigLoop, nullptr,
+ R->getDebugLoc());
Tail = Tail.drop_front(Subset.size());
Subset = {};
} else
@@ -4880,14 +4925,13 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
ChosenFactor.Cost = InstructionCost::getMax();
}
- SmallVector<InstructionVFPair> InvalidCosts;
for (auto &P : VPlans) {
for (ElementCount VF : P->vectorFactors()) {
// The cost for scalar VF=1 is already calculated, so ignore it.
if (VF.isScalar())
continue;
- InstructionCost C = CM.expectedCost(VF, &InvalidCosts);
+ InstructionCost C = CM.expectedCost(VF);
VectorizationFactor Candidate(VF, C, ScalarCost.ScalarCost);
#ifndef NDEBUG
@@ -4922,8 +4966,6 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
}
}
- emitInvalidCostRemarks(InvalidCosts, ORE, OrigLoop);
-
if (!EnableCondStoresVectorization && CM.hasPredStores()) {
reportVectorizationFailure(
"There are conditional stores.",
@@ -5828,8 +5870,7 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
return Discount;
}
-InstructionCost LoopVectorizationCostModel::expectedCost(
- ElementCount VF, SmallVectorImpl<InstructionVFPair> *Invalid) {
+InstructionCost LoopVectorizationCostModel::expectedCost(ElementCount VF) {
InstructionCost Cost;
// For each block.
@@ -5849,10 +5890,6 @@ InstructionCost LoopVectorizationCostModel::expectedCost(
if (C.isValid() && ForceTargetInstructionCost.getNumOccurrences() > 0)
C = InstructionCost(ForceTargetInstructionCost);
- // Keep a list of instructions with invalid costs.
- if (Invalid && !C.isValid())
- Invalid->emplace_back(&I, VF);
-
BlockCost += C;
LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << C << " for VF "
<< VF << " For instruction: " << I << '\n');
@@ -7290,6 +7327,7 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
// cost-model and will be retired once the VPlan-based cost-model is
// stabilized.
VectorizationFactor VF = selectVectorizationFactor();
+
assert((VF.Width.isScalar() || VF.ScalarCost > 0) && "when vectorizing, the scalar cost must be non-zero.");
if (!hasPlanWithVF(VF.Width)) {
LLVM_DEBUG(dbgs() << "LV: No VPlan could be built for " << VF.Width
@@ -7501,6 +7539,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {
}
}
}
+
BestPlan->setVF(BestFactor.Width);
return *BestPlan;
}
@@ -10188,6 +10227,9 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// Plan how to best vectorize, return the best VF and its cost.
std::optional<VectorizationFactor> MaybeVF = LVP.plan(UserVF, UserIC);
+ if (ORE->allowExtraAnalysis(LV_NAME))
+ LVP.emitInvalidCostRemarks(ORE);
+
VectorizationFactor VF = VectorizationFactor::Disabled();
unsigned IC = 1;
@@ -10195,6 +10237,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
GeneratedRTChecks Checks(*PSE.getSE(), DT, LI, TTI,
F->getDataLayout(), AddBranchWeights);
+
if (MaybeVF) {
VF = *MaybeVF;
// Select the interleave count.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
index b66bb948a47a5..0ef03c58be97a 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
@@ -2,8 +2,8 @@
; RUN: FileCheck %s --check-prefix=CHECK-REMARKS < %t
; CHECK-REMARKS: UserVF ignored because of invalid costs.
-; CHECK-REMARKS: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): alloca
-; CHECK-REMARKS: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
+; CHECK-REMARKS: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): alloca
+; CHECK-REMARKS: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store
define void @alloca(ptr %vla, i64 %N) {
; CHECK-LABEL: @alloca(
; CHECK-NOT: <vscale x
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll
index 333bb20fd0d9a..bc6eeb470b154 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll
@@ -101,9 +101,9 @@ for.end:
}
; CHECK-REMARKS: UserVF ignored because of invalid costs.
-; CHECK-REMARKS-NEXT: t.c:3:10: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
-; CHECK-REMARKS-NEXT: t.c:3:20: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
-; CHECK-REMARKS-NEXT: t.c:3:30: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
+; CHECK-REMARKS-NEXT: t.c:3:10: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load
+; CHECK-REMARKS-NEXT: t.c:3:20: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
+; CHECK-REMARKS-NEXT: t.c:3:30: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store
define void @vec_sin_no_mapping(ptr noalias nocapture %dst, ptr noalias nocapture readonly %src, i64 %n) {
; CHECK: @vec_sin_no_mapping
; CHECK: call fast <2 x float> @llvm.sin.v2f32
@@ -127,10 +127,10 @@ for.cond.cleanup: ; preds = %for.body
}
; CHECK-REMARKS: UserVF ignored because of invalid costs.
-; CHECK-REMARKS-NEXT: t.c:3:10: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
-; CHECK-REMARKS-NEXT: t.c:3:30: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
-; CHECK-REMARKS-NEXT: t.c:3:20: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
-; CHECK-REMARKS-NEXT: t.c:3:40: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
+; CHECK-REMARKS-NEXT: t.c:3:10: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load
+; CHECK-REMARKS-NEXT: t.c:3:30: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
+; CHECK-REMARKS-NEXT: t.c:3:20: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
+; CHECK-REMARKS-NEXT: t.c:3:40: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store
define void @vec_sin_no_mapping_ite(ptr noalias nocapture %dst, ptr noalias nocapture readonly %src, i64 %n) {
; CHECK: @vec_sin_no_mapping_ite
; CHECK-NOT: <vscale x
@@ -163,9 +163,9 @@ for.cond.cleanup: ; preds = %for.body
}
; CHECK-REMARKS: UserVF ignored because of invalid costs.
-; CHECK-REMARKS-NEXT: t.c:3:10: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
-; CHECK-REMARKS-NEXT: t.c:3:20: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
-; CHECK-REMARKS-NEXT: t.c:3:30: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
+; CHECK-REMARKS-NEXT: t.c:3:10: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load
+; CHECK-REMARKS-NEXT: t.c:3:20: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
+; CHECK-REMARKS-NEXT: t.c:3:30: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store
define void @vec_sin_fixed_mapping(ptr noalias nocapture %dst, ptr noalias nocapture readonly %src, i64 %n) {
; CHECK: @vec_sin_fixed_mapping
; CHECK: call fast <2 x float> @llvm.sin.v2f32
|
if (InvalidCosts.empty()) | ||
return; | ||
|
||
// Emit a report of VFs with invalid costs in the loop. | ||
|
||
// Group the remarks per instruction, keeping the instruction order from | ||
// InvalidCosts. | ||
std::map<Instruction *, unsigned> Numbering; | ||
std::map<VPRecipeBase *, unsigned> Numbering; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DenseMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated , thanks!
@@ -7290,6 +7327,7 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) { | |||
// cost-model and will be retired once the VPlan-based cost-model is | |||
// stabilized. | |||
VectorizationFactor VF = selectVectorizationFactor(); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove new blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks!
@@ -7501,6 +7539,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const { | |||
} | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove new blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks!
VectorizationFactor VF = VectorizationFactor::Disabled(); | ||
unsigned IC = 1; | ||
|
||
bool AddBranchWeights = | ||
hasBranchWeightMD(*L->getLoopLatch()->getTerminator()); | ||
GeneratedRTChecks Checks(*PSE.getSE(), DT, LI, TTI, | ||
F->getDataLayout(), AddBranchWeights); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove new blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
if (InvalidCosts.empty()) | ||
return; | ||
|
||
// Emit a report of VFs with invalid costs in the loop. | ||
|
||
// Group the remarks per instruction, keeping the instruction order from | ||
// InvalidCosts. | ||
std::map<Instruction *, unsigned> Numbering; | ||
std::map<VPRecipeBase *, unsigned> Numbering; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated , thanks!
@@ -7290,6 +7327,7 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) { | |||
// cost-model and will be retired once the VPlan-based cost-model is | |||
// stabilized. | |||
VectorizationFactor VF = selectVectorizationFactor(); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks!
@@ -7501,6 +7539,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const { | |||
} | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, adding various comments.
@@ -891,10 +892,12 @@ static void debugVectorizationMessage(const StringRef Prefix, | |||
/// instruction that prevents vectorization. Otherwise \p TheLoop is used for | |||
/// the location of the remark. \return the remark object that can be | |||
/// streamed to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add documentation for \p DL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks!
@@ -945,13 +948,14 @@ void reportVectorizationFailure(const StringRef DebugMsg, | |||
/// as an optimization remark. Uses either \p I as location of the remark, or | |||
/// otherwise \p TheLoop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about \p DL?
/// the factor width. If \p Invalid is not nullptr, this function | ||
/// will add a pair(Instruction*, ElementCount) to \p Invalid for | ||
/// each instruction that has an Invalid cost for the given VF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop last sentence about \p Invalid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed thanks!
if (VPlans.empty()) | ||
return; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (VPlans.empty()) | |
return; |
Can be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped, thanks!
LLVMContext &LLVMCtx = OrigLoop->getHeader()->getContext(); | ||
VPCostContext CostCtx(CM.TTI, Legal->getWidestInductionType(), LLVMCtx, | ||
CM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be defined at the outset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not at the moment, as we mark underlying instructions for VPReplicateRecipes to only compute the cost once, which means we would skip them in some cases
StringRef RemarkName, Loop *TheLoop, Instruction *I) { | ||
static OptimizationRemarkAnalysis | ||
createLVAnalysis(const char *PassName, StringRef RemarkName, Loop *TheLoop, | ||
Instruction *I, DebugLoc DL = {}) { | ||
Value *CodeRegion = TheLoop->getHeader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps slightly clarify the precedence. If DL is provided, should it have highest precedence?
Value *CodeRegion = TheLoop->getHeader(); | |
Value *CodeRegion = I ? I->getParent() : TheLoop->getHeader(); | |
// If debug location is attached to the instruction, use it. Otherwise if DL was not provided, use the loop's. | |
if (I && I->getdebugLoc()) | |
DL = I->getDebugLoc(); | |
else if (!DL) | |
DL = TheLoop->getStartLoc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted, thanks!
Instruction *I = Subset.front().first; | ||
VPRecipeBase *R = Subset.front().first; | ||
|
||
unsigned Opcode = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This retains current dumps, a worthy (temporary) objective, but deserves further attention - recipes (including those with invalid cost) should arguably print themselves, as in R.print(), perhaps supporting a shorter printing of their "opcode" only?
VPRecipeBase *R = Subset.front().first; | ||
|
||
unsigned Opcode = | ||
TypeSwitch<const VPRecipeBase *, unsigned>(R) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all cases handled, default deemed unreachable?
// If the next instruction is different, or if there are no other pairs, | ||
// emit a remark for the collated subset. e.g. | ||
// [(load, vf1), (load, vf2))] | ||
// to emit: | ||
// remark: invalid costs for 'load' at VF=(vf, vf2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If the next instruction is different, or if there are no other pairs, | |
// emit a remark for the collated subset. e.g. | |
// [(load, vf1), (load, vf2))] | |
// to emit: | |
// remark: invalid costs for 'load' at VF=(vf, vf2) | |
// If the next recipe is different, or if there are no other pairs, | |
// emit a remark for the collated subset. e.g. | |
// [(load, VF1), (load, VF2))] | |
// to emit: | |
// remark: invalid costs for 'load' at VF=(VF1, VF2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
if (Opcode == Instruction::Call) | ||
OS << " call to " | ||
<< R->getOperand(R->getNumOperands() - 1) | ||
->getLiveInIRValue() | ||
->getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better have VPWidenCallRecipe and VPReplicateRecipe-of-a-Call take care of printing themselves via getCalledScalarFunction and underlying.getCalledFunction(), respectively?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some post-commit notes.
Value *CodeRegion = I ? I->getParent() : TheLoop->getHeader(); | ||
// If debug location is attached to the instruction, use it. Otherwise if DL | ||
// was not provided, use the loop's. | ||
if (I && I->getDebugLoc()) | ||
DL = I->getDebugLoc(); | ||
else if (!DL) | ||
DL = TheLoop->getStartLoc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(post commit): slight discrepancy with documentation above, should this instead read
// If debug location is provided, use it. Otherwise if debug location is attached to the instruction, use it.
// Otherwise use the start location of the loop.
if (!DL)
DL = (I && I->getDebugLoc()) ? I->getDebugLoc() : TheLoop->getStartLoc();
VPCostContext CostCtx(CM.TTI, Legal->getWidestInductionType(), LLVMCtx, | ||
CM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CostCtx is kept here rather than hoisting it alongside LLVMCtx above, due to its caching of SkipCostComputation. But the latter is initialized by calling LVP::cost(Plan, VF), whereas here all recipes are asked for their cost directly, w/o going through LVP::cost(). Should LVP::cost() be called first, and iff it returns invalid traverse the recipes? Or note that invalid costs cannot be skipped(?), so calling LVP::cost() is redundant when only invalid costs are sought, in which case CostCtx can be hoisted(?)
@@ -943,15 +942,17 @@ void reportVectorizationFailure(const StringRef DebugMsg, | |||
|
|||
/// Reports an informative message: print \p Msg for debugging purposes as well | |||
/// as an optimization remark. Uses either \p I as location of the remark, or | |||
/// otherwise \p TheLoop. | |||
/// otherwise \p TheLoop. If \p DL is passed, use it as debug location for the | |||
/// remark. If \p DL is passed, use it as debug location for the remark. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last sentence repeated twice.
: cast<Function>(R->getOperand(R->getNumOperands() - 1) | ||
->getLiveInIRValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth at-least a comment or assert, noting that if not WidenCall then R is replicating a CallInst, both having the called function as their last operand. Better have ReplicateRecipe (or a derivative thereof) provide the called function explicitly (or print its name), to avoid bypassing CallInst's getCalledFunction(), also noting that recipes may use their last operand for an optional mask.
This patch moves the logic to create remarks for instructions with invalid costs to work on recipes and decoupling it from selectVectorizationFactor. This is needed to replace the remaining uses of selectVectorizationFactor with getBestPlan using the VPlan-based cost model.
The current implementation iterates over all VPlans and their recipes again, to find recipes with invalid costs, which is more work but will only be done when remarks for LV are enabled. Once the remaining uses of selectVectorizationFactor are retired, we can collect VPlans with invalid costs as part of getBestPlan if we want to optimize the remarks case a bit, at the cost of adding additional complexity.