[VPlan] Port invalid cost remarks to VPlan. #99322

fhahn · 2024-07-17T13:17:28Z

This patch moves the logic that creates remarks for instructions with invalid costs so that it works on recipes, decoupling it from selectVectorizationFactor. This is needed to replace the remaining uses of selectVectorizationFactor with getBestPlan, which uses the VPlan-based cost model.

The current implementation iterates over all VPlans and their recipes again, to find recipes with invalid costs, which is more work but will only be done when remarks for LV are enabled. Once the remaining uses of selectVectorizationFactor are retired, we can collect VPlans with invalid costs as part of getBestPlan if we want to optimize the remarks case a bit, at the cost of adding additional complexity.
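
The collect-then-group flow described above can be sketched outside LLVM roughly as follows. This is a minimal, self-contained model and not the patch's code: `Recipe`, `collectInvalid`, and `formatRemarks` are hypothetical names, a VF is modeled as a plain `unsigned`, and cost validity is a lookup in a set rather than a real `R.cost(VF, CostCtx)` query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VPlan recipe: a name plus the VFs at which its
// cost is invalid. The real patch asks the recipe for its cost instead.
struct Recipe {
  std::string Name;
  std::set<unsigned> InvalidAt;
  bool hasValidCost(unsigned VF) const { return InvalidAt.count(VF) == 0; }
};

using RecipeVFPair = std::pair<const Recipe *, unsigned>;

// Walk every (VF, recipe) combination and record the invalid ones.
std::vector<RecipeVFPair> collectInvalid(const std::vector<Recipe> &Recipes,
                                         const std::vector<unsigned> &VFs) {
  std::vector<RecipeVFPair> Invalid;
  for (unsigned VF : VFs)
    for (const Recipe &R : Recipes)
      if (!R.hasValidCost(VF))
        Invalid.emplace_back(&R, VF);
  return Invalid;
}

// Group the pairs per recipe (keeping first-seen recipe order), sort by VF
// within a recipe, and build one message per recipe.
std::vector<std::string> formatRemarks(std::vector<RecipeVFPair> Invalid) {
  std::map<const Recipe *, unsigned> Numbering;
  unsigned Next = 0;
  for (const auto &P : Invalid)
    if (!Numbering.count(P.first))
      Numbering[P.first] = Next++;

  std::sort(Invalid.begin(), Invalid.end(),
            [&](const RecipeVFPair &A, const RecipeVFPair &B) {
              unsigned NA = Numbering.at(A.first), NB = Numbering.at(B.first);
              return NA != NB ? NA < NB : A.second < B.second;
            });

  std::vector<std::string> Remarks;
  for (std::size_t I = 0; I < Invalid.size();) {
    std::size_t J = I;
    std::string Msg = "invalid costs at VF=(";
    for (; J < Invalid.size() && Invalid[J].first == Invalid[I].first; ++J) {
      if (J != I)
        Msg += ", ";
      Msg += std::to_string(Invalid[J].second);
    }
    assert(J > I && "Unexpected empty range");
    Msg += "): " + Invalid[I].first->Name;
    Remarks.push_back(Msg);
    I = J;
  }
  return Remarks;
}
```

Given three recipes named load, call, and store, where only call is also invalid at the second VF, `formatRemarks(collectInvalid(Recipes, {1, 2}))` produces one message per recipe in their original order, with the call message listing both VFs.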

llvmbot · 2024-07-17T13:18:32Z

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

This patch moves the logic that creates remarks for instructions with invalid costs so that it works on recipes, decoupling it from selectVectorizationFactor. This is needed to replace the remaining uses of selectVectorizationFactor with getBestPlan, which uses the VPlan-based cost model.

The current implementation iterates over all VPlans and their recipes again, to find recipes with invalid costs, which is more work but will only be done when remarks for LV are enabled. Once the remaining uses of selectVectorizationFactor are retired, we can collect VPlans with invalid costs as part of getBestPlan if we want to optimize the remarks case a bit, at the cost of adding additional complexity.

Full diff: https://github.com/llvm/llvm-project/pull/99322.diff

4 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+3)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+78-35)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll (+10-10)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index c63cf0c37f2f9..ac34fa4ba2777 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -411,6 +411,9 @@ class LoopVectorizationPlanner {
   VectorizationFactor
   selectEpilogueVectorizationFactor(const ElementCount MaxVF, unsigned IC);
 
+  /// Emit remarks for recipes with invalid costs in the available VPlans.
+  void emitInvalidCostRemarks(OptimizationRemarkEmitter *ORE);
+
 protected:
   /// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
   /// according to the information gathered by Legal when it checked if it is
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5fc365f77efbb..a55e06594a394 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -75,6 +75,7 @@
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
+#include "llvm/ADT/TypeSwitch.h"
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
@@ -936,10 +937,12 @@ static void debugVectorizationMessage(const StringRef Prefix,
 /// instruction that prevents vectorization.  Otherwise \p TheLoop is used for
 /// the location of the remark.  \return the remark object that can be
 /// streamed to.
-static OptimizationRemarkAnalysis createLVAnalysis(const char *PassName,
-    StringRef RemarkName, Loop *TheLoop, Instruction *I) {
+static OptimizationRemarkAnalysis
+createLVAnalysis(const char *PassName, StringRef RemarkName, Loop *TheLoop,
+                 Instruction *I, DebugLoc DL = {}) {
   Value *CodeRegion = TheLoop->getHeader();
-  DebugLoc DL = TheLoop->getStartLoc();
+  if (!DL)
+    DL = TheLoop->getStartLoc();
 
   if (I) {
     CodeRegion = I->getParent();
@@ -990,13 +993,14 @@ void reportVectorizationFailure(const StringRef DebugMsg,
 /// as an optimization remark. Uses either \p I as location of the remark, or
 /// otherwise \p TheLoop.
 static void reportVectorizationInfo(const StringRef Msg, const StringRef ORETag,
-                             OptimizationRemarkEmitter *ORE, Loop *TheLoop,
-                             Instruction *I = nullptr) {
+                                    OptimizationRemarkEmitter *ORE,
+                                    Loop *TheLoop, Instruction *I = nullptr,
+                                    DebugLoc DL = {}) {
   LLVM_DEBUG(debugVectorizationMessage("", Msg, I));
   LoopVectorizeHints Hints(TheLoop, true /* doesn't matter */, *ORE);
-  ORE->emit(
-      createLVAnalysis(Hints.vectorizeAnalysisPassName(), ORETag, TheLoop, I)
-      << Msg);
+  ORE->emit(createLVAnalysis(Hints.vectorizeAnalysisPassName(), ORETag, TheLoop,
+                             I, DL)
+            << Msg);
 }
 
 /// Report successful vectorization of the loop. In case an outer loop is
@@ -1586,9 +1590,7 @@ class LoopVectorizationCostModel {
   /// the factor width. If \p Invalid is not nullptr, this function
   /// will add a pair(Instruction*, ElementCount) to \p Invalid for
   /// each instruction that has an Invalid cost for the given VF.
-  InstructionCost
-  expectedCost(ElementCount VF,
-               SmallVectorImpl<InstructionVFPair> *Invalid = nullptr);
+  InstructionCost expectedCost(ElementCount VF);
 
   bool hasPredStores() const { return NumPredStores > 0; }
 
@@ -4694,9 +4696,28 @@ bool LoopVectorizationPlanner::isMoreProfitable(
   return CmpFn(RTCostA, RTCostB);
 }
 
-static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
-                                   OptimizationRemarkEmitter *ORE,
-                                   Loop *TheLoop) {
+void LoopVectorizationPlanner::emitInvalidCostRemarks(
+    OptimizationRemarkEmitter *ORE) {
+  if (VPlans.empty())
+    return;
+
+  using RecipeVFPair = std::pair<VPRecipeBase *, ElementCount>;
+  SmallVector<RecipeVFPair> InvalidCosts;
+  for (const auto &Plan : VPlans) {
+    for (ElementCount VF : Plan->vectorFactors()) {
+      LLVMContext &LLVMCtx = OrigLoop->getHeader()->getContext();
+      VPCostContext CostCtx(CM.TTI, Legal->getWidestInductionType(), LLVMCtx,
+                            CM);
+      auto Iter = vp_depth_first_deep(Plan->getVectorLoopRegion()->getEntry());
+      for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
+        for (auto &R : *VPBB) {
+          if (R.cost(VF, CostCtx).isValid())
+            continue;
+          InvalidCosts.emplace_back(&R, VF);
+        }
+      }
+    }
+  }
   if (InvalidCosts.empty())
     return;
 
@@ -4704,14 +4725,14 @@ static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
 
   // Group the remarks per instruction, keeping the instruction order from
   // InvalidCosts.
-  std::map<Instruction *, unsigned> Numbering;
+  std::map<VPRecipeBase *, unsigned> Numbering;
   unsigned I = 0;
   for (auto &Pair : InvalidCosts)
     if (!Numbering.count(Pair.first))
       Numbering[Pair.first] = I++;
 
   // Sort the list, first on instruction(number) then on VF.
-  sort(InvalidCosts, [&Numbering](InstructionVFPair &A, InstructionVFPair &B) {
+  sort(InvalidCosts, [&Numbering](RecipeVFPair &A, RecipeVFPair &B) {
     if (Numbering[A.first] != Numbering[B.first])
       return Numbering[A.first] < Numbering[B.first];
     const auto &LHS = A.second;
@@ -4725,33 +4746,57 @@ static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
   // Group the instructions together to emit separate remarks for:
   //   load  (vf1, vf2)
   //   store (vf1)
-  auto Tail = ArrayRef<InstructionVFPair>(InvalidCosts);
-  auto Subset = ArrayRef<InstructionVFPair>();
+  auto Tail = ArrayRef<RecipeVFPair>(InvalidCosts);
+  auto Subset = ArrayRef<RecipeVFPair>();
   do {
     if (Subset.empty())
       Subset = Tail.take_front(1);
 
-    Instruction *I = Subset.front().first;
+    VPRecipeBase *R = Subset.front().first;
+
+    unsigned Opcode =
+        TypeSwitch<const VPRecipeBase *, unsigned>(R)
+            .Case<VPHeaderPHIRecipe>(
+                [](const auto *R) { return Instruction::PHI; })
+            .Case<VPWidenSelectRecipe>(
+                [](const auto *R) { return Instruction::Select; })
+            .Case<VPWidenStoreRecipe>(
+                [](const auto *R) { return Instruction::Store; })
+            .Case<VPWidenLoadRecipe>(
+                [](const auto *R) { return Instruction::Load; })
+            .Case<VPWidenCallRecipe>(
+                [](const auto *R) { return Instruction::Call; })
+            .Case<VPInstruction, VPWidenRecipe, VPReplicateRecipe,
+                  VPWidenCastRecipe>(
+                [](const auto *R) { return R->getOpcode(); })
+            .Case<VPInterleaveRecipe>([](const VPInterleaveRecipe *R) {
+              return R->getStoredValues().empty() ? Instruction::Load
+                                                  : Instruction::Store;
+            });
 
     // If the next instruction is different, or if there are no other pairs,
     // emit a remark for the collated subset. e.g.
     //   [(load, vf1), (load, vf2))]
     // to emit:
     //  remark: invalid costs for 'load' at VF=(vf, vf2)
-    if (Subset == Tail || Tail[Subset.size()].first != I) {
+    if (Subset == Tail || Tail[Subset.size()].first != R) {
       std::string OutString;
       raw_string_ostream OS(OutString);
       assert(!Subset.empty() && "Unexpected empty range");
-      OS << "Instruction with invalid costs prevented vectorization at VF=(";
+      OS << "Recipe with invalid costs prevented vectorization at VF=(";
       for (const auto &Pair : Subset)
         OS << (Pair.second == Subset.front().second ? "" : ", ") << Pair.second;
       OS << "):";
-      if (auto *CI = dyn_cast<CallInst>(I))
-        OS << " call to " << CI->getCalledFunction()->getName();
+      if (Opcode == Instruction::Call)
+        OS << " call to "
+           << R->getOperand(R->getNumOperands() - 1)
+                  ->getLiveInIRValue()
+                  ->getName();
       else
-        OS << " " << I->getOpcodeName();
+        OS << " " << Instruction::getOpcodeName(Opcode);
       OS.flush();
-      reportVectorizationInfo(OutString, "InvalidCost", ORE, TheLoop, I);
+      reportVectorizationInfo(OutString, "InvalidCost", ORE, OrigLoop, nullptr,
+                              R->getDebugLoc());
       Tail = Tail.drop_front(Subset.size());
       Subset = {};
     } else
@@ -4880,14 +4925,13 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
     ChosenFactor.Cost = InstructionCost::getMax();
   }
 
-  SmallVector<InstructionVFPair> InvalidCosts;
   for (auto &P : VPlans) {
     for (ElementCount VF : P->vectorFactors()) {
       // The cost for scalar VF=1 is already calculated, so ignore it.
       if (VF.isScalar())
         continue;
 
-      InstructionCost C = CM.expectedCost(VF, &InvalidCosts);
+      InstructionCost C = CM.expectedCost(VF);
       VectorizationFactor Candidate(VF, C, ScalarCost.ScalarCost);
 
 #ifndef NDEBUG
@@ -4922,8 +4966,6 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
     }
   }
 
-  emitInvalidCostRemarks(InvalidCosts, ORE, OrigLoop);
-
   if (!EnableCondStoresVectorization && CM.hasPredStores()) {
     reportVectorizationFailure(
         "There are conditional stores.",
@@ -5828,8 +5870,7 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
   return Discount;
 }
 
-InstructionCost LoopVectorizationCostModel::expectedCost(
-    ElementCount VF, SmallVectorImpl<InstructionVFPair> *Invalid) {
+InstructionCost LoopVectorizationCostModel::expectedCost(ElementCount VF) {
   InstructionCost Cost;
 
   // For each block.
@@ -5849,10 +5890,6 @@ InstructionCost LoopVectorizationCostModel::expectedCost(
       if (C.isValid() && ForceTargetInstructionCost.getNumOccurrences() > 0)
         C = InstructionCost(ForceTargetInstructionCost);
 
-      // Keep a list of instructions with invalid costs.
-      if (Invalid && !C.isValid())
-        Invalid->emplace_back(&I, VF);
-
       BlockCost += C;
       LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << C << " for VF "
                         << VF << " For instruction: " << I << '\n');
@@ -7290,6 +7327,7 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
   // cost-model and will be retired once the VPlan-based cost-model is
   // stabilized.
   VectorizationFactor VF = selectVectorizationFactor();
+
   assert((VF.Width.isScalar() || VF.ScalarCost > 0) && "when vectorizing, the scalar cost must be non-zero.");
   if (!hasPlanWithVF(VF.Width)) {
     LLVM_DEBUG(dbgs() << "LV: No VPlan could be built for " << VF.Width
@@ -7501,6 +7539,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {
       }
     }
   }
+
   BestPlan->setVF(BestFactor.Width);
   return *BestPlan;
 }
@@ -10188,6 +10227,9 @@ bool LoopVectorizePass::processLoop(Loop *L) {
   // Plan how to best vectorize, return the best VF and its cost.
   std::optional<VectorizationFactor> MaybeVF = LVP.plan(UserVF, UserIC);
 
+  if (ORE->allowExtraAnalysis(LV_NAME))
+    LVP.emitInvalidCostRemarks(ORE);
+
   VectorizationFactor VF = VectorizationFactor::Disabled();
   unsigned IC = 1;
 
@@ -10195,6 +10237,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
       hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
   GeneratedRTChecks Checks(*PSE.getSE(), DT, LI, TTI,
                            F->getDataLayout(), AddBranchWeights);
+
   if (MaybeVF) {
     VF = *MaybeVF;
     // Select the interleave count.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
index b66bb948a47a5..0ef03c58be97a 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
@@ -2,8 +2,8 @@
 ; RUN: FileCheck %s --check-prefix=CHECK-REMARKS < %t
 
 ; CHECK-REMARKS: UserVF ignored because of invalid costs.
-; CHECK-REMARKS: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): alloca
-; CHECK-REMARKS: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
+; CHECK-REMARKS: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): alloca
+; CHECK-REMARKS: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store
 define void @alloca(ptr %vla, i64 %N) {
 ; CHECK-LABEL: @alloca(
 ; CHECK-NOT: <vscale x
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll
index 333bb20fd0d9a..bc6eeb470b154 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll
@@ -101,9 +101,9 @@ for.end:
 }
 
 ; CHECK-REMARKS: UserVF ignored because of invalid costs.
-; CHECK-REMARKS-NEXT: t.c:3:10: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
-; CHECK-REMARKS-NEXT: t.c:3:20: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
-; CHECK-REMARKS-NEXT: t.c:3:30: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
+; CHECK-REMARKS-NEXT: t.c:3:10: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load
+; CHECK-REMARKS-NEXT: t.c:3:20: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
+; CHECK-REMARKS-NEXT: t.c:3:30: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store
 define void @vec_sin_no_mapping(ptr noalias nocapture %dst, ptr noalias nocapture readonly %src, i64 %n) {
 ; CHECK: @vec_sin_no_mapping
 ; CHECK: call fast <2 x float> @llvm.sin.v2f32
@@ -127,10 +127,10 @@ for.cond.cleanup:                                 ; preds = %for.body
 }
 
 ; CHECK-REMARKS: UserVF ignored because of invalid costs.
-; CHECK-REMARKS-NEXT: t.c:3:10: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
-; CHECK-REMARKS-NEXT: t.c:3:30: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
-; CHECK-REMARKS-NEXT: t.c:3:20: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
-; CHECK-REMARKS-NEXT: t.c:3:40: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
+; CHECK-REMARKS-NEXT: t.c:3:10: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load
+; CHECK-REMARKS-NEXT: t.c:3:30: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
+; CHECK-REMARKS-NEXT: t.c:3:20: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
+; CHECK-REMARKS-NEXT: t.c:3:40: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store
 define void @vec_sin_no_mapping_ite(ptr noalias nocapture %dst, ptr noalias nocapture readonly %src, i64 %n) {
 ; CHECK: @vec_sin_no_mapping_ite
 ; CHECK-NOT: <vscale x
@@ -163,9 +163,9 @@ for.cond.cleanup:                                 ; preds = %for.body
 }
 
 ; CHECK-REMARKS: UserVF ignored because of invalid costs.
-; CHECK-REMARKS-NEXT: t.c:3:10: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
-; CHECK-REMARKS-NEXT: t.c:3:20: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
-; CHECK-REMARKS-NEXT: t.c:3:30: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
+; CHECK-REMARKS-NEXT: t.c:3:10: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load
+; CHECK-REMARKS-NEXT: t.c:3:20: Recipe with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): call to llvm.sin.f32
+; CHECK-REMARKS-NEXT: t.c:3:30: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store
 define void @vec_sin_fixed_mapping(ptr noalias nocapture %dst, ptr noalias nocapture readonly %src, i64 %n) {
 ; CHECK: @vec_sin_fixed_mapping
 ; CHECK: call fast <2 x float> @llvm.sin.v2f32

alexey-bataev · 2024-07-17T13:29:04Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

  if (InvalidCosts.empty())
    return;

  // Emit a report of VFs with invalid costs in the loop.

  // Group the remarks per instruction, keeping the instruction order from
  // InvalidCosts.
-  std::map<Instruction *, unsigned> Numbering;
+  std::map<VPRecipeBase *, unsigned> Numbering;


Updated, thanks!

alexey-bataev · 2024-07-17T13:30:52Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -7290,6 +7327,7 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
  // cost-model and will be retired once the VPlan-based cost-model is
  // stabilized.
  VectorizationFactor VF = selectVectorizationFactor();
+


Remove new blank line

Removed, thanks!

alexey-bataev · 2024-07-17T13:30:58Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -7501,6 +7539,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {
      }
    }
  }
+


Remove new blank line

Removed, thanks!

alexey-bataev · 2024-07-17T13:31:04Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

  VectorizationFactor VF = VectorizationFactor::Disabled();
  unsigned IC = 1;

  bool AddBranchWeights =
      hasBranchWeightMD(*L->getLoopLatch()->getTerminator());
  GeneratedRTChecks Checks(*PSE.getSE(), DT, LI, TTI,
                           F->getDataLayout(), AddBranchWeights);
+


Remove new blank line

fhahn

ping

fhahn · 2024-07-24T15:31:25Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

  if (InvalidCosts.empty())
    return;

  // Emit a report of VFs with invalid costs in the loop.

  // Group the remarks per instruction, keeping the instruction order from
  // InvalidCosts.
-  std::map<Instruction *, unsigned> Numbering;
+  std::map<VPRecipeBase *, unsigned> Numbering;


Updated, thanks!

fhahn · 2024-07-24T15:31:48Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -7290,6 +7327,7 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
  // cost-model and will be retired once the VPlan-based cost-model is
  // stabilized.
  VectorizationFactor VF = selectVectorizationFactor();
+


Removed, thanks!

fhahn · 2024-07-24T15:32:08Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -7501,6 +7539,7 @@ VPlan &LoopVectorizationPlanner::getBestPlan() const {
      }
    }
  }
+


Removed, thanks!

ayalz

This LGTM, adding various comments.

ayalz · 2024-07-24T19:51:17Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -891,10 +892,12 @@ static void debugVectorizationMessage(const StringRef Prefix,
 /// instruction that prevents vectorization.  Otherwise \p TheLoop is used for
 /// the location of the remark.  \return the remark object that can be
 /// streamed to.


Add documentation for \p DL?

Added, thanks!

ayalz · 2024-07-24T19:52:56Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -945,13 +948,14 @@ void reportVectorizationFailure(const StringRef DebugMsg,
 /// as an optimization remark. Uses either \p I as location of the remark, or
 /// otherwise \p TheLoop.


What about \p DL?

ayalz · 2024-07-24T19:54:56Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

  /// the factor width. If \p Invalid is not nullptr, this function
  /// will add a pair(Instruction*, ElementCount) to \p Invalid for
  /// each instruction that has an Invalid cost for the given VF.


Drop last sentence about \p Invalid?

Removed, thanks!

ayalz · 2024-07-24T19:57:02Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  if (VPlans.empty())
+    return;
+


Suggested change

-  if (VPlans.empty())
-    return;

Can be dropped.

Dropped, thanks!

ayalz · 2024-07-24T20:04:08Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      LLVMContext &LLVMCtx = OrigLoop->getHeader()->getContext();
+      VPCostContext CostCtx(CM.TTI, Legal->getWidestInductionType(), LLVMCtx,
+                            CM);


Can be defined at the outset?

Unfortunately not at the moment, as we mark underlying instructions for VPReplicateRecipes to only compute the cost once, which means we would skip them in some cases.
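
To illustrate the concern in this reply, here is a toy model of a cost context that remembers which underlying instructions it has already costed. It is a sketch under assumptions, not the behavior of the real VPCostContext: `CostContext`, `costOf`, and `countInvalid` are hypothetical names, and every instruction is simply treated as invalid at every VF unless the context has seen it before.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical cost context that tracks already-costed instructions so that
// each one is only charged once.
struct CostContext {
  std::set<std::string> AlreadyCosted;
};

// Returns std::nullopt for an invalid cost. In this toy model an instruction
// is invalid at every VF, unless it was already costed through the same
// context, in which case it is skipped and reported as free.
std::optional<unsigned> costOf(const std::string &Inst, unsigned VF,
                               CostContext &Ctx) {
  (void)VF; // The VF does not matter in this simplified model.
  if (!Ctx.AlreadyCosted.insert(Inst).second)
    return 0u; // Skipped: seen before via this context.
  return std::nullopt;
}

// Count invalid (instruction, VF) pairs, either sharing one context across
// all VFs or creating a fresh one per VF.
unsigned countInvalid(const std::vector<std::string> &Insts,
                      const std::vector<unsigned> &VFs, bool FreshCtxPerVF) {
  unsigned NumInvalid = 0;
  CostContext Shared;
  for (unsigned VF : VFs) {
    CostContext Fresh;
    CostContext &Ctx = FreshCtxPerVF ? Fresh : Shared;
    for (const std::string &I : Insts)
      if (!costOf(I, VF, Ctx))
        ++NumInvalid;
  }
  return NumInvalid;
}
```

In this model, a single instruction queried at two VFs is counted as invalid twice with a fresh context per VF, but only once with a shared context, because the second query is skipped.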

ayalz · 2024-07-25T12:26:41Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-    StringRef RemarkName, Loop *TheLoop, Instruction *I) {
+static OptimizationRemarkAnalysis
+createLVAnalysis(const char *PassName, StringRef RemarkName, Loop *TheLoop,
+                 Instruction *I, DebugLoc DL = {}) {
  Value *CodeRegion = TheLoop->getHeader();


Perhaps slightly clarify the precedence. If DL is provided, should it have highest precedence?

Suggested change

-  Value *CodeRegion = TheLoop->getHeader();
+  Value *CodeRegion = I ? I->getParent() : TheLoop->getHeader();
+  // If debug location is attached to the instruction, use it. Otherwise if DL was not provided, use the loop's.
+  if (I && I->getDebugLoc())
+    DL = I->getDebugLoc();
+  else if (!DL)
+    DL = TheLoop->getStartLoc();

Adjusted, thanks!

ayalz · 2024-07-25T14:06:41Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-    Instruction *I = Subset.front().first;
+    VPRecipeBase *R = Subset.front().first;
+
+    unsigned Opcode =


This retains current dumps, a worthy (temporary) objective, but deserves further attention - recipes (including those with invalid cost) should arguably print themselves, as in R.print(), perhaps supporting a shorter printing of their "opcode" only?

ayalz · 2024-07-25T14:08:10Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+    VPRecipeBase *R = Subset.front().first;
+
+    unsigned Opcode =
+        TypeSwitch<const VPRecipeBase *, unsigned>(R)


Are all cases handled, default deemed unreachable?
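
One way to make an unhandled recipe kind visible, sketched here with plain `dynamic_cast` rather than `llvm::TypeSwitch`, is to return an empty optional from the fallthrough path so the caller can assert or fall back. All type and function names below (`RecipeBase`, `LoadRecipe`, `StoreRecipe`, `BranchRecipe`, `opcodeName`) are hypothetical and only stand in for the recipe hierarchy.

```cpp
#include <optional>
#include <string>

struct RecipeBase {
  virtual ~RecipeBase() = default;
};
struct LoadRecipe : RecipeBase {};
struct StoreRecipe : RecipeBase {};
struct BranchRecipe : RecipeBase {}; // Deliberately not handled below.

// Map a recipe kind to a printable opcode name. std::nullopt marks a kind
// the dispatch does not know about, instead of returning an arbitrary value.
std::optional<std::string> opcodeName(const RecipeBase *R) {
  if (dynamic_cast<const LoadRecipe *>(R))
    return std::string("load");
  if (dynamic_cast<const StoreRecipe *>(R))
    return std::string("store");
  return std::nullopt;
}
```

A caller can then decide explicitly what an empty result means, for example asserting that it never happens or printing a generic name.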

ayalz · 2024-07-25T14:09:18Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

    // If the next instruction is different, or if there are no other pairs,
    // emit a remark for the collated subset. e.g.
    //   [(load, vf1), (load, vf2))]
    // to emit:
    //  remark: invalid costs for 'load' at VF=(vf, vf2)


Suggested change

-    // If the next instruction is different, or if there are no other pairs,
-    // emit a remark for the collated subset. e.g.
-    //   [(load, vf1), (load, vf2))]
-    // to emit:
-    //  remark: invalid costs for 'load' at VF=(vf, vf2)
+    // If the next recipe is different, or if there are no other pairs,
+    // emit a remark for the collated subset. e.g.
+    //   [(load, VF1), (load, VF2))]
+    // to emit:
+    //  remark: invalid costs for 'load' at VF=(VF1, VF2)

Done, thanks!

ayalz · 2024-07-25T14:20:39Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      if (Opcode == Instruction::Call)
+        OS << " call to "
+           << R->getOperand(R->getNumOperands() - 1)
+                  ->getLiveInIRValue()
+                  ->getName();


Better have VPWidenCallRecipe and VPReplicateRecipe-of-a-Call take care of printing themselves via getCalledScalarFunction and underlying.getCalledFunction(), respectively?

Updated, thanks!

ayalz

Some post-commit notes.

ayalz · 2024-07-28T11:16:14Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  Value *CodeRegion = I ? I->getParent() : TheLoop->getHeader();
+  // If debug location is attached to the instruction, use it. Otherwise if DL
+  // was not provided, use the loop's.
+  if (I && I->getDebugLoc())
+    DL = I->getDebugLoc();
+  else if (!DL)
+    DL = TheLoop->getStartLoc();


(post commit): slight discrepancy with documentation above, should this instead read

  // If debug location is provided, use it. Otherwise if debug location is attached to the instruction, use it.
  // Otherwise use the start location of the loop.
  if (!DL)
    DL = (I && I->getDebugLoc()) ? I->getDebugLoc() : TheLoop->getStartLoc();
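
The two precedence orders under discussion can be compared side by side in a small standalone model. This is only a sketch: `Loc`, `pickLocInstFirst`, and `pickLocDLFirst` are hypothetical names, and an empty `std::optional` stands in for an unset `DebugLoc`.

```cpp
#include <optional>
#include <string>

// An empty optional models "no debug location available".
using Loc = std::optional<std::string>;

// Precedence as merged: the instruction's location, then the passed DL, then
// the loop's start location.
std::string pickLocInstFirst(Loc InstLoc, Loc DL, const std::string &LoopLoc) {
  if (InstLoc)
    return *InstLoc;
  if (DL)
    return *DL;
  return LoopLoc;
}

// Precedence as suggested post-commit: the passed DL, then the instruction's
// location, then the loop's start location.
std::string pickLocDLFirst(Loc InstLoc, Loc DL, const std::string &LoopLoc) {
  if (DL)
    return *DL;
  if (InstLoc)
    return *InstLoc;
  return LoopLoc;
}
```

The two functions only disagree when both an instruction location and a DL are present. For the call added in this patch, which passes `nullptr` for the instruction and the recipe's debug location as DL, both orderings select DL.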

ayalz · 2024-07-28T11:30:48Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      VPCostContext CostCtx(CM.TTI, Legal->getWidestInductionType(), LLVMCtx,
+                            CM);


CostCtx is kept here rather than hoisting it alongside LLVMCtx above, due to its caching of SkipCostComputation. But the latter is initialized by calling LVP::cost(Plan, VF), whereas here all recipes are asked for their cost directly, w/o going through LVP::cost(). Should LVP::cost() be called first, and iff it returns invalid traverse the recipes? Or note that invalid costs cannot be skipped(?), so calling LVP::cost() is redundant when only invalid costs are sought, in which case CostCtx can be hoisted(?)

ayalz · 2024-07-28T11:48:36Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -943,15 +942,17 @@ void reportVectorizationFailure(const StringRef DebugMsg,

 /// Reports an informative message: print \p Msg for debugging purposes as well
 /// as an optimization remark. Uses either \p I as location of the remark, or
-/// otherwise \p TheLoop.
+/// otherwise \p TheLoop. If \p DL is passed, use it as debug location for the
+/// remark. If \p DL is passed, use it as debug location for the remark.


Last sentence repeated twice.

ayalz · 2024-07-28T11:54:27Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+                      : cast<Function>(R->getOperand(R->getNumOperands() - 1)
+                                           ->getLiveInIRValue());


Worth at least a comment or assert, noting that if not WidenCall then R is replicating a CallInst, both having the called function as their last operand. Better have ReplicateRecipe (or a derivative thereof) provide the called function explicitly (or print its name), to avoid bypassing CallInst's getCalledFunction(), also noting that recipes may use their last operand for an optional mask.

fhahn requested review from rengolin, ayalz and aniragil July 17, 2024 13:17

llvmbot added vectorizers llvm:transforms labels Jul 17, 2024

alexey-bataev reviewed Jul 17, 2024

fhahn added 2 commits July 24, 2024 16:25

Merge remote-tracking branch 'origin/main' into vplan-invalid-cost-re…

65cb0cd

…marks

!fixup address review comments, thanks!

31ed354

fhahn commented Jul 24, 2024

ayalz approved these changes Jul 25, 2024

fhahn added 2 commits July 26, 2024 12:35

Merge remote-tracking branch 'origin/main' into vplan-invalid-cost-re…

d5bcc97

…marks

!fixup address comments, thanks!

6f1b14d

fhahn merged commit 66ce4f7 into llvm:main Jul 27, 2024
7 checks passed

fhahn deleted the vplan-invalid-cost-remarks branch July 27, 2024 11:52

ayalz reviewed Jul 28, 2024
