Post RA software pipeliner #146

martien-de-jong · 2024-08-07T13:56:29Z

First workable version of the post-RegAlloc pipeliner.
It creates a DAG that simulates an unrolled loop, and it checks whether all latencies can be met by scheduling every copy in the same cycle modulo II.
Since the DAG considers the physical register dependences, no register renaming is needed, and it uses negative latencies in the same way as the regular post-scheduler.
The results look promising for underfull loops, typically load-operate-store, where the load latency should span quite a few loop iterations to get the resources saturated.
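The latency check can be illustrated with a simplified model. It assumes each instruction already has a fixed issue cycle in a single-iteration schedule, and that every dependence is an edge with a latency (possibly negative) and an iteration distance. This is a sketch, not the PR's implementation, which builds an unrolled DAG of copies instead; `DepEdge`, `latenciesMet` and `findII` are hypothetical names, and resource conflicts are ignored here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dependence edge between two instructions of the loop body.
struct DepEdge {
  int Src;      // producing instruction (index into Cycle)
  int Dst;      // consuming instruction
  int Latency;  // may be negative, as in the regular post-RA scheduler
  int Distance; // iterations spanned by the edge (0 = same iteration)
};

// Cycle[I] is the issue cycle of instruction I in the schedule of one
// iteration. Copy K of instruction I issues at Cycle[I] + K * II, so an edge
// is satisfied when Cycle[Dst] + Distance * II - Cycle[Src] >= Latency.
bool latenciesMet(const std::vector<int> &Cycle,
                  const std::vector<DepEdge> &Edges, int II) {
  assert(II > 0 && "II must be positive");
  for (const DepEdge &E : Edges)
    if (Cycle[E.Dst] + E.Distance * II - Cycle[E.Src] < E.Latency)
      return false;
  return true;
}

// Smallest II in [MinII, MaxII] for which all latencies are met, or -1.
int findII(const std::vector<int> &Cycle, const std::vector<DepEdge> &Edges,
           int MinII, int MaxII) {
  for (int II = MinII; II <= MaxII; ++II)
    if (latenciesMet(Cycle, Edges, II))
      return II;
  return -1;
}
```

For a load at cycle 0 feeding an add at cycle 7 (latency 7) and a store at cycle 8, the same-iteration edges are fixed by the linear schedule; only loop-carried edges (Distance > 0) get easier to satisfy as II grows.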

There are still some rough edges, and more tests need to be added, especially to cover the cases where SWP fails.

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

llvm/lib/Target/AIE/AIE2InstrInfo.cpp

andcarminati · 2024-08-13T09:45:54Z

llvm/lib/Target/AIE/AIE2TargetTransformInfo.h

    UP.Threshold = 200;
    BaseT::getUnrollingPreferences(L, SE, UP, ORE);
+    UP.Partial = UP.Runtime = false;


This is nice! Now our decisions are no longer changed by the BaseT implementation! Maybe these small changes can be merged as separate small PRs.

andcarminati · 2024-08-13T09:56:58Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

-  assert(MBB.pred_size() == 1 && "MBB contains more than 1 predecessor");
-  MachineBasicBlock *SinglePredMBB = *MBB.predecessors().begin();
-  return SinglePredMBB;
+MachineBasicBlock *getLoopPredecessor(const MachineBasicBlock &MBB) {


nit: good candidate for LoopUtils.

I'm hesitant, since it only applies to epilogues that have been selected by the classification of loop-aware candidates here.

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

andcarminati · 2024-08-13T15:28:36Z

llvm/lib/Target/AIE/AIEBaseHardwareLoops.cpp

+
+  // We use ADD_NC, which allows PostPipeliner to tweak it by modifying the
+  // immediate value.
+  BuildMI(*MBB, Start, Start->getDebugLoc(), TII->get(AIE2::ADD_NC), AIE2::LC)


We merged e33bbe1 as a bug fix for the trip count adjustment. As this change also fixes the same problem, we can replace the first fix later (for example, by reverting it).
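A minimal sketch of the arithmetic behind tweaking the immediate, assuming the usual pipelining shape in which the prologue starts NumStages - 1 iterations and the epilogue finishes them, so the kernel runs TripCount - (NumStages - 1) times. The helper names are hypothetical, and requiring the result to stay at least 1 is an assumption based on the PR's check that LC stays positive.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper. With NumStages stages, the prologue starts
// NumStages - 1 iterations that the epilogue completes, so the kernel body
// runs TripCount - (NumStages - 1) times. This is the delta that would be
// folded into the immediate of the LC-initialising ADD_NC.
int loopCountAdjustment(int NumStages) {
  assert(NumStages >= 1 && "a schedule has at least one stage");
  return -(NumStages - 1);
}

// Kernel iteration count for the guaranteed minimum trip count, or
// std::nullopt when LC would not stay positive (at least 1) after the
// adjustment, in which case the loop should not be pipelined.
std::optional<int> kernelIterations(int MinTripCount, int NumStages) {
  int Kernel = MinTripCount + loopCountAdjustment(NumStages);
  if (Kernel < 1)
    return std::nullopt;
  return Kernel;
}
```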

llvm/lib/Target/AIE/AIEPostPipeliner.cpp

andcarminati · 2024-08-16T09:02:25Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+                      << CurrentBlockState->FixPoint.NumIters
+                      << " II=" << CurrentBlockState->FixPoint.II);
+}
+namespace {


As this file is growing, maybe we can move this class to the AIEPostPipeliner* files.

Perhaps not the best criterion to do this. It implements an interface to the pipeliner that is dedicated to the interblock data structures. All that knowledge would need to be exported to the postpipeliner, which would clutter it.

andcarminati · 2024-08-16T09:05:39Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+  bool InLoop = false;
+
+  void startPrologue() override {
+    // Nothing at this time, but let's keep the override around


Do we have some future plan for this method?

Only vaguely. Currently we insert an opaque region which is not seen by the normal scheduler. In future we may need to do some boundary signaling.

andcarminati · 2024-08-26T08:17:20Z

Hi @martien-de-jong, some commits from this PR were merged:

#152
#162

llvm/include/llvm/CodeGen/ResourceScoreboard.h

gbossu · 2024-09-05T14:04:43Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

@@ -4,14 +4,19 @@
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
-// (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates
+// (c) Copyright 2023-2024 Advanced Micro Devices, Inc. or its affiliates


I don't think this is as old as 2023

Did you fix this in a future commit? The diff is still introduced in 46a9ac2

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

llvm/lib/Target/AIE/AIEPostPipeliner.cpp

gbossu · 2024-09-27T13:54:42Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

-      BS.FixPoint.LatencyMargin++;
+  BS.FixPoint.NumIters++;
+  int &II = BS.FixPoint.II;
+  if (!II) {


That's now a big function with lots of ifs. How about introducing two functions for checking the fixpoint state? E.g.

    BS.FixPoint.NumIters++;
    if (State == Scheduling)
      return checkLoopAwareScheduling();
    return checkPipelining();

(That would also help limit the noise in diff lines)

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

gbossu · 2024-09-27T14:07:53Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+      if (Move) {
+        BB->remove_instr(MI);
+      }
+      BB->insert(Before, MI);


Is there an equivalent insert_instr? Or is it on purpose so that we insert inside an existing bundle?

I can never get my head around these issues. My assumption is that at this point we have no bundles, and we insert instructions before other instructions, leaving BundledWithPred markers to really create the bundles afterwards.

gbossu · 2024-09-27T14:14:14Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

  BoundaryEdges = std::make_unique<InterBlockEdges>(Context);
+  if (Regions.size() == 1) {
+    // Don't worry, this just constructs a mostly empty container class


I like this note for the anxious programmer 😄

llvm/lib/Target/AIE/AIEPostPipeliner.cpp

gbossu · 2024-09-27T15:00:42Z

llvm/test/CodeGen/AIE/aie2/hardware-loops/split-zol-jmp.mir

+    PseudoJ_jump_imm  %bb.2
+
+  bb.2 (align 16):
+    PseudoRET implicit $lr


Nit: if the exit is already dedicated, I guess we do not need a new fall-through? But I guess that is not important because we later optimize the CFG anyway?

Well, the fallthrough also makes the body single-regioned; programming around that is sordid. Also note that hwloop lowering sits just before postmisched.

gbossu · 2024-09-27T15:04:42Z

llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/add-store.mir

+  ; CHECK-NEXT:  .LBB0_2: // %for.body
+  ; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
+  ; CHECK-NEXT:  .L_LEnd0:
+  ; CHECK-NEXT:    nopb ; nopa ; st r1, [p0], #4; add r1, r1, #1; nopm ; nopv


gbossu · 2024-09-27T15:04:47Z

llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/add-store.mir

+  ; CHECK-NEXT:  .LBB0_2: // %for.body
+  ; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
+  ; CHECK-NEXT:  .L_LEnd0:
+  ; CHECK-NEXT:    nopb ; nopa ; st r1, [p0], #4; add r1, r1, #1; nopm ; nopv


gbossu · 2024-09-27T15:07:21Z

llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/bitwisenot.mir

+  ; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
+  ; CHECK-NEXT:    vldb wh0, [p0, #32]; nopa ; vst wh1, [p1, #32]; nopx ; vbneg_ltz.s16 x1, r21, x0; nopv
+  ; CHECK-NEXT:  .L_LEnd0:
+  ; CHECK-NEXT:    vldb wl0, [p0], #64; nopa ; vst wl1, [p1], #64; nopxm ; nopv


Very nice 🎉 🎉

gbossu · 2024-09-27T15:51:02Z

llvm/lib/Target/AIE/AIEBasePipelinerLoopInfo.cpp

@@ -664,6 +669,8 @@ class ZeroOverheadLoop : public AIEBasePipelinerLoopInfo {
      SmallVectorImpl<MachineOperand> &Cond) override;

  bool canAcceptII(SMSchedule &SMS) override;
+
+  bool shouldUseSchedule(SwingSchedulerDAG &SSD, SMSchedule &SMS) override;


We should probably update canAcceptII as well. Typically, it will increase the II until the stage count is low enough. So I think that if we find a schedule with a low II and a high stage count, we should immediately reject it so that the post-pipeliner can pick the loop up, rather than increasing the II.

That's an interface change, I think? It should be able to say "yes, higherII" and stop. That would definitely save some time.

I guess we would accept the II if we are confident we can pick up the loop in the post-pipeliner. And for those loops, shouldUseSchedule would return false.

Somehow that makes sense, except for the name of canAcceptII. I will add a fat comment.

I mean, something can accept it, just not the pre-pipeliner 😄

gbossu

I think I'm at a point where I understand all individual pieces, and now I need to go over the code a final time to make sure I understand how it's all connected. But I couldn't find any blocker so far, great work!

This is quite an elaborate change, since it is interwoven with the fixed-point loop in interblock scheduling. The general approach is to first run loop-aware scheduling, then try to pipeline selected loops. Once loop-aware scheduling has converged, each fixed-point iteration for the loop block increases II until a modulo schedule is found or pipelining fails. When we find a modulo schedule with more than one stage, we push out bundled regions into the prologue block and the epilogue block, and update the (only) region of the loop. The prologue and epilogue regions are copied without further scheduling when committing the block schedules. The pipeliner checks the minimum iteration count to guarantee that LC can be corrected while staying positive.
There is an unrelated name change, from CurrentBlock to CurrentBlockState, to avoid confusion.
We force a fallthrough block for the loop end. This avoids missing opportunities because of 'bad' block ordering.
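To illustrate which stages end up in the prologue, kernel and epilogue, here is a generic sketch for a modulo schedule with NumStages stages, where each block stands for II cycles of bundles. It does not model the interblock data structures; `prologueStages` and `epilogueStages` are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

// Stages active in each II-cycle block of a pipelined loop. The kernel runs
// all stages 0..NumStages-1; prologue block P (0 <= P < NumStages - 1) runs
// stages 0..P while the pipeline fills.
std::vector<std::vector<int>> prologueStages(int NumStages) {
  assert(NumStages >= 1);
  std::vector<std::vector<int>> Blocks;
  for (int P = 0; P < NumStages - 1; ++P) {
    std::vector<int> Stages;
    for (int S = 0; S <= P; ++S)
      Stages.push_back(S);
    Blocks.push_back(Stages);
  }
  return Blocks;
}

// Epilogue block E (0 <= E < NumStages - 1) drains stages E+1..NumStages-1
// of the iterations that are still in flight.
std::vector<std::vector<int>> epilogueStages(int NumStages) {
  assert(NumStages >= 1);
  std::vector<std::vector<int>> Blocks;
  for (int E = 0; E < NumStages - 1; ++E) {
    std::vector<int> Stages;
    for (int S = E + 1; S < NumStages; ++S)
      Stages.push_back(S);
    Blocks.push_back(Stages);
  }
  return Blocks;
}
```

With a single stage, both lists are empty, which matches only pushing out prologue and epilogue regions when the schedule has more than one stage.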

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

gbossu · 2024-10-01T15:45:01Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

-    BS.FixPoint.NumIters++;
+  auto &BS = *CurrentBlockState;
+  switch (updateFixPoint(BS)) {
+  case SchedulingStage::SchedulingNotConverged:


I guess this is just for completeness, but we will have crashed before checking SchedulingStage::SchedulingNotConverged here?

gbossu · 2024-10-01T16:17:19Z

llvm/lib/Target/AIE/AIEMachineScheduler.cpp

+    // Try to wrap the linear schedule within II.
+    // We virtually unroll the body by the stagecount, computed from rounding
+    // up the length divided by II.
+    NCopies = (BS.getScheduleLength() + II - 1) / II;


The "standard" schedule has been achieved using different scheduling techniques. Can we guarantee there will be enough copies if the linear schedule used for pipelining is different?
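The NCopies computation is a ceiling division. A sketch, with hypothetical helper names and assuming non-negative cycles, of that computation and of how a cycle of the linear schedule folds into a (stage, modulo slot) pair:

```cpp
#include <cassert>
#include <utility>

// Number of overlapping copies (stages) needed to cover a linear schedule
// of Length cycles with initiation interval II, i.e. ceil(Length / II).
int stageCount(int Length, int II) {
  assert(Length >= 0 && II > 0);
  return (Length + II - 1) / II;
}

// Folds a cycle of the linear schedule into (stage, slot within II).
std::pair<int, int> foldCycle(int Cycle, int II) {
  assert(Cycle >= 0 && II > 0);
  return {Cycle / II, Cycle % II};
}
```

For example, a 10-cycle linear schedule at II = 4 needs 3 copies, and cycle 9 lands in stage 2, slot 1. At II = 1 every cycle of the linear schedule becomes its own stage, which is why long load latencies in underfull loops produce many stages.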

gbossu · 2024-10-01T16:24:21Z

llvm/lib/Target/AIE/AIEPostPipeliner.cpp

+    int Cycle = -Depth + LocalCycle;
+    LLVM_DEBUG(dbgs() << "  Emit in " << Cycle << "\n");
+    HR.emitInScoreboard(Scoreboard, MI->getDesc(), MemoryBanks, MI->operands(),
+                        MI->getMF()->getRegInfo(), Cycle);


I'm not questioning correctness, but I wonder whether we should immediately emit the resources at C+II, C+2II, etc. Otherwise, we will only see conflicts due to resources that "wrap around the II" when scheduling the next iterations. At that point, it's too late and we will try a larger II. If we could anticipate those resources, we might find a better linear schedule that has fewer conflicts when scheduling all copies.
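The suggestion amounts to a modulo reservation table. A minimal sketch, assuming single-cycle, per-resource usage (the real scoreboard also tracks multi-cycle itineraries and memory banks); `ModuloReservationTable` is a hypothetical class, not the ResourceScoreboard API:

```cpp
#include <cassert>
#include <vector>

// Reserving resource R at cycle C marks slot C mod II, which is equivalent
// to reserving it at C, C + II, C + 2*II, ... for all copies at once.
class ModuloReservationTable {
public:
  ModuloReservationTable(int InitiationInterval, int NumResources)
      : II(InitiationInterval),
        Busy(InitiationInterval, std::vector<bool>(NumResources, false)) {
    assert(InitiationInterval > 0 && NumResources > 0);
  }

  bool isFree(int Cycle, int Resource) const {
    return !Busy[slot(Cycle)][Resource];
  }

  // Reserves the resource, or returns false (reserving nothing) on conflict.
  bool reserve(int Cycle, int Resource) {
    if (!isFree(Cycle, Resource))
      return false;
    Busy[slot(Cycle)][Resource] = true;
    return true;
  }

private:
  int II;
  std::vector<std::vector<bool>> Busy; // [slot][resource]

  // Normalises so that negative cycles (e.g. -Depth + LocalCycle) also map
  // into [0, II).
  int slot(int Cycle) const { return ((Cycle % II) + II) % II; }
};
```

Because a reservation covers every later copy, a wrap-around conflict is detected when the first copy is placed, rather than only when a later copy is scheduled.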

martien-de-jong requested review from abhinay-anubola, abnikant, andcarminati, gbossu, khallouh, konstantinschwarz, SagarMaheshwari99 and stephenneuendorffer as code owners August 7, 2024 13:56

martien-de-jong marked this pull request as draft August 7, 2024 13:56

gbossu reviewed Aug 8, 2024

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp (outdated)

gbossu reviewed Aug 8, 2024

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp (outdated)

martien-de-jong force-pushed the martien.postra-swp branch from 70dd0c5 to 3484bc6 Compare August 9, 2024 14:16

andcarminati mentioned this pull request Aug 12, 2024

Fix for some problems related to loops #148

Merged

andcarminati reviewed Aug 13, 2024

llvm/lib/Target/AIE/AIE2InstrInfo.cpp

andcarminati reviewed Aug 13, 2024

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp (outdated)

andcarminati reviewed Aug 13, 2024

llvm/lib/Target/AIE/AIEPostPipeliner.cpp (outdated)

gbossu mentioned this pull request Aug 14, 2024

Loop-aware scheduling for loops with no dedicated exit #152

Merged

andcarminati reviewed Aug 16, 2024

gbossu mentioned this pull request Aug 19, 2024

[AIE2] Optimize 2D/3D memory operations #145

Merged

andcarminati mentioned this pull request Aug 21, 2024

Integrating PostRA-SWP preparatory work #162

Merged

martien-de-jong force-pushed the martien.postra-swp branch from 3484bc6 to 7dacd4f Compare August 27, 2024 15:34

martien-de-jong marked this pull request as ready for review August 27, 2024 15:37

martien-de-jong force-pushed the martien.postra-swp branch from 7dacd4f to d85091f Compare September 2, 2024 16:11

gbossu reviewed Sep 5, 2024

llvm/include/llvm/CodeGen/ResourceScoreboard.h (outdated)

gbossu reviewed Sep 5, 2024

gbossu reviewed Sep 27, 2024

llvm/lib/Target/AIE/AIEMachineScheduler.cpp (outdated)

gbossu reviewed Sep 27, 2024

llvm/lib/Target/AIE/AIEPostPipeliner.cpp

gbossu reviewed Sep 27, 2024

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp (outdated)

gbossu reviewed Sep 27, 2024

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp (outdated)

gbossu reviewed Sep 27, 2024

llvm/lib/Target/AIE/AIEPostPipeliner.cpp (outdated)

gbossu reviewed Sep 27, 2024

llvm/lib/Target/AIE/AIEPostPipeliner.cpp

gbossu reviewed Sep 27, 2024

llvm/lib/Target/AIE/AIEPostPipeliner.cpp (outdated)

gbossu reviewed Sep 27, 2024

martien-de-jong force-pushed the martien.postra-swp branch from ff27c14 to 98c0f65 Compare October 1, 2024 12:08

Martien de Jong added 8 commits October 1, 2024 16:50

[AIE] Unrelated tidying

192e763

Copyright notice, namespace markings, variable casing, ...

[AIE] Reorganize epilogue handling

d2a07ff

[AIE] Several fixes and cleanups in ResourceScoreboard

ba7fbe8

- advance() cleared the wrong cycle
- cleaner interface for reset
- dumper printer required instead of reserved

[AIE] Add a generic 'tripcount update' operand to LoopStart

f14e2d1

We use this universally for tripcount update, which allows us to pipeline loops in both pre and postpipeliner

[AIE] Add postpipeliner tests

8d86ba3

- unittest with complicated binary operator
- baseline test for example broken by unnecessary WAW
- baseline test for ptradd/load combine with PHI node user
- add vmov example

[AIE2] subtract prologue cycles from 112 byte reservation

cc85180

[AIE] Reject PostPipeliner candidates in PrePipeliner

e24d89a

martien-de-jong force-pushed the martien.postra-swp branch from 98c0f65 to e24d89a Compare October 1, 2024 14:50

gbossu reviewed Oct 1, 2024

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

gbossu reviewed Oct 1, 2024
