[DebugInfo][LoopStrengthReduce] Fix missing debug location updates #97519

Apochens · 2024-07-03T04:37:50Z

Note that, for the new phi instruction NewPH, which replaces the old phi PH and the cast ShadowUse, I choose to propagate the debug location of PH to it, because the cast is eliminated according to the optimization semantics.

BTW, after patching the debug location update of NewPH, I found that in the optimized IR, another phi instruction %lsr.iv has the same debug location as %IV.S. (NewPH). However, %lsr.iv should inherit the debug location of %iv in the unoptimized program, i.e., "line: 3" instead of "line: 2". The same thing happens to %lsr.iv.next, which has the same debug location as %exitcond. Yet, %lsr.iv.next should inherit the debug location of %iv.next, i.e., "line: 7" instead of "line: 8". The program before and after LSR are shown below.

Before optimization:

define i32 @foobar6() !dbg !5 {
entry:
  br label %loop, !dbg !8

loop:                                             ; preds = %loop, %entry
  %accum = phi i32 [ -3220, %entry ], [ %accum.next, %loop ], !dbg !9
->%iv = phi i32 [ 12, %entry ], [ %iv.next, %loop ], !dbg !10
  %tmp1 = sitofp i32 %accum to double, !dbg !11
  tail call void @foo(double %tmp1), !dbg !12
  %accum.next = add nsw i32 %accum, 9597741, !dbg !13
->%iv.next = add nuw nsw i32 %iv, 1, !dbg !14
  %exitcond = icmp ugt i32 %iv, 235, !dbg !15
  br i1 %exitcond, label %exit, label %loop, !dbg !16

exit:                                             ; preds = %loop
  ret i32 %accum.next, !dbg !17
}

...
!9 = !DILocation(line: 2, column: 1, scope: !5)
!10 = !DILocation(line: 3, column: 1, scope: !5)
...
!14 = !DILocation(line: 7, column: 1, scope: !5)
!15 = !DILocation(line: 8, column: 1, scope: !5)
...

After optimization:

define i32 @foobar6() !dbg !5 {
entry:
  br label %loop, !dbg !8

loop:                                             ; preds = %loop, %entry
->%lsr.iv = phi i32 [ %lsr.iv.next, %loop ], [ 11, %entry ], !dbg !9
  %IV.S. = phi double [ -3.220000e+03, %entry ], [ %IV.S.next., %loop ], !dbg !9
  tail call void @foo(double %IV.S.) #0, !dbg !10
  %IV.S.next. = fadd double %IV.S., 0x41624E65A0000000, !dbg !11
->%lsr.iv.next = add nuw nsw i32 %lsr.iv, 1, !dbg !12
  %exitcond = icmp ugt i32 %lsr.iv.next, 235, !dbg !12
  br i1 %exitcond, label %exit, label %loop, !dbg !13

exit:                                             ; preds = %loop
  ret i32 -2135478791, !dbg !14
}

...
!9 = !DILocation(line: 2, column: 1, scope: !5)
...
!12 = !DILocation(line: 8, column: 1, scope: !5)
...

I guess it's caused by inserting new instructions with IRBuilder, in which the next instructions are specified as the insertion points. (Yet, I haven't found the corresponding code.🧐)

llvmbot · 2024-07-03T04:38:19Z

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-transforms

Author: Shan Huang (Apochens)

Changes

Fix #97510 .

Note that, for the new phi instruction NewPH, which replaces the old phi PH and the cast ShadowUse, I choose to propagate the debug location of PH to it, because the cast is eliminated according to the optimization semantics.

BTW, after patching the debug location update of NewPH, I found that in the optimized IR, another phi instruction %lsr.iv has the same debug location as %IV.S. (NewPH). However, %lsr.iv should inherent the debug location of %iv in the unoptimized program, i.e., "line: 3" instead of "line: 2". The same thing happens to %lsr.iv.next, which has the same debug location as %exitcond. Yet, %lsr.iv.next should inherent the debug location of %iv.next, i.e., "line: 7" instead of "line: 8". The program before and after LSR are shown below.

Before optimization:

define i32 @<!-- -->foobar6() !dbg !5 {
entry:
  br label %loop, !dbg !8

loop:                                             ; preds = %loop, %entry
  %accum = phi i32 [ -3220, %entry ], [ %accum.next, %loop ], !dbg !9
  %iv = phi i32 [ 12, %entry ], [ %iv.next, %loop ], !dbg !10
  %tmp1 = sitofp i32 %accum to double, !dbg !11
  tail call void @<!-- -->foo(double %tmp1), !dbg !12
  %accum.next = add nsw i32 %accum, 9597741, !dbg !13
  %iv.next = add nuw nsw i32 %iv, 1, !dbg !14
  %exitcond = icmp ugt i32 %iv, 235, !dbg !15
  br i1 %exitcond, label %exit, label %loop, !dbg !16

exit:                                             ; preds = %loop
  ret i32 %accum.next, !dbg !17
}

...
!9 = !DILocation(line: 2, column: 1, scope: !5)
!10 = !DILocation(line: 3, column: 1, scope: !5)
...
!14 = !DILocation(line: 7, column: 1, scope: !5)
!15 = !DILocation(line: 8, column: 1, scope: !5)
...

After optimization:

define i32 @<!-- -->foobar6() !dbg !5 {
entry:
  br label %loop, !dbg !8

loop:                                             ; preds = %loop, %entry
-&gt;%lsr.iv = phi i32 [ %lsr.iv.next, %loop ], [ 11, %entry ], !dbg !9
  %IV.S. = phi double [ -3.220000e+03, %entry ], [ %IV.S.next., %loop ], !dbg !9
  tail call void @<!-- -->foo(double %IV.S.) #<!-- -->0, !dbg !10
  %IV.S.next. = fadd double %IV.S., 0x41624E65A0000000, !dbg !11
-&gt;%lsr.iv.next = add nuw nsw i32 %lsr.iv, 1, !dbg !12
  %exitcond = icmp ugt i32 %lsr.iv.next, 235, !dbg !12
  br i1 %exitcond, label %exit, label %loop, !dbg !13

exit:                                             ; preds = %loop
  ret i32 -2135478791, !dbg !14
}

...
!9 = !DILocation(line: 2, column: 1, scope: !5)
...
!12 = !DILocation(line: 8, column: 1, scope: !5)
...

I guess it's caused by inserting new instructions with IRBuilder, in which the next instructions are specified as the insertion points. (Yet, I haven't found the corresponding code.🧐)

Full diff: https://github.com/llvm/llvm-project/pull/97519.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp (+2)
(added) llvm/test/Transforms/LoopStrengthReduce/X86/preserving-debugloc-phi-binop.ll (+59)

diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index 1c4a0d92dcde9..8ebead8b8fbd7 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -2404,6 +2404,7 @@ void LSRInstance::OptimizeShadowIV() {
 
     /* Add new PHINode. */
     PHINode *NewPH = PHINode::Create(DestTy, 2, "IV.S.", PH->getIterator());
+    NewPH->setDebugLoc(PH->getDebugLoc());
 
     /* create new increment. '++d' in above example. */
     Constant *CFP = ConstantFP::get(DestTy, C->getZExtValue());
@@ -2411,6 +2412,7 @@ void LSRInstance::OptimizeShadowIV() {
         Incr->getOpcode() == Instruction::Add ? Instruction::FAdd
                                               : Instruction::FSub,
         NewPH, CFP, "IV.S.next.", Incr->getIterator());
+    NewIncr->setDebugLoc(Incr->getDebugLoc());
 
     NewPH->addIncoming(NewInit, PH->getIncomingBlock(Entry));
     NewPH->addIncoming(NewIncr, PH->getIncomingBlock(Latch));
diff --git a/llvm/test/Transforms/LoopStrengthReduce/X86/preserving-debugloc-phi-binop.ll b/llvm/test/Transforms/LoopStrengthReduce/X86/preserving-debugloc-phi-binop.ll
new file mode 100644
index 0000000000000..1aecd34082a2a
--- /dev/null
+++ b/llvm/test/Transforms/LoopStrengthReduce/X86/preserving-debugloc-phi-binop.ll
@@ -0,0 +1,59 @@
+; RUN: opt -S -passes=loop-reduce -mtriple=x86_64-unknown-unknown < %s | FileCheck %s
+
+; Check that LoopStrengthReduce's OptimizeShadowIV() propagates the debug
+; locations of the old phi (`%accum`) and binop (`%accum.next`) instruction
+; to the new phi and binop instruction, respectively.
+
+target datalayout = "n8:16:32:64"
+
+define i32 @foobar6() !dbg !5 {
+; CHECK-LABEL: define i32 @foobar6(
+; CHECK:  loop:
+; CHECK:    [[IV_S_:%.*]] = phi double [ -3.220000e+03, %[[ENTRY:.*]] ], [ [[IV_S_NEXT_:%.*]], %loop ], !dbg [[DBG9:![0-9]+]]
+; CHECK:    [[IV_S_NEXT_]] = fadd double [[IV_S_]], 0x41624E65A0000000, !dbg [[DBG11:![0-9]+]]
+; CHECK:  exit:
+;
+entry:
+  br label %loop, !dbg !8
+
+loop:                                             ; preds = %loop, %entry
+  %accum = phi i32 [ -3220, %entry ], [ %accum.next, %loop ], !dbg !9
+  %iv = phi i32 [ 12, %entry ], [ %iv.next, %loop ], !dbg !10
+  %tmp1 = sitofp i32 %accum to double, !dbg !11
+  tail call void @foo(double %tmp1), !dbg !12
+  %accum.next = add nsw i32 %accum, 9597741, !dbg !13
+  %iv.next = add nuw nsw i32 %iv, 1, !dbg !14
+  %exitcond = icmp ugt i32 %iv, 235, !dbg !15
+  br i1 %exitcond, label %exit, label %loop, !dbg !16
+
+exit:                                             ; preds = %loop
+  ret i32 %accum.next, !dbg !17
+}
+
+declare void @foo(double)
+
+!llvm.dbg.cu = !{!0}
+!llvm.debugify = !{!2, !3}
+!llvm.module.flags = !{!4}
+
+; CHECK: [[DBG9]] = !DILocation(line: 2,
+; CHECK: [[DBG11]] = !DILocation(line: 6,
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
+!1 = !DIFile(filename: "main.ll", directory: "/")
+!2 = !{i32 10}
+!3 = !{i32 0}
+!4 = !{i32 2, !"Debug Info Version", i32 3}
+!5 = distinct !DISubprogram(name: "foobar6", linkageName: "foobar6", scope: null, file: !1, line: 1, type: !6, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!6 = !DISubroutineType(types: !7)
+!7 = !{}
+!8 = !DILocation(line: 1, column: 1, scope: !5)
+!9 = !DILocation(line: 2, column: 1, scope: !5)
+!10 = !DILocation(line: 3, column: 1, scope: !5)
+!11 = !DILocation(line: 4, column: 1, scope: !5)
+!12 = !DILocation(line: 5, column: 1, scope: !5)
+!13 = !DILocation(line: 6, column: 1, scope: !5)
+!14 = !DILocation(line: 7, column: 1, scope: !5)
+!15 = !DILocation(line: 8, column: 1, scope: !5)
+!16 = !DILocation(line: 9, column: 1, scope: !5)
+!17 = !DILocation(line: 10, column: 1, scope: !5)

jmorse

Patch LGTM, @OCHyams @SLTozer have reviewed more of these, are there any more concerns than out-of-place source locations and potential loss of coverage?

Instructions inheriting the adjacent source location is technically part of LLVMs design, I think the difficulty is that it works well 99.9% of the time, but occasionally you need more finesse when picking them. The current solution is just a very good approximation of ideal source location placement (as far as I'm aware).

OCHyams · 2024-07-08T09:35:50Z

Note that, for the new phi instruction NewPH, which replaces the old phi PH and the cast ShadowUse, I choose to propagate the debug location of PH to it, because the cast is eliminated according to the optimization semantics.

SGTM - we're replacing two instructions (phi and add) in order to make one redundant (cast), it makes sense that we propagate the locations of the replaced instructions.

@OCHyams @SLTozer have reviewed more of these, are there any more concerns than out-of-place source locations and potential loss of coverage?

No concern for this patch, LGTM too.

Thanks!

fix missing debug location updates

c557fc2

llvmbot added the llvm:transforms label Jul 3, 2024

Apochens added the debuginfo label Jul 3, 2024

Apochens requested review from jryans, SLTozer and OCHyams July 3, 2024 04:39

jmorse approved these changes Jul 5, 2024

View reviewed changes

Apochens merged commit d83d09f into llvm:main Jul 15, 2024
10 checks passed

Apochens deleted the #97510_lsr_preserving_debugloc_phi_binop branch July 15, 2024 01:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DebugInfo][LoopStrengthReduce] Fix missing debug location updates #97519

[DebugInfo][LoopStrengthReduce] Fix missing debug location updates #97519

Apochens commented Jul 3, 2024 •

edited

Loading

llvmbot commented Jul 3, 2024 •

edited

Loading

jmorse left a comment

OCHyams commented Jul 8, 2024 •

edited

Loading

[DebugInfo][LoopStrengthReduce] Fix missing debug location updates #97519

[DebugInfo][LoopStrengthReduce] Fix missing debug location updates #97519

Conversation

Apochens commented Jul 3, 2024 • edited Loading

llvmbot commented Jul 3, 2024 • edited Loading

jmorse left a comment

Choose a reason for hiding this comment

OCHyams commented Jul 8, 2024 • edited Loading

Apochens commented Jul 3, 2024 •

edited

Loading

llvmbot commented Jul 3, 2024 •

edited

Loading

OCHyams commented Jul 8, 2024 •

edited

Loading