-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] Remove superfluous sxtw in peephole opt #96293
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: David Green (davemgreen) ChangesAcross a basic-block we might have in i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw. Full diff: https://github.com/llvm/llvm-project/pull/96293.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp b/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
index 22da7ddef98a2..ccd3eeb31ffbe 100644
--- a/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+++ b/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
@@ -128,6 +128,7 @@ struct AArch64MIPeepholeOpt : public MachineFunctionPass {
bool visitINSviGPR(MachineInstr &MI, unsigned Opc);
bool visitINSvi64lane(MachineInstr &MI);
bool visitFMOVDr(MachineInstr &MI);
+ bool visitCopy(MachineInstr &MI);
bool runOnMachineFunction(MachineFunction &MF) override;
StringRef getPassName() const override {
@@ -690,6 +691,33 @@ bool AArch64MIPeepholeOpt::visitFMOVDr(MachineInstr &MI) {
return true;
}
+// Acrocss a basic-block we might have in i32 extract from a value that only
+// operates on upper bits (for example a sxtw). We can replace the COPY with a
+// new version skipping the sxtw.
+bool AArch64MIPeepholeOpt::visitCopy(MachineInstr &MI) {
+ if (MI.getOperand(1).getSubReg() != AArch64::sub_32 ||
+ !MRI->hasOneNonDBGUse(MI.getOperand(1).getReg()))
+ return false;
+
+ MachineInstr *SrcMI = MRI->getUniqueVRegDef(MI.getOperand(1).getReg());
+ MachineInstr *CopyMI = SrcMI;
+ if (SrcMI && SrcMI->isFullCopy() &&
+ MRI->hasOneNonDBGUse(SrcMI->getOperand(1).getReg()))
+ SrcMI = MRI->getUniqueVRegDef(SrcMI->getOperand(1).getReg());
+
+ if (!SrcMI || SrcMI->getOpcode() != AArch64::SBFMXri ||
+ SrcMI->getOperand(2).getImm() != 0 || SrcMI->getOperand(3).getImm() != 31)
+ return false;
+
+ Register SrcReg = SrcMI->getOperand(1).getReg();
+ MRI->constrainRegClass(SrcReg, MRI->getRegClass(MI.getOperand(1).getReg()));
+ MI.getOperand(1).setReg(SrcReg);
+ if (CopyMI != SrcMI)
+ CopyMI->eraseFromParent();
+ SrcMI->eraseFromParent();
+ return true;
+}
+
bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(MF.getFunction()))
return false;
@@ -771,6 +799,9 @@ bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
case AArch64::FMOVDr:
Changed |= visitFMOVDr(MI);
break;
+ case AArch64::COPY:
+ Changed |= visitCopy(MI);
+ break;
}
}
}
diff --git a/llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll b/llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll
index e41eb7d38c370..058cbbe9ff13c 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll
@@ -281,8 +281,7 @@ define i64 @smull_ldrsw_shift(ptr %x0, i64 %x1) {
; CHECK-LABEL: smull_ldrsw_shift:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ldrsw x8, [x0]
-; CHECK-NEXT: sxtw x9, w1
-; CHECK-NEXT: smull x0, w8, w9
+; CHECK-NEXT: smull x0, w8, w1
; CHECK-NEXT: ret
entry:
%ext64 = load i32, ptr %x0
@@ -490,8 +489,7 @@ define i64 @smaddl_ldrsw_shift(ptr %x0, i64 %x1, i64 %x2) {
; CHECK-LABEL: smaddl_ldrsw_shift:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ldrsw x8, [x0]
-; CHECK-NEXT: sxtw x9, w1
-; CHECK-NEXT: smaddl x0, w8, w9, x2
+; CHECK-NEXT: smaddl x0, w8, w1, x2
; CHECK-NEXT: ret
entry:
%ext64 = load i32, ptr %x0
@@ -654,8 +652,7 @@ define i64 @smnegl_ldrsw_shift(ptr %x0, i64 %x1) {
; CHECK-LABEL: smnegl_ldrsw_shift:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ldrsw x8, [x0]
-; CHECK-NEXT: sxtw x9, w1
-; CHECK-NEXT: smnegl x0, w8, w9
+; CHECK-NEXT: smnegl x0, w8, w1
; CHECK-NEXT: ret
entry:
%ext64 = load i32, ptr %x0
@@ -818,8 +815,7 @@ define i64 @smsubl_ldrsw_shift(ptr %x0, i64 %x1, i64 %x2) {
; CHECK-LABEL: smsubl_ldrsw_shift:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ldrsw x8, [x0]
-; CHECK-NEXT: sxtw x9, w1
-; CHECK-NEXT: smsubl x0, w8, w9, x2
+; CHECK-NEXT: smsubl x0, w8, w1, x2
; CHECK-NEXT: ret
entry:
%ext64 = load i32, ptr %x0
|
2cf6269
to
a2d6de8
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
Across a basic-block we might have in i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw.
a2d6de8
to
46bd1a6
Compare
Ping - thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvements and sorry for missing your update, looks great to me.
Across a basic-block we might have in i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw.
This reverts commit 7f2a5df. It appears that after this, llc segfaults on the following code: ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32" target triple = "aarch64--linux-eabi" define i32 @f(i32 %0) { entry: %1 = sext i32 %0 to i64 br label %A A: %2 = trunc i64 %1 to i32 %a69.us = sub i32 0, %2 %a69.us.fr = freeze i32 %a69.us %3 = zext i32 %a69.us.fr to i64 br label %B B: %t = icmp eq i64 0, %3 br i1 %t, label %A, label %B } ``` assert.h assertion failed at .../llvm/lib/CodeGen/LiveVariables.cpp:159 in void llvm::LiveVariables::HandleVirtRegUse(Register, MachineBasicBlock *, MachineInstr &): MRI->getVRegDef(Reg) && "Register use before def!"
I rolled back the patch yesterday, sorry for not leaving a comment about that here in a timely manner. We saw some issues with this in some halide-generated code. It appears that after this, llc segfaults on the following code:
|
Thanks - the reproducer in the commit message was useful for figuring out what was wrong. It just needs to make sure it removed all the copies. I'll put together a recommit - please let me know if you run into any more problems, and we can revert again to make sure the branch is clean. |
Across a basic-block we might have an i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw. This is a re-commit of 7f2a5df, with a fix for removing all the intermediate COPY nodes (and some extra debug logging).
This reverts commit 7f2a5df. It appears that after this, llc segfaults on the following code: ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32" target triple = "aarch64--linux-eabi" define i32 @f(i32 %0) { entry: %1 = sext i32 %0 to i64 br label %A A: %2 = trunc i64 %1 to i32 %a69.us = sub i32 0, %2 %a69.us.fr = freeze i32 %a69.us %3 = zext i32 %a69.us.fr to i64 br label %B B: %t = icmp eq i64 0, %3 br i1 %t, label %A, label %B } ``` assert.h assertion failed at .../llvm/lib/CodeGen/LiveVariables.cpp:159 in void llvm::LiveVariables::HandleVirtRegUse(Register, MachineBasicBlock *, MachineInstr &): MRI->getVRegDef(Reg) && "Register use before def!"
Across a basic-block we might have an i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw. This is a re-commit of 7f2a5df, with a fix for removing all the intermediate COPY nodes (and some extra debug logging).
Summary: This reverts commit 7f2a5df. It appears that after this, llc segfaults on the following code: ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32" target triple = "aarch64--linux-eabi" define i32 @f(i32 %0) { entry: %1 = sext i32 %0 to i64 br label %A A: %2 = trunc i64 %1 to i32 %a69.us = sub i32 0, %2 %a69.us.fr = freeze i32 %a69.us %3 = zext i32 %a69.us.fr to i64 br label %B B: %t = icmp eq i64 0, %3 br i1 %t, label %A, label %B } ``` assert.h assertion failed at .../llvm/lib/CodeGen/LiveVariables.cpp:159 in void llvm::LiveVariables::HandleVirtRegUse(Register, MachineBasicBlock *, MachineInstr &): MRI->getVRegDef(Reg) && "Register use before def!" Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250934
Summary: Across a basic-block we might have an i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw. This is a re-commit of 7f2a5df, with a fix for removing all the intermediate COPY nodes (and some extra debug logging). Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251396
In llvm#96293 ("Remove superfluous sxtw in peephole opt") a file `peephole-sxtw.mir` was added to `llvm/lib/Target/AArch64/`, but it looks like this is a regression test file. Move it to `llvm/test/CodeGen/AArch64/` which seems a more correct location.
…lvm#100819) In llvm#96293 ("Remove superfluous sxtw in peephole opt") a file `peephole-sxtw.mir` was added to `llvm/lib/Target/AArch64/`, but it looks like this is a regression test file. Move it to `llvm/test/CodeGen/AArch64/` which seems a more correct location.
This extends the existing sxtw peephole optimization (llvm#96293) to uxtw, which in llvm is a ORRWrr which clears the top bits. Fixes llvm#98481
This extends the existing sxtw peephole optimization (llvm#96293) to uxtw, which in llvm is a ORRWrr which clears the top bits. Fixes llvm#98481
Across a basic-block we might have in i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw.