Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Remove superfluous sxtw in peephole opt #96293

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

davemgreen
Copy link
Collaborator

Across a basic-block we might have in i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw.

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 21, 2024

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

Across a basic-block we might have in i32 extract from a value that only operates on upper bits (for example a sxtw). We can replace the COPY with a new version skipping the sxtw.


Full diff: https://github.com/llvm/llvm-project/pull/96293.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp (+31)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll (+4-8)
diff --git a/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp b/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
index 22da7ddef98a2..ccd3eeb31ffbe 100644
--- a/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+++ b/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
@@ -128,6 +128,7 @@ struct AArch64MIPeepholeOpt : public MachineFunctionPass {
   bool visitINSviGPR(MachineInstr &MI, unsigned Opc);
   bool visitINSvi64lane(MachineInstr &MI);
   bool visitFMOVDr(MachineInstr &MI);
+  bool visitCopy(MachineInstr &MI);
   bool runOnMachineFunction(MachineFunction &MF) override;
 
   StringRef getPassName() const override {
@@ -690,6 +691,33 @@ bool AArch64MIPeepholeOpt::visitFMOVDr(MachineInstr &MI) {
   return true;
 }
 
+// Acrocss a basic-block we might have in i32 extract from a value that only
+// operates on upper bits (for example a sxtw). We can replace the COPY with a
+// new version skipping the sxtw.
+bool AArch64MIPeepholeOpt::visitCopy(MachineInstr &MI) {
+  if (MI.getOperand(1).getSubReg() != AArch64::sub_32 ||
+      !MRI->hasOneNonDBGUse(MI.getOperand(1).getReg()))
+    return false;
+
+  MachineInstr *SrcMI = MRI->getUniqueVRegDef(MI.getOperand(1).getReg());
+  MachineInstr *CopyMI = SrcMI;
+  if (SrcMI && SrcMI->isFullCopy() &&
+      MRI->hasOneNonDBGUse(SrcMI->getOperand(1).getReg()))
+    SrcMI = MRI->getUniqueVRegDef(SrcMI->getOperand(1).getReg());
+
+  if (!SrcMI || SrcMI->getOpcode() != AArch64::SBFMXri ||
+      SrcMI->getOperand(2).getImm() != 0 || SrcMI->getOperand(3).getImm() != 31)
+    return false;
+
+  Register SrcReg = SrcMI->getOperand(1).getReg();
+  MRI->constrainRegClass(SrcReg, MRI->getRegClass(MI.getOperand(1).getReg()));
+  MI.getOperand(1).setReg(SrcReg);
+  if (CopyMI != SrcMI)
+    CopyMI->eraseFromParent();
+  SrcMI->eraseFromParent();
+  return true;
+}
+
 bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -771,6 +799,9 @@ bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
       case AArch64::FMOVDr:
         Changed |= visitFMOVDr(MI);
         break;
+      case AArch64::COPY:
+        Changed |= visitCopy(MI);
+        break;
       }
     }
   }
diff --git a/llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll b/llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll
index e41eb7d38c370..058cbbe9ff13c 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-mull-masks.ll
@@ -281,8 +281,7 @@ define i64 @smull_ldrsw_shift(ptr %x0, i64 %x1) {
 ; CHECK-LABEL: smull_ldrsw_shift:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    ldrsw x8, [x0]
-; CHECK-NEXT:    sxtw x9, w1
-; CHECK-NEXT:    smull x0, w8, w9
+; CHECK-NEXT:    smull x0, w8, w1
 ; CHECK-NEXT:    ret
 entry:
   %ext64 = load i32, ptr %x0
@@ -490,8 +489,7 @@ define i64 @smaddl_ldrsw_shift(ptr %x0, i64 %x1, i64 %x2) {
 ; CHECK-LABEL: smaddl_ldrsw_shift:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    ldrsw x8, [x0]
-; CHECK-NEXT:    sxtw x9, w1
-; CHECK-NEXT:    smaddl x0, w8, w9, x2
+; CHECK-NEXT:    smaddl x0, w8, w1, x2
 ; CHECK-NEXT:    ret
 entry:
   %ext64 = load i32, ptr %x0
@@ -654,8 +652,7 @@ define i64 @smnegl_ldrsw_shift(ptr %x0, i64 %x1) {
 ; CHECK-LABEL: smnegl_ldrsw_shift:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    ldrsw x8, [x0]
-; CHECK-NEXT:    sxtw x9, w1
-; CHECK-NEXT:    smnegl x0, w8, w9
+; CHECK-NEXT:    smnegl x0, w8, w1
 ; CHECK-NEXT:    ret
 entry:
   %ext64 = load i32, ptr %x0
@@ -818,8 +815,7 @@ define i64 @smsubl_ldrsw_shift(ptr %x0, i64 %x1, i64 %x2) {
 ; CHECK-LABEL: smsubl_ldrsw_shift:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    ldrsw x8, [x0]
-; CHECK-NEXT:    sxtw x9, w1
-; CHECK-NEXT:    smsubl x0, w8, w9, x2
+; CHECK-NEXT:    smsubl x0, w8, w1, x2
 ; CHECK-NEXT:    ret
 entry:
   %ext64 = load i32, ptr %x0

Copy link

github-actions bot commented Jul 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Across a basic-block we might have in i32 extract from a value that only
operates on upper bits (for example a sxtw). We can replace the COPY with a
new version skipping the sxtw.
@davemgreen
Copy link
Collaborator Author

Ping - thanks.

Copy link
Collaborator

@SamTebbs33 SamTebbs33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements and sorry for missing your update, looks great to me.

@davemgreen davemgreen merged commit 7f2a5df into llvm:main Jul 11, 2024
5 of 7 checks passed
@davemgreen davemgreen deleted the gh-a64-sxtwpeephole branch July 12, 2024 08:53
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
Across a basic-block we might have in i32 extract from a value that only
operates on upper bits (for example a sxtw). We can replace the COPY
with a new version skipping the sxtw.
krasimirgg added a commit that referenced this pull request Jul 17, 2024
This reverts commit 7f2a5df.

It appears that after this, llc segfaults on the following code:
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64--linux-eabi"

define i32 @f(i32 %0) {
entry:
  %1 = sext i32 %0 to i64
  br label %A

A:
  %2 = trunc i64 %1 to i32
  %a69.us = sub i32 0, %2
  %a69.us.fr = freeze i32 %a69.us
  %3 = zext i32 %a69.us.fr to i64
  br label %B

B:
  %t = icmp eq i64 0, %3
  br i1 %t, label %A, label %B
}
```

assert.h assertion failed at .../llvm/lib/CodeGen/LiveVariables.cpp:159 in void llvm::LiveVariables::HandleVirtRegUse(Register, MachineBasicBlock *, MachineInstr &): MRI->getVRegDef(Reg) && "Register use before def!"
@krasimirgg
Copy link
Contributor

I rolled back the patch yesterday, sorry for not leaving a comment about that here in a timely manner.

We saw some issues with this in some halide-generated code.

It appears that after this, llc segfaults on the following code:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64--linux-eabi"

define i32 @f(i32 %0) {
entry:
  %1 = sext i32 %0 to i64
  br label %A

A:
  %2 = trunc i64 %1 to i32
  %a69.us = sub i32 0, %2
  %a69.us.fr = freeze i32 %a69.us
  %3 = zext i32 %a69.us.fr to i64
  br label %B

B:
  %t = icmp eq i64 0, %3
  br i1 %t, label %A, label %B
}

assert.h assertion failed at .../llvm/lib/CodeGen/LiveVariables.cpp:159 in void llvm::LiveVariables::HandleVirtRegUse(Register, MachineBasicBlock *, MachineInstr &): MRI->getVRegDef(Reg) && "Register use before def!"

@davemgreen
Copy link
Collaborator Author

Thanks - the reproducer in the commit message was useful for figuring out what was wrong. It just needs to make sure it removed all the copies. I'll put together a recommit - please let me know if you run into any more problems, and we can revert again to make sure the branch is clean.

davemgreen added a commit that referenced this pull request Jul 19, 2024
Across a basic-block we might have an i32 extract from a value that only
operates on upper bits (for example a sxtw). We can replace the COPY
with a new version skipping the sxtw.

This is a re-commit of 7f2a5df, with a fix for
removing all the intermediate COPY nodes (and some extra debug logging).
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
This reverts commit 7f2a5df.

It appears that after this, llc segfaults on the following code:
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64--linux-eabi"

define i32 @f(i32 %0) {
entry:
  %1 = sext i32 %0 to i64
  br label %A

A:
  %2 = trunc i64 %1 to i32
  %a69.us = sub i32 0, %2
  %a69.us.fr = freeze i32 %a69.us
  %3 = zext i32 %a69.us.fr to i64
  br label %B

B:
  %t = icmp eq i64 0, %3
  br i1 %t, label %A, label %B
}
```

assert.h assertion failed at .../llvm/lib/CodeGen/LiveVariables.cpp:159 in void llvm::LiveVariables::HandleVirtRegUse(Register, MachineBasicBlock *, MachineInstr &): MRI->getVRegDef(Reg) && "Register use before def!"
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
Across a basic-block we might have an i32 extract from a value that only
operates on upper bits (for example a sxtw). We can replace the COPY
with a new version skipping the sxtw.

This is a re-commit of 7f2a5df, with a fix for
removing all the intermediate COPY nodes (and some extra debug logging).
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
This reverts commit 7f2a5df.

It appears that after this, llc segfaults on the following code:
```
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64--linux-eabi"

define i32 @f(i32 %0) {
entry:
  %1 = sext i32 %0 to i64
  br label %A

A:
  %2 = trunc i64 %1 to i32
  %a69.us = sub i32 0, %2
  %a69.us.fr = freeze i32 %a69.us
  %3 = zext i32 %a69.us.fr to i64
  br label %B

B:
  %t = icmp eq i64 0, %3
  br i1 %t, label %A, label %B
}
```

assert.h assertion failed at .../llvm/lib/CodeGen/LiveVariables.cpp:159 in void llvm::LiveVariables::HandleVirtRegUse(Register, MachineBasicBlock *, MachineInstr &): MRI->getVRegDef(Reg) && "Register use before def!"

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250934
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
Across a basic-block we might have an i32 extract from a value that only
operates on upper bits (for example a sxtw). We can replace the COPY
with a new version skipping the sxtw.

This is a re-commit of 7f2a5df, with a fix for
removing all the intermediate COPY nodes (and some extra debug logging).

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251396
DimitryAndric added a commit to DimitryAndric/llvm-project that referenced this pull request Jul 26, 2024
In llvm#96293 ("Remove superfluous sxtw in peephole opt") a file
`peephole-sxtw.mir` was added to `llvm/lib/Target/AArch64/`, but it
looks like this is a regression test file.

Move it to `llvm/test/CodeGen/AArch64/` which seems a more correct
location.
davemgreen pushed a commit that referenced this pull request Jul 28, 2024
…100819)

In #96293 ("Remove superfluous sxtw in peephole opt") a file
`peephole-sxtw.mir` was added to `llvm/lib/Target/AArch64/`, but it
looks like this is a regression test file.

Move it to `llvm/test/CodeGen/AArch64/` which seems a more correct
location.
banach-space pushed a commit to banach-space/llvm-project that referenced this pull request Aug 7, 2024
…lvm#100819)

In llvm#96293 ("Remove superfluous sxtw in peephole opt") a file
`peephole-sxtw.mir` was added to `llvm/lib/Target/AArch64/`, but it
looks like this is a regression test file.

Move it to `llvm/test/CodeGen/AArch64/` which seems a more correct
location.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Aug 15, 2024
This extends the existing sxtw peephole optimization (llvm#96293) to uxtw, which in
llvm is a ORRWrr which clears the top bits.

Fixes llvm#98481
davemgreen added a commit that referenced this pull request Aug 20, 2024
This extends the existing sxtw peephole optimization (#96293) to uxtw,
which in llvm is a ORRWrr which clears the top bits.

Fixes #98481
dmpolukhin pushed a commit to dmpolukhin/llvm-project that referenced this pull request Sep 2, 2024
This extends the existing sxtw peephole optimization (llvm#96293) to uxtw,
which in llvm is a ORRWrr which clears the top bits.

Fixes llvm#98481
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants