Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Use isKnownNonZero to optimize eligible compares to cmn and ccmn #96349

Merged
merged 2 commits into from
Jul 22, 2024

Conversation

AreaZR
Copy link
Contributor

@AreaZR AreaZR commented Jun 21, 2024

The problematic case for unsigned comparisons occurs only when the second argument is zero.

The problematic case for signed comparisons occurs only when the second argument is the signed minimum value.

We can use KnownBits to know when we don't have to worry about this.

Source: https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 21, 2024

@llvm/pr-subscribers-backend-aarch64

Author: AtariDreams (AtariDreams)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/96349.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+8-6)
  • (modified) llvm/test/CodeGen/AArch64/cmp-chains.ll (+31)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 0c834acacdca9..6b6cfe7e82a1e 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -3372,9 +3372,10 @@ static bool isLegalArithImmed(uint64_t C) {
 // So, finally, the only LLVM-native comparisons that don't mention C and V
 // are SETEQ and SETNE. They're the only ones we can safely use CMN for in
 // the absence of information about op2.
-static bool isCMN(SDValue Op, ISD::CondCode CC) {
+static bool isCMN(SDValue Op, ISD::CondCode CC, SelectionDAG &DAG) {
   return Op.getOpcode() == ISD::SUB && isNullConstant(Op.getOperand(0)) &&
-         (CC == ISD::SETEQ || CC == ISD::SETNE);
+         (CC == ISD::SETEQ || CC == ISD::SETNE ||
+          DAG.isKnownNeverZero(Op.getOperand(1)));
 }
 
 static SDValue emitStrictFPComparison(SDValue LHS, SDValue RHS, const SDLoc &dl,
@@ -3419,11 +3420,11 @@ static SDValue emitComparison(SDValue LHS, SDValue RHS, ISD::CondCode CC,
   // register to WZR/XZR if it ends up being unused.
   unsigned Opcode = AArch64ISD::SUBS;
 
-  if (isCMN(RHS, CC)) {
+  if (isCMN(RHS, CC, DAG)) {
     // Can we combine a (CMP op1, (sub 0, op2) into a CMN instruction ?
     Opcode = AArch64ISD::ADDS;
     RHS = RHS.getOperand(1);
-  } else if (isCMN(LHS, CC)) {
+  } else if (isCMN(LHS, CC, DAG)) {
     // As we are looking for EQ/NE compares, the operands can be commuted ; can
     // we combine a (CMP (sub 0, op1), op2) into a CMN instruction ?
     Opcode = AArch64ISD::ADDS;
@@ -3527,7 +3528,8 @@ static SDValue emitConditionalComparison(SDValue LHS, SDValue RHS,
     }
   } else if (RHS.getOpcode() == ISD::SUB) {
     SDValue SubOp0 = RHS.getOperand(0);
-    if (isNullConstant(SubOp0) && (CC == ISD::SETEQ || CC == ISD::SETNE)) {
+    if (isNullConstant(SubOp0) && (CC == ISD::SETEQ || CC == ISD::SETNE ||
+                                   DAG.isKnownNeverZero(RHS.getOperand(1)))) {
       // See emitComparison() on why we can only do this for SETEQ and SETNE.
       Opcode = AArch64ISD::CCMN;
       RHS = RHS.getOperand(1);
@@ -3848,7 +3850,7 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
   // can be turned into:
   //    cmp     w12, w11, lsl #1
   if (!isa<ConstantSDNode>(RHS) || !isLegalArithImmed(RHS->getAsZExtVal())) {
-    SDValue TheLHS = isCMN(LHS, CC) ? LHS.getOperand(1) : LHS;
+    SDValue TheLHS = isCMN(LHS, CC, DAG) ? LHS.getOperand(1) : LHS;
 
     if (getCmpOperandFoldingProfit(TheLHS) > getCmpOperandFoldingProfit(RHS)) {
       std::swap(LHS, RHS);
diff --git a/llvm/test/CodeGen/AArch64/cmp-chains.ll b/llvm/test/CodeGen/AArch64/cmp-chains.ll
index 14cb0c82b1c03..0713fdb7283a9 100644
--- a/llvm/test/CodeGen/AArch64/cmp-chains.ll
+++ b/llvm/test/CodeGen/AArch64/cmp-chains.ll
@@ -258,3 +258,34 @@ define i32 @neg_range_int(i32 %a, i32 %b, i32 %c) {
   ret i32 %retval.0
 }
 
+; (b > -3 || a < -(c | 1))
+define i32 @neg_range_int_cmn(i32 %a, i32 %b, i32 %c) {
+; SDISEL-LABEL: neg_range_int_cmn:
+; SDISEL:       // %bb.0:
+; SDISEL-NEXT:    orr w8, w2, #0x1
+; SDISEL-NEXT:    cmn w8, w0
+; SDISEL-NEXT:    ccmn w1, #3, #0, le
+; SDISEL-NEXT:    csel w0, w1, w0, gt
+; SDISEL-NEXT:    ret
+;
+; GISEL-LABEL: neg_range_int_cmn:
+; GISEL:       // %bb.0:
+; GISEL-NEXT:    orr w8, w2, #0x1
+; GISEL-NEXT:    cmn w1, #3
+; GISEL-NEXT:    neg w8, w8
+; GISEL-NEXT:    cset w9, gt
+; GISEL-NEXT:    cmp w8, w0
+; GISEL-NEXT:    cset w8, gt
+; GISEL-NEXT:    orr w8, w9, w8
+; GISEL-NEXT:    and w8, w8, #0x1
+; GISEL-NEXT:    tst w8, #0x1
+; GISEL-NEXT:    csel w0, w1, w0, ne
+; GISEL-NEXT:    ret
+  %or = or i32 %c, 1
+  %sub = sub nsw i32 0, %or
+  %cmp = icmp sgt i32 %b, -3
+  %cmp1 = icmp sgt i32 %sub, %a
+  %1 = select i1 %cmp, i1 true, i1 %cmp1
+  %ret = select i1 %1, i32 %b, i32 %a
+  ret i32 %ret
+}

@AreaZR AreaZR changed the title [AArch64] Use isKnownNonZero to optimize to cmn instead of cmp [AArch64] Use isKnownNonZero to optimize to cmn Jun 21, 2024
@AreaZR AreaZR changed the title [AArch64] Use isKnownNonZero to optimize to cmn [AArch64] Use isKnownNonZero to optimize eligible compares to cmn Jun 21, 2024
@AreaZR
Copy link
Contributor Author

AreaZR commented Jun 23, 2024

@arsenm What do you think of this?

@AreaZR
Copy link
Contributor Author

AreaZR commented Jun 23, 2024

@dtcxzyw does this have an adverse effect on compile time? I didn't find any when I profiled LLVM.

@AreaZR AreaZR requested a review from davemgreen June 28, 2024 00:56
@AreaZR AreaZR force-pushed the reg branch 2 times, most recently from 2694f34 to 88dddd6 Compare June 28, 2024 14:00
@AreaZR AreaZR marked this pull request as draft June 28, 2024 14:03
@AreaZR AreaZR force-pushed the reg branch 8 times, most recently from c9de2e8 to b7b4e17 Compare June 29, 2024 02:59
@AreaZR AreaZR marked this pull request as ready for review June 29, 2024 03:09
Copy link

github-actions bot commented Jun 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@AreaZR AreaZR force-pushed the reg branch 6 times, most recently from 613c0ff to 44a9319 Compare June 30, 2024 17:46
@AreaZR AreaZR force-pushed the reg branch 2 times, most recently from 74e39dd to 643e797 Compare July 16, 2024 20:31
@AreaZR
Copy link
Contributor Author

AreaZR commented Jul 16, 2024

@jonathandavies-arm What are your thoughts on this?

@AreaZR
Copy link
Contributor Author

AreaZR commented Jul 19, 2024

@davemgreen This should be ready now!

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks OK now, and the test all look OK. Do you mind if we wait until after the release branch to commit, in case there is something still wrong?

It would be good to have test cases for lots of various cases too, even if they don't trigger - different signed conditions, cmn on the LHS/RHS, ccmn on the LHS/RHS etc. Thanks

@AreaZR
Copy link
Contributor Author

AreaZR commented Jul 19, 2024

I think this looks OK now, and the test all look OK. Do you mind if we wait until after the release branch to commit, in case there is something still wrong?

It would be good to have test cases for lots of various cases too, even if they don't trigger - different signed conditions, cmn on the LHS/RHS, ccmn on the LHS/RHS etc. Thanks

If something goes wrong, we can revert it. However, I've tested and tested this and found no issue.

@AreaZR
Copy link
Contributor Author

AreaZR commented Jul 19, 2024

I'm working on an expansion to this but that will sadly have to wait until after the release branch, but this is the part that has been thoroughly tested

@arsenm
Copy link
Contributor

arsenm commented Jul 19, 2024

If something goes wrong, we can revert it. However, I've tested and tested this and found no issue.

This isn't really the point of adding tests.

@davemgreen
Copy link
Collaborator

OK - I was just being careful as we are so close to the release and some of the issues can be subtle. I gave it a more thorough test and everything seems OK.

By "tested this and found no issue", does that include running a bootstrap and the llvm-test-suite? If so and you add a some extra test coverage then this LGTM.

@AreaZR
Copy link
Contributor Author

AreaZR commented Jul 19, 2024

OK - I was just being careful as we are so close to the release and some of the issues can be subtle. I gave it a more thorough test and everything seems OK.

By "tested this and found no issue", does that include running a bootstrap and the llvm-test-suite? If so and you add a some extra test coverage then this LGTM.

Yes I ran the entire test suite.

@davemgreen
Copy link
Collaborator

OK. If that has succeeded and a bootstrap also does OK, then if you can add some more tests for various ccmn cases we can get this in.

@AreaZR AreaZR force-pushed the reg branch 2 times, most recently from 44c2883 to e693db4 Compare July 20, 2024 17:19
@AreaZR AreaZR requested a review from davemgreen July 20, 2024 17:33
@AreaZR AreaZR changed the title [AArch64] Use isKnownNonZero to optimize eligible compares to cmn [AArch64] Use isKnownNonZero to optimize eligible compares to cmn and ccmn Jul 20, 2024
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is now transforming, but I don't believe that is valid:

define i32 @and_ult_eq_s0s1_or(i32 %s0a, i32 %s1a, i32 %s2, i32 %s3) {
; CHECK-LABEL: and_ult_eq_s0s1_or:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    orr w8, w1, #0x1
; CHECK-NEXT:    orr w9, w0, #0x1
; CHECK-NEXT:    cmp w2, w3
; CHECK-NEXT:    ccmn w8, w9, #2, eq
; CHECK-NEXT:    mov w8, #20 // =0x14
; CHECK-NEXT:    mov w9, #10 // =0xa
; CHECK-NEXT:    csel w0, w9, w8, lo
; CHECK-NEXT:    ret
entry:
  %a1 = or i32 %s0a, 1
  %s1 = or i32 %s1a, 1
  %s0 = sub i32 0, %a1
  %c0 = icmp ult i32 %s0, %s1
  %c1 = icmp eq i32 %s2, %s3
  %a = and i1 %c0, %c1
  %r = select i1 %a, i32 10, i32 20
  ret i32 %r
}

Comment on lines 3580 to 3583
} else if (isCMN(LHS, CC, DAG) &&
(isIntEqualitySetCC(CC) ||
(isUnsignedIntSetCC(CC) && DAG.isKnownNeverZero(RHS)) ||
(isSignedIntSetCC(CC) && cannotBeIntMin(RHS, DAG)))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has changed gain, is that to help extra cases? It now says that it is valid so long as both the LHS and RHS have the required known bits? Do you have some proof to help explain that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proof:

It is known that if a > b, then b < a,

it is known that if b < a, then -b > -a

For the degenerative cases of INT_MIN and 0 for signed and unsigned comparisons, this is where knownbits steps in: https://alive2.llvm.org/ce/z/8YTkie

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted the changed part so we can approve this. I will split the enhancement for another PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the and_ult_eq_s0s1_or case was mis-compiling because of

    // Swap LHS and RHS if it wasn't an equality comparison
    // So we don't have to worry about changing the CC
    // a < b -> -b < -a
    std::swap(LHS, RHS);

The condition needed to be updated too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swapped it so that the condition didn't need to be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the and_ult_eq_s0s1_or case was mis-compiling because of

    // Swap LHS and RHS if it wasn't an equality comparison
    // So we don't have to worry about changing the CC
    // a < b -> -b < -a
    std::swap(LHS, RHS);

The condition needed to be updated too.

The condition does not need to be updated because:

a < b -> -a > -b -> -b < -a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negation of both sides flips the condition and the swap flips it back. I did this because the CC isn't used in this method except for testing for eligibility, and this is much more feasible than refactoring so that we recompute the condition codes and arm condition codes.

@AreaZR AreaZR requested a review from davemgreen July 21, 2024 15:25
@AreaZR AreaZR force-pushed the reg branch 2 times, most recently from 5d6c704 to 4371aee Compare July 21, 2024 15:30
… ccmn

The problematic case for unsigned comparisons occurs only when the second argument is zero, and in signed cases when the second argument is the minimum possible signed integer. If we can prove the register value be those, it is safe to fold into CMN and CCMN.

Source: https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288
@AreaZR
Copy link
Contributor Author

AreaZR commented Jul 22, 2024

@davemgreen I have reverted this back because otherwise I'm gonna have to do another bootstrap test, and I think that change can be accepted after this one instead of part of this PR.

So this should be good to go!

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this version. LGTM

@davemgreen davemgreen merged commit 16e0591 into llvm:main Jul 22, 2024
7 checks passed
@AreaZR AreaZR deleted the reg branch July 22, 2024 12:10
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
… ccmn (llvm#96349)

The problematic case for unsigned comparisons occurs only when the
second argument is zero.

The problematic case for signed comparisons occurs only when the second
argument is the signed minimum value.

We can use KnownBits to know when we don't have to worry about this.

Source: https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
… ccmn (#96349)

Summary:
The problematic case for unsigned comparisons occurs only when the
second argument is zero.

The problematic case for signed comparisons occurs only when the second
argument is the signed minimum value.

We can use KnownBits to know when we don't have to worry about this.

Source: https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251645
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants