-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] Use isKnownNonZero to optimize eligible compares to cmn and ccmn #96349
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: AtariDreams (AtariDreams) ChangesFull diff: https://github.com/llvm/llvm-project/pull/96349.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 0c834acacdca9..6b6cfe7e82a1e 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -3372,9 +3372,10 @@ static bool isLegalArithImmed(uint64_t C) {
// So, finally, the only LLVM-native comparisons that don't mention C and V
// are SETEQ and SETNE. They're the only ones we can safely use CMN for in
// the absence of information about op2.
-static bool isCMN(SDValue Op, ISD::CondCode CC) {
+static bool isCMN(SDValue Op, ISD::CondCode CC, SelectionDAG &DAG) {
return Op.getOpcode() == ISD::SUB && isNullConstant(Op.getOperand(0)) &&
- (CC == ISD::SETEQ || CC == ISD::SETNE);
+ (CC == ISD::SETEQ || CC == ISD::SETNE ||
+ DAG.isKnownNeverZero(Op.getOperand(1)));
}
static SDValue emitStrictFPComparison(SDValue LHS, SDValue RHS, const SDLoc &dl,
@@ -3419,11 +3420,11 @@ static SDValue emitComparison(SDValue LHS, SDValue RHS, ISD::CondCode CC,
// register to WZR/XZR if it ends up being unused.
unsigned Opcode = AArch64ISD::SUBS;
- if (isCMN(RHS, CC)) {
+ if (isCMN(RHS, CC, DAG)) {
// Can we combine a (CMP op1, (sub 0, op2) into a CMN instruction ?
Opcode = AArch64ISD::ADDS;
RHS = RHS.getOperand(1);
- } else if (isCMN(LHS, CC)) {
+ } else if (isCMN(LHS, CC, DAG)) {
// As we are looking for EQ/NE compares, the operands can be commuted ; can
// we combine a (CMP (sub 0, op1), op2) into a CMN instruction ?
Opcode = AArch64ISD::ADDS;
@@ -3527,7 +3528,8 @@ static SDValue emitConditionalComparison(SDValue LHS, SDValue RHS,
}
} else if (RHS.getOpcode() == ISD::SUB) {
SDValue SubOp0 = RHS.getOperand(0);
- if (isNullConstant(SubOp0) && (CC == ISD::SETEQ || CC == ISD::SETNE)) {
+ if (isNullConstant(SubOp0) && (CC == ISD::SETEQ || CC == ISD::SETNE ||
+ DAG.isKnownNeverZero(RHS.getOperand(1)))) {
// See emitComparison() on why we can only do this for SETEQ and SETNE.
Opcode = AArch64ISD::CCMN;
RHS = RHS.getOperand(1);
@@ -3848,7 +3850,7 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
// can be turned into:
// cmp w12, w11, lsl #1
if (!isa<ConstantSDNode>(RHS) || !isLegalArithImmed(RHS->getAsZExtVal())) {
- SDValue TheLHS = isCMN(LHS, CC) ? LHS.getOperand(1) : LHS;
+ SDValue TheLHS = isCMN(LHS, CC, DAG) ? LHS.getOperand(1) : LHS;
if (getCmpOperandFoldingProfit(TheLHS) > getCmpOperandFoldingProfit(RHS)) {
std::swap(LHS, RHS);
diff --git a/llvm/test/CodeGen/AArch64/cmp-chains.ll b/llvm/test/CodeGen/AArch64/cmp-chains.ll
index 14cb0c82b1c03..0713fdb7283a9 100644
--- a/llvm/test/CodeGen/AArch64/cmp-chains.ll
+++ b/llvm/test/CodeGen/AArch64/cmp-chains.ll
@@ -258,3 +258,34 @@ define i32 @neg_range_int(i32 %a, i32 %b, i32 %c) {
ret i32 %retval.0
}
+; (b > -3 || a < -(c | 1))
+define i32 @neg_range_int_cmn(i32 %a, i32 %b, i32 %c) {
+; SDISEL-LABEL: neg_range_int_cmn:
+; SDISEL: // %bb.0:
+; SDISEL-NEXT: orr w8, w2, #0x1
+; SDISEL-NEXT: cmn w8, w0
+; SDISEL-NEXT: ccmn w1, #3, #0, le
+; SDISEL-NEXT: csel w0, w1, w0, gt
+; SDISEL-NEXT: ret
+;
+; GISEL-LABEL: neg_range_int_cmn:
+; GISEL: // %bb.0:
+; GISEL-NEXT: orr w8, w2, #0x1
+; GISEL-NEXT: cmn w1, #3
+; GISEL-NEXT: neg w8, w8
+; GISEL-NEXT: cset w9, gt
+; GISEL-NEXT: cmp w8, w0
+; GISEL-NEXT: cset w8, gt
+; GISEL-NEXT: orr w8, w9, w8
+; GISEL-NEXT: and w8, w8, #0x1
+; GISEL-NEXT: tst w8, #0x1
+; GISEL-NEXT: csel w0, w1, w0, ne
+; GISEL-NEXT: ret
+ %or = or i32 %c, 1
+ %sub = sub nsw i32 0, %or
+ %cmp = icmp sgt i32 %b, -3
+ %cmp1 = icmp sgt i32 %sub, %a
+ %1 = select i1 %cmp, i1 true, i1 %cmp1
+ %ret = select i1 %1, i32 %b, i32 %a
+ ret i32 %ret
+}
|
@arsenm What do you think of this? |
@dtcxzyw does this have an adverse effect on compile time? I didn't find any when I profiled LLVM. |
2694f34
to
88dddd6
Compare
c9de2e8
to
b7b4e17
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
613c0ff
to
44a9319
Compare
74e39dd
to
643e797
Compare
@jonathandavies-arm What are your thoughts on this? |
@davemgreen This should be ready now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks OK now, and the test all look OK. Do you mind if we wait until after the release branch to commit, in case there is something still wrong?
It would be good to have test cases for lots of various cases too, even if they don't trigger - different signed conditions, cmn on the LHS/RHS, ccmn on the LHS/RHS etc. Thanks
If something goes wrong, we can revert it. However, I've tested and tested this and found no issue. |
I'm working on an expansion to this but that will sadly have to wait until after the release branch, but this is the part that has been thoroughly tested |
This isn't really the point of adding tests. |
OK - I was just being careful as we are so close to the release and some of the issues can be subtle. I gave it a more thorough test and everything seems OK. By "tested this and found no issue", does that include running a bootstrap and the llvm-test-suite? If so and you add a some extra test coverage then this LGTM. |
Yes I ran the entire test suite. |
OK. If that has succeeded and a bootstrap also does OK, then if you can add some more tests for various ccmn cases we can get this in. |
44c2883
to
e693db4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is now transforming, but I don't believe that is valid:
define i32 @and_ult_eq_s0s1_or(i32 %s0a, i32 %s1a, i32 %s2, i32 %s3) {
; CHECK-LABEL: and_ult_eq_s0s1_or:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: orr w8, w1, #0x1
; CHECK-NEXT: orr w9, w0, #0x1
; CHECK-NEXT: cmp w2, w3
; CHECK-NEXT: ccmn w8, w9, #2, eq
; CHECK-NEXT: mov w8, #20 // =0x14
; CHECK-NEXT: mov w9, #10 // =0xa
; CHECK-NEXT: csel w0, w9, w8, lo
; CHECK-NEXT: ret
entry:
%a1 = or i32 %s0a, 1
%s1 = or i32 %s1a, 1
%s0 = sub i32 0, %a1
%c0 = icmp ult i32 %s0, %s1
%c1 = icmp eq i32 %s2, %s3
%a = and i1 %c0, %c1
%r = select i1 %a, i32 10, i32 20
ret i32 %r
}
} else if (isCMN(LHS, CC, DAG) && | ||
(isIntEqualitySetCC(CC) || | ||
(isUnsignedIntSetCC(CC) && DAG.isKnownNeverZero(RHS)) || | ||
(isSignedIntSetCC(CC) && cannotBeIntMin(RHS, DAG)))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has changed gain, is that to help extra cases? It now says that it is valid so long as both the LHS and RHS have the required known bits? Do you have some proof to help explain that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proof:
It is known that if a > b, then b < a,
it is known that if b < a, then -b > -a
For the degenerative cases of INT_MIN and 0 for signed and unsigned comparisons, this is where knownbits steps in: https://alive2.llvm.org/ce/z/8YTkie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted the changed part so we can approve this. I will split the enhancement for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the and_ult_eq_s0s1_or case was mis-compiling because of
// Swap LHS and RHS if it wasn't an equality comparison
// So we don't have to worry about changing the CC
// a < b -> -b < -a
std::swap(LHS, RHS);
The condition needed to be updated too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I swapped it so that the condition didn't need to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the and_ult_eq_s0s1_or case was mis-compiling because of
// Swap LHS and RHS if it wasn't an equality comparison // So we don't have to worry about changing the CC // a < b -> -b < -a std::swap(LHS, RHS);
The condition needed to be updated too.
The condition does not need to be updated because:
a < b -> -a > -b -> -b < -a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negation of both sides flips the condition and the swap flips it back. I did this because the CC isn't used in this method except for testing for eligibility, and this is much more feasible than refactoring so that we recompute the condition codes and arm condition codes.
5d6c704
to
4371aee
Compare
… ccmn The problematic case for unsigned comparisons occurs only when the second argument is zero, and in signed cases when the second argument is the minimum possible signed integer. If we can prove the register value be those, it is safe to fold into CMN and CCMN. Source: https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288
@davemgreen I have reverted this back because otherwise I'm gonna have to do another bootstrap test, and I think that change can be accepted after this one instead of part of this PR. So this should be good to go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this version. LGTM
… ccmn (llvm#96349) The problematic case for unsigned comparisons occurs only when the second argument is zero. The problematic case for signed comparisons occurs only when the second argument is the signed minimum value. We can use KnownBits to know when we don't have to worry about this. Source: https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288
… ccmn (#96349) Summary: The problematic case for unsigned comparisons occurs only when the second argument is zero. The problematic case for signed comparisons occurs only when the second argument is the signed minimum value. We can use KnownBits to know when we don't have to worry about this. Source: https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251645
The problematic case for unsigned comparisons occurs only when the second argument is zero.
The problematic case for signed comparisons occurs only when the second argument is the signed minimum value.
We can use KnownBits to know when we don't have to worry about this.
Source: https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288