[VectorCombine] Add Cmp and Select for shuffleToIdentity #92794
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: David Green (davemgreen)

Changes: Other than some additional checks needed for compare predicates and selects with scalar condition operands, these are relatively simple additions to what already exists. I will rebase over #92766, but already had the patch for this version.

Full diff: https://github.com/llvm/llvm-project/pull/92794.diff

2 Files Affected:
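To illustrate what the fold does, here is a plain C++ model (not LLVM code; the function names are illustrative) of the `@icmpsel` test below: lanewise ops like `icmp` and `select` commute with lane permutations, so splitting the vectors into reversed halves, doing the compare and select per half, and recombining with the inverse shuffle computes the same result as one whole-vector compare and select, which is what the fold rewrites it to.

```cpp
#include <array>
#include <cstdint>

using V8 = std::array<int8_t, 8>;
using V4 = std::array<int8_t, 4>;

// The split form from the @icmpsel test: reverse each half, icmp slt +
// select on the 4-lane halves, then recombine with the reversing shuffle.
V8 icmpsel_split(V8 a, V8 b, V8 c, V8 d) {
  auto rev_half = [](V8 v, int base) {  // shufflevector <3,2,1,0> / <7,6,5,4>
    V4 h{};
    for (int i = 0; i < 4; ++i)
      h[i] = v[base + 3 - i];
    return h;
  };
  V4 ab = rev_half(a, 0), at = rev_half(a, 4);
  V4 bb = rev_half(b, 0), bt = rev_half(b, 4);
  V4 cb = rev_half(c, 0), ct = rev_half(c, 4);
  V4 db = rev_half(d, 0), dt = rev_half(d, 4);
  V4 abt{}, abb{};
  for (int i = 0; i < 4; ++i) {
    abt[i] = at[i] < bt[i] ? ct[i] : dt[i];  // icmp slt + select, top half
    abb[i] = ab[i] < bb[i] ? cb[i] : db[i];  // icmp slt + select, bottom half
  }
  V8 r{};
  for (int i = 0; i < 4; ++i) {  // shufflevector <7,6,5,4,3,2,1,0>
    r[i] = abb[3 - i];
    r[4 + i] = abt[3 - i];
  }
  return r;
}

// What the fold produces: the same icmp + select applied lanewise to the
// whole 8-lane vectors, with no shuffles at all.
V8 icmpsel_folded(V8 a, V8 b, V8 c, V8 d) {
  V8 r{};
  for (int i = 0; i < 8; ++i)
    r[i] = a[i] < b[i] ? c[i] : d[i];
  return r;
}
```

Both functions agree on every input, which is the identity that lets the pass delete the shuffles.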
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 15deaf908422d..981e17fa9aa2c 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -1757,6 +1757,13 @@ bool VectorCombine::foldShuffleToIdentity(Instruction &I) {
return false;
if (IL.first->getValueID() != Item[0].first->getValueID())
return false;
+ if (auto *CI = dyn_cast<CmpInst>(IL.first))
+ if (CI->getPredicate() !=
+ cast<CmpInst>(Item[0].first)->getPredicate())
+ return false;
+ if (auto *SI = dyn_cast<SelectInst>(IL.first))
+ if (!isa<VectorType>(SI->getOperand(0)->getType()))
+ return false;
if (isa<CallInst>(IL.first) && !isa<IntrinsicInst>(IL.first))
return false;
auto *II = dyn_cast<IntrinsicInst>(IL.first);
@@ -1769,12 +1776,17 @@ bool VectorCombine::foldShuffleToIdentity(Instruction &I) {
// Check the operator is one that we support. We exclude div/rem in case
// they hit UB from poison lanes.
- if (isa<BinaryOperator>(Item[0].first) &&
- !cast<BinaryOperator>(Item[0].first)->isIntDivRem()) {
+ if ((isa<BinaryOperator>(Item[0].first) &&
+ !cast<BinaryOperator>(Item[0].first)->isIntDivRem()) ||
+ isa<CmpInst>(Item[0].first)) {
Worklist.push_back(GenerateInstLaneVectorFromOperand(Item, 0));
Worklist.push_back(GenerateInstLaneVectorFromOperand(Item, 1));
} else if (isa<UnaryOperator>(Item[0].first)) {
Worklist.push_back(GenerateInstLaneVectorFromOperand(Item, 0));
+ } else if (isa<SelectInst>(Item[0].first)) {
+ Worklist.push_back(GenerateInstLaneVectorFromOperand(Item, 0));
+ Worklist.push_back(GenerateInstLaneVectorFromOperand(Item, 1));
+ Worklist.push_back(GenerateInstLaneVectorFromOperand(Item, 2));
} else if (auto *II = dyn_cast<IntrinsicInst>(Item[0].first);
II && isTriviallyVectorizable(II->getIntrinsicID())) {
for (unsigned Op = 0, E = II->getNumOperands() - 1; Op < E; Op++) {
@@ -1834,6 +1846,10 @@ bool VectorCombine::foldShuffleToIdentity(Instruction &I) {
if (auto BI = dyn_cast<BinaryOperator>(I))
return Builder.CreateBinOp((Instruction::BinaryOps)BI->getOpcode(),
Ops[0], Ops[1]);
+ if (auto CI = dyn_cast<CmpInst>(I))
+ return Builder.CreateCmp(CI->getPredicate(), Ops[0], Ops[1]);
+ if (auto SI = dyn_cast<SelectInst>(I))
+ return Builder.CreateSelect(Ops[0], Ops[1], Ops[2], "", SI);
if (II)
return Builder.CreateIntrinsic(DstTy, II->getIntrinsicID(), Ops);
assert(isa<UnaryInstruction>(I) &&
diff --git a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
index eb368471b1d84..523c0a476de63 100644
--- a/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
+++ b/llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
@@ -399,19 +399,8 @@ define <8 x i8> @extrause_shuffle(<8 x i8> %a, <8 x i8> %b) {
define <8 x i8> @icmpsel(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c, <8 x i8> %d) {
; CHECK-LABEL: @icmpsel(
-; CHECK-NEXT: [[AB:%.*]] = shufflevector <8 x i8> [[A:%.*]], <8 x i8> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[AT:%.*]] = shufflevector <8 x i8> [[A]], <8 x i8> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
-; CHECK-NEXT: [[BB:%.*]] = shufflevector <8 x i8> [[B:%.*]], <8 x i8> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[BT:%.*]] = shufflevector <8 x i8> [[B]], <8 x i8> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
-; CHECK-NEXT: [[CB:%.*]] = shufflevector <8 x i8> [[C:%.*]], <8 x i8> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[CT:%.*]] = shufflevector <8 x i8> [[C]], <8 x i8> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
-; CHECK-NEXT: [[DB:%.*]] = shufflevector <8 x i8> [[D:%.*]], <8 x i8> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[DT:%.*]] = shufflevector <8 x i8> [[D]], <8 x i8> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
-; CHECK-NEXT: [[ABT1:%.*]] = icmp slt <4 x i8> [[AT]], [[BT]]
-; CHECK-NEXT: [[ABB1:%.*]] = icmp slt <4 x i8> [[AB]], [[BB]]
-; CHECK-NEXT: [[ABT:%.*]] = select <4 x i1> [[ABT1]], <4 x i8> [[CT]], <4 x i8> [[DT]]
-; CHECK-NEXT: [[ABB:%.*]] = select <4 x i1> [[ABB1]], <4 x i8> [[CB]], <4 x i8> [[DB]]
-; CHECK-NEXT: [[R:%.*]] = shufflevector <4 x i8> [[ABT]], <4 x i8> [[ABB]], <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <8 x i8> [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[R:%.*]] = select <8 x i1> [[TMP1]], <8 x i8> [[C:%.*]], <8 x i8> [[D:%.*]]
; CHECK-NEXT: ret <8 x i8> [[R]]
;
%ab = shufflevector <8 x i8> %a, <8 x i8> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
@@ -465,19 +454,8 @@ define <8 x i8> @icmpsel_diffentcond(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c, <8 x
define <8 x i8> @fcmpsel(<8 x half> %a, <8 x half> %b, <8 x i8> %c, <8 x i8> %d) {
; CHECK-LABEL: @fcmpsel(
-; CHECK-NEXT: [[AB:%.*]] = shufflevector <8 x half> [[A:%.*]], <8 x half> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[AT:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
-; CHECK-NEXT: [[BB:%.*]] = shufflevector <8 x half> [[B:%.*]], <8 x half> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[BT:%.*]] = shufflevector <8 x half> [[B]], <8 x half> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
-; CHECK-NEXT: [[CB:%.*]] = shufflevector <8 x i8> [[C:%.*]], <8 x i8> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[CT:%.*]] = shufflevector <8 x i8> [[C]], <8 x i8> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
-; CHECK-NEXT: [[DB:%.*]] = shufflevector <8 x i8> [[D:%.*]], <8 x i8> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
-; CHECK-NEXT: [[DT:%.*]] = shufflevector <8 x i8> [[D]], <8 x i8> poison, <4 x i32> <i32 7, i32 6, i32 5, i32 4>
-; CHECK-NEXT: [[ABT1:%.*]] = fcmp olt <4 x half> [[AT]], [[BT]]
-; CHECK-NEXT: [[ABB1:%.*]] = fcmp olt <4 x half> [[AB]], [[BB]]
-; CHECK-NEXT: [[ABT:%.*]] = select <4 x i1> [[ABT1]], <4 x i8> [[CT]], <4 x i8> [[DT]]
-; CHECK-NEXT: [[ABB:%.*]] = select <4 x i1> [[ABB1]], <4 x i8> [[CB]], <4 x i8> [[DB]]
-; CHECK-NEXT: [[R:%.*]] = shufflevector <4 x i8> [[ABT]], <4 x i8> [[ABB]], <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp olt <8 x half> [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[R:%.*]] = select <8 x i1> [[TMP1]], <8 x i8> [[C:%.*]], <8 x i8> [[D:%.*]]
; CHECK-NEXT: ret <8 x i8> [[R]]
;
%ab = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
rebase?
Force-pushed from a8e1d40 to 650a4a2.
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 650a4a2 to db5f5e0.
LGTM with a couple of minors
@@ -1742,6 +1742,10 @@ static Value *generateNewInstTree(ArrayRef<InstLane> Item, FixedVectorType *Ty,
  if (auto *BI = dyn_cast<BinaryOperator>(I))
    return Builder.CreateBinOp((Instruction::BinaryOps)BI->getOpcode(), Ops[0],
                               Ops[1]);
  if (auto CI = dyn_cast<CmpInst>(I))
    return Builder.CreateCmp(CI->getPredicate(), Ops[0], Ops[1]);
  if (auto SI = dyn_cast<SelectInst>(I))
auto *SI
In the future, to catch such nits, I would recommend the official clangd extension from LLVM (if you use VSCode to contribute to LLVM): it will automatically yellow-underline such instances and prompt you with the fix.
It'd be nice for the CI code_formatter stage to pick these up
@@ -1742,6 +1742,10 @@ static Value *generateNewInstTree(ArrayRef<InstLane> Item, FixedVectorType *Ty,
  if (auto *BI = dyn_cast<BinaryOperator>(I))
    return Builder.CreateBinOp((Instruction::BinaryOps)BI->getOpcode(), Ops[0],
                               Ops[1]);
  if (auto CI = dyn_cast<CmpInst>(I))
auto *CI
Force-pushed from db5f5e0 to d294768.
if ((isa<BinaryOperator>(FrontV) &&
     !cast<BinaryOperator>(FrontV)->isIntDivRem()) ||
Use `dyn_cast` in place of `isa` + `cast`.
This appears to have introduced a codegen regression in Halide -- before this change, an expression of the form

now it does a much more complex operation:

We will temporarily disable this test on our side, but the apparent regression in codegen looks bad -- we should probably either revert this or plan a fix-forward.
Hi - Yeah, I can take a look. Do you have a reproducer? I don't read X86 very fluently, but the hope was that this would only remove unnecessary shuffles.
Enclosed are before-and-after (BAD and GOOD) IR generated by Halide. |
Thanks, it looks like these, if there are no other optimizations going on: https://godbolt.org/z/MfETGqdse

Bad:
Apparently vblendvb with ymm registers is only available with avx2: https://godbolt.org/z/d38hbxMaP
The failure in question appears with avx (but not avx2) codegen enabled.
@davemgreen I'll take a look at this - we have a few cases on AVX1 where we don't split 256-bit vectors when we probably should.
Thanks - I can also add a cost model, which seems to help too with no avx (although it does rely on splat shuffles/concats looking expensive). It was something I might need in the future. I've upstreamed patches for all the parts I had, but there was another motivating example that needed "piecewise splats", which should be cheap if we can recognize them. I will run some tests to see what happens with a cost model added.
llvm/llvm-project#92794 broke generation of pblend*b in some situations. A fix is underway; this just comments out those failures temporarily.
There is a quick cost model added in #93937.
…s where we have to split/concat 128-bit subvectors We'd be better off consistently using 128-bit instructions Based off a regression reported after #92794
…h of the operands are free to split. Often on AVX1 we're better off consistently using 128-bit instructions, so recognise when the operands are loads that can be freely/cheaply split - ideally this functionality needs to be moved to isFreeToSplitVector but we're using it in a few places where we don't want to split loads yet. Based off a regression reported after #92794
@steven-johnson Please can you tell me if b52962d addresses your perf regression?
Testing now.
b52962d appears to address it, please let me know when it lands.
Already committed to trunk. I'll probably try to extend it further as 256-bit integer ops are tricky on AVX1, so please report any other perf issues you see. |
This looks to cause an assertion with the following reproducer:
Thanks for the reproducer - it looks like the test we have for scalar-select conditions isn't working as it should. |
It is hopefully fixed in 76c8e1d. Let me know if not! Thanks
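For context, a plain C++ model (not LLVM code; names are illustrative) of the distinction the scalar-condition check in the patch guards:

```cpp
#include <array>

using V4 = std::array<int, 4>;

// A vector-condition select picks lanewise, so it commutes with lane
// shuffles and can be rebuilt per lane by the fold.
V4 select_vector_cond(std::array<bool, 4> c, V4 a, V4 b) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = c[i] ? a[i] : b[i];
  return r;
}

// A scalar-condition select picks one whole operand: there is no per-lane
// structure to recurse into, so the fold has to reject this form, which is
// what the isa<VectorType> check on the condition operand is for.
V4 select_scalar_cond(bool c, V4 a, V4 b) {
  return c ? a : b;
}
```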