-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DAG] Add legalization handling for ABDS/ABDU #92576
Conversation
You can test this locally with the following command:git-clang-format --diff 37e75cdf9f432940cfbdcab3a3d8d93eba15bca4 ea442ed3ce6720b07ae973001c499fc37eb9c212 --extensions h,cpp -- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp View the diff from clang-format here.diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index fefb0844f1..0b9ca13c71 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -2796,7 +2796,9 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
case ISD::Constant: ExpandIntRes_Constant(N, Lo, Hi); break;
case ISD::ABS: ExpandIntRes_ABS(N, Lo, Hi); break;
case ISD::ABDS:
- case ISD::ABDU: ExpandIntRes_ABD(N, Lo, Hi); break;
+ case ISD::ABDU:
+ ExpandIntRes_ABD(N, Lo, Hi);
+ break;
case ISD::CTLZ_ZERO_UNDEF:
case ISD::CTLZ: ExpandIntRes_CTLZ(N, Lo, Hi); break;
case ISD::CTPOP: ExpandIntRes_CTPOP(N, Lo, Hi); break;
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 3a49a8ff10..aaaf4cfb7e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -448,7 +448,7 @@ private:
void ExpandIntRes_AssertZext (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_Constant (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_ABS (SDNode *N, SDValue &Lo, SDValue &Hi);
- void ExpandIntRes_ABD (SDNode *N, SDValue &Lo, SDValue &Hi);
+ void ExpandIntRes_ABD(SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_CTLZ (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_CTPOP (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_CTTZ (SDNode *N, SDValue &Lo, SDValue &Hi);
|
No need for this to be vector specific, and its more likely that scalar cases will appear after #92576
a8f211a
to
7ff766e
Compare
…mparison result cases If the comparison results are allbits masks, we can expand as "abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))", replacing a sub+sub+select pattern with the simpler sub+xor+sub pattern. This allows us to remove a lot of X86 specific legalization code, and will be useful in future generic expansion for the legalization work in llvm#92576 Alive2: https://alive2.llvm.org/ce/z/sj863C
…mparison result cases If the comparison results are allbits masks, we can expand as "abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))", replacing a sub+sub+select pattern with the simpler sub+xor+sub pattern. This allows us to remove a lot of X86 specific legalization code, and will be useful in future generic expansion for the legalization work in llvm#92576 Alive2: https://alive2.llvm.org/ce/z/sj863C
…mparison result cases If the comparison results are allbits masks, we can expand as "abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))", replacing a sub+sub+select pattern with the simpler sub+xor+sub pattern. This allows us to remove a lot of X86 specific legalization code, and will be useful in future generic expansion for the legalization work in llvm#92576 Alive2: https://alive2.llvm.org/ce/z/sj863C
…mparison result cases If the comparison results are allbits masks, we can expand as "abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))", replacing a sub+sub+select pattern with the simpler sub+xor+sub pattern. This allows us to remove a lot of X86 specific legalization code, and will be useful in future generic expansion for the legalization work in llvm#92576 Alive2: https://alive2.llvm.org/ce/z/sj863C
…mparison result cases (#92780) If the comparison results are allbits masks, we can expand as `abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))`, replacing a sub+sub+select pattern with the simpler sub+xor+sub pattern. This allows us to remove a lot of X86 specific legalization code, and will be useful in future generic expansion for the legalization work in #92576 Alive2: https://alive2.llvm.org/ce/z/sj863C
06e295a
to
707feb8
Compare
Improve test checks for better codegen review of #92576
d6b7540
to
ba56c87
Compare
b7b3e2d
to
074aac2
Compare
This cause clang to crash:
Repro command:
|
Thanks @ZequanWu I have the repro and am reducing it now |
…100810 Extend test coverage for llvm#92576 - copied from existing x86 tests
This fixes a problem if abd nodes are generated more readily (llvm#92576). The folding of abd nodes into abdl needs to check that the inputs are the correct form of vector. The added test requires vector legalization to occur in order to hit the combine at the wrong time.
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv
@RKSimon this patch enters an infinite loop while compiling a Here's 3 stack samples at 3 different moments: T1:
T2:
T3:
The compilation never finishes (waited more than 15min). I see you're also investigating a crash. Can you please revert while we work for a reproducer? |
OK - I'll revert shortly - please can you post any more repros that you can. |
I started a reduction. I'll post the reproducer soonish. |
@RKSimon I noticed an assertion failure when building the Linux kernel with enum {
false,
IQK_S1_RX_X,
IQK_S1_RX_Y,
IQK_S0_TX_X,
IQK_S0_TX_Y,
IQK_S0_RX_X,
IQK_NR
} iqkxy_to_s32___trans_tmp_2;
int __rtw8723x_iqk_similarity_cmp_i, __rtw8723x_iqk_similarity_cmp_diff,
__rtw8723x_iqk_similarity_cmp___trans_tmp_2,
__rtw8723x_iqk_similarity_cmp_candidate_0,
__rtw8723x_iqk_similarity_cmp_tmp1, __rtw8723x_iqk_similarity_cmp_tmp2,
__rtw8723x_iqk_similarity_cmp_val;
_Bool __rtw8723x_iqk_similarity_cmp(int result[][IQK_NR]) {
int bitmap = 0;
for (__rtw8723x_iqk_similarity_cmp_i = 0;
__rtw8723x_iqk_similarity_cmp_i < IQK_NR;
__rtw8723x_iqk_similarity_cmp_i++) {
int index = 9;
char shift = index;
iqkxy_to_s32___trans_tmp_2 =
result[1][__rtw8723x_iqk_similarity_cmp_i] << shift >> shift;
__rtw8723x_iqk_similarity_cmp_tmp1 = iqkxy_to_s32___trans_tmp_2;
__rtw8723x_iqk_similarity_cmp_val =
result[2][__rtw8723x_iqk_similarity_cmp_i];
iqkxy_to_s32___trans_tmp_2 =
__rtw8723x_iqk_similarity_cmp_val << shift >> shift;
__rtw8723x_iqk_similarity_cmp___trans_tmp_2 =
__rtw8723x_iqk_similarity_cmp_tmp2 = iqkxy_to_s32___trans_tmp_2;
__rtw8723x_iqk_similarity_cmp_diff = __builtin_choose_expr(
__builtin_types_compatible_p(typeof(__rtw8723x_iqk_similarity_cmp_tmp2),
long),
0,
__builtin_choose_expr(
__builtin_types_compatible_p(
typeof(__rtw8723x_iqk_similarity_cmp_tmp2), long),
0,
__builtin_choose_expr(
__builtin_types_compatible_p(
typeof(__rtw8723x_iqk_similarity_cmp_tmp2), int),
({
int __x = __rtw8723x_iqk_similarity_cmp_tmp1 -
__rtw8723x_iqk_similarity_cmp_tmp2;
__x < 0 ? -__x : __x;
}),
__builtin_choose_expr(
__builtin_types_compatible_p(
typeof(__rtw8723x_iqk_similarity_cmp_tmp2), short),
0,
__builtin_choose_expr(
__builtin_types_compatible_p(
typeof(__rtw8723x_iqk_similarity_cmp_tmp2), char),
0,
__builtin_choose_expr(
__builtin_types_compatible_p(
typeof(__rtw8723x_iqk_similarity_cmp_tmp2),
char),
0, 0))))));
if (__rtw8723x_iqk_similarity_cmp_diff)
continue;
if ((__rtw8723x_iqk_similarity_cmp_i == IQK_S1_RX_X ||
__rtw8723x_iqk_similarity_cmp_i == IQK_S0_RX_X) &&
bitmap)
if (result[1][1])
__rtw8723x_iqk_similarity_cmp_candidate_0 = 2;
bitmap |= __rtw8723x_iqk_similarity_cmp_i;
}
return false;
}
|
Thanks @nathanchance - I've confirmed thats addressed by my fix for the report from @ZequanWu - I'm going to push the fixed patch shortly. |
Ensure abds doesn't get truncated after type legalisation
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
@RKSimon my reduction is rather slow. I can post a reproducer but it is rather large (about 171K). Or wait for a bit longer? |
@RKSimon here's the repro for completion sake:
extern "C" int abs(int);
struct a {
a();
int m_fn1() { return b; }
int b : 4;
};
int c;
void cg() {
a ci, cj;
int ck = ci.m_fn1(), cl = cj.m_fn1();
if (abs(ck - cl) <= c)
a();
} compilation command:
|
For info: We have seen some problems with test/CodeGen/AArch64/abds.ll downstream after this patch was reapplied. Looking at debug/print-after-all traces for @abd_ext_i128 I see that the ISel may create nodes in different order:
vs
And then it looks like the ISel scheduler may pick nodes in different order? And at least we get slightly different VReg usage in the MIR output after ISel. Later, at MachineCSE there is a limit on how far back it looks for common subexpressions, and we might end up either doing a CSE or not, depending on what the MIR looks like. My current best thinking is that it is a difference depending on if I compile LLVM with clang or gcc. And then it probably is things like this in TargetLowering.cpp
and/or things like this in DagCombiner.cpp:
that makes a difference since the order in which arguments are evaluated isn't well-defined. |
@bjope If you can confirm these go away if we pull out the inner getNode() calls then we can make the change - but given how prevalent this is in the codebase, I don't know what we'd do to address it completely. |
I'm analysing it further, and will prepare something for this specific case (if my current suspicion ends up being correct). Had been nice with some kind of checked (clang-tidy) that would detect the "bad" coding pattern. But I don't know really if that is easy/feasible to implement. |
Turns out those diffs related to argument evaluation order was irrelevant for the failed test. |
…100810 Extend test coverage for llvm#92576 - copied from existing x86 tests
This fixes a problem if abd nodes are generated more readily (llvm#92576). The folding of abd nodes into abdl needs to check that the inputs are the correct form of vector. The added test requires vector legalization to occur in order to hit the combine at the wrong time.
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv
…92576)" Reverting llvm#92576 while we identify a reported regression
Ensure abds doesn't get truncated after type legalisation
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.
abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflowlhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv