-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ValueTracking] use KnownBits to compute fpclass from bitcast #97762
[ValueTracking] use KnownBits to compute fpclass from bitcast #97762
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-analysis Author: Alex MacLean (AlexMaclean) ChangesWhen we encounter a bitcast from an integer type we can use the information from
Full diff: https://github.com/llvm/llvm-project/pull/97762.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 85abf00774a02..a16c8e3d48403 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -5805,6 +5805,36 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts,
break;
}
+ case Instruction::BitCast: {
+ const Type *Ty = Op->getType();
+ const Value *Casted = Op->getOperand(0);
+ if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy())
+ break;
+
+ KnownBits Bits(Ty->getScalarSizeInBits());
+ computeKnownBits(Casted, Bits, Depth + 1, Q);
+
+ // Transfer information from the sign bit.
+ if (Bits.Zero.isSignBitSet())
+ Known.signBitMustBeZero();
+ else if (Bits.One.isSignBitSet())
+ Known.signBitMustBeOne();
+
+ if (Ty->isIEEE()) {
+ // IEEE floats are NaN when all bits of the exponent plus at least one of
+ // the fraction bits are 1. This means:
+ // - If we assume unknown bits are 0 and the value is NaN, it will
+ // always be NaN
+ // - If we assume unknown bits are 1 and the value is not NaN, it can
+ // never be NaN
+ if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+ Known.KnownFPClasses = fcNan;
+ else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+ Known.knownNot(fcNan);
+ }
+
+ break;
+ }
default:
break;
}
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll
index 781ba636c3ab3..c5d562a436b33 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -2690,6 +2690,110 @@ entry:
ret double %abs
}
+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT: ret float [[TMP2]]
+;
+ %1 = lshr i32 %arg, 1
+ %2 = bitcast i32 %1 to float
+ ret float %2
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT: ret float [[TMP2]]
+;
+ %1 = lshr i32 %arg, 2
+ %2 = bitcast i32 %1 to float
+ ret float %2
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT: ret float [[TMP2]]
+;
+ %1 = or i32 %arg, -2147483648
+ %2 = bitcast i32 %1 to float
+ ret float %2
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT: ret float [[TMP2]]
+;
+ %1 = or i32 %arg, 2139095041
+ %2 = bitcast i32 %1 to float
+ ret float %2
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT: ret double [[TMP2]]
+;
+ %1 = lshr i64 %arg, 1
+ %2 = bitcast i64 %1 to double
+ ret double %2
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) double @bitcast_to_double_nnan
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[ARG]], 2
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT: ret double [[TMP2]]
+;
+ %1 = lshr i64 %arg, 2
+ %2 = bitcast i64 %1 to double
+ ret double %2
+}
+
+define double @bitcast_to_double_sign_1(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) double @bitcast_to_double_sign_1
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[TMP1:%.*]] = or i64 [[ARG]], -9223372036854775808
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT: ret double [[TMP2]]
+;
+ %1 = or i64 %arg, -9223372036854775808
+ %2 = bitcast i64 %1 to double
+ ret double %2
+}
+
+define double @bitcast_to_double_nan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) double @bitcast_to_double_nan
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: [[TMP1:%.*]] = or i64 [[ARG]], -4503599627370495
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT: ret double [[TMP2]]
+;
+ %1 = or i64 %arg, -4503599627370495
+ %2 = bitcast i64 %1 to double
+ ret double %2
+}
+
declare i64 @_Z13get_global_idj(i32 noundef)
attributes #0 = { "denormal-fp-math"="preserve-sign,preserve-sign" }
|
bc8d246
to
1aa4c5c
Compare
1aa4c5c
to
0cc0f04
Compare
; GCN-NEXT: v_cndmask_b32_e32 v1, 0, v0, vcc | ||
; GCN-NEXT: v_mul_f32_e64 v0, -v0, v1 | ||
; GCN-NEXT: v_cmp_lt_f32_e32 vcc, 0, v0 | ||
; GCN-NEXT: s_and_b64 vcc, exec, vcc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like all instructions are removed by DCE. Can you fix this (in a separate PR) test by adding a user with side effect?
define amdgpu_kernel void @fnge_select_f32_multi_use_regression(float %.i2369, ptr addrspace(1) %dst) {
.entry:
%i = fcmp uge float %.i2369, 0.000000e+00
%.i2379 = select i1 %i, i32 1, i32 0
%.i0436 = bitcast i32 %.i2379 to float
%.i0440 = fneg float %.i0436
%i1 = fcmp uge float %.i0436, 0.000000e+00
%.i2495 = select i1 %i1, i32 %.i2379, i32 0
%.i0552 = bitcast i32 %.i2495 to float
%.i0592 = fmul float %.i0440, %.i0552
%.i0721 = fcmp ogt float %.i0592, 0.000000e+00
br i1 %.i0721, label %bb5, label %bb
bb: ; preds = %.entry
%i2 = call <2 x i32> @llvm.amdgcn.s.buffer.load.v2i32(<4 x i32> zeroinitializer, i32 1, i32 0)
%i3 = shufflevector <2 x i32> %i2, <2 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i4 = bitcast <4 x i32> %i3 to <4 x float>
%.i0753 = extractelement <4 x float> %i4, i64 0
store float %.i0753, ptr addrspace(1) %dst
br label %bb5
bb5: ; preds = %bb, %.entry
ret void
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#106268 slightly adjusts this test to ensure it doesn't get DCE'd away after this change.
Can you add some tests to demonstrate that this patch will enable more optimizations in some real-world applications? |
I can extend the existing test cases to make them more elaborate/real-looking, but I'm guessing that would not qualify as "real-world". This patch is motivated by an internal benchmark where there were some cases where this helped, though even that case is in some sense artificial. Is this a necessary criteria for landing this change? I believe we already handle float to int in KnownBits and adding the inverse in KnownFPClass seems like a correct and reasonable extension of the logic, even if there are not many cases where it is used. |
It is ok to optimize some pattern in proxy apps/benchmarks :)
As recursively computing known bits/fpclass is expensive, it is not worth adding additional complexity unless it is useful for real-world applications.
I posted #86409 because I found it enabled more optimizations in [my ir dataset] (https://github.com/dtcxzyw/llvm-opt-benchmark/pull/451/files).
Don't do this. It is a trap of code coverage. We need to strike a balance between optimizations, compilation time and maintainability. For more details, see https://llvm.org/docs/InstCombineContributorGuide.html#real-world-usefulness. |
if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) | ||
Known.KnownFPClasses = fcNan; | ||
else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) | ||
Known.knownNot(fcNan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you cant also do inf
/normal
/subnormal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inf is simpler, but normal / subnormal would require knowing at least one bit of the mantissa is set and the exponent is not the maximum which is trickier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've added inf
and also zero
since those are both relatively simple, but I've left normal / subnormal to the side for now.
I think just about any of these can appear in math library code. Comprehensively modeling operations can find surprising optimizations you might not have manually noticed |
if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN()) | ||
Known.KnownFPClasses = fcNan; | ||
else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN()) | ||
Known.knownNot(fcNan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inf is simpler, but normal / subnormal would require knowing at least one bit of the mantissa is set and the exponent is not the maximum which is trickier
Reverse ping :) |
0cc0f04
to
927f270
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
llvm/lib/Analysis/ValueTracking.cpp
Outdated
InfKB = InfKB.intersectWith(KnownBits::makeConstant( | ||
APFloat::getInf(Ty->getFltSemantics(), true).bitcastToAPInt())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InfKB = InfKB.intersectWith(KnownBits::makeConstant( | |
APFloat::getInf(Ty->getFltSemantics(), true).bitcastToAPInt())); | |
InfKB.Zero.clearSignBit(); |
llvm/lib/Analysis/ValueTracking.cpp
Outdated
ZeroKB = ZeroKB.intersectWith(KnownBits::makeConstant( | ||
APFloat::getZero(Ty->getFltSemantics(), true).bitcastToAPInt())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZeroKB = ZeroKB.intersectWith(KnownBits::makeConstant( | |
APFloat::getZero(Ty->getFltSemantics(), true).bitcastToAPInt())); | |
ZeroKB.Zero.clearSignBit(); |
Please update these failed tests:
|
927f270
to
2de117f
Compare
Make some minor tweaks to AMDGPU tests to ensure they still work as intended after #97762. These tests can be radically simplified after bitcast aware fpclass deduction.
2de117f
to
fd7a02e
Compare
@arsenm, @goldsteinn when you have a minute could you take another look at this? I think I've addressed all the issues you've raised. |
auto InfKB = KnownBits::makeConstant( | ||
APFloat::getInf(Ty->getFltSemantics()).bitcastToAPInt()); | ||
InfKB.Zero.clearSignBit(); | ||
if (const auto InfResult = KnownBits::eq(Bits, InfKB)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think you need to unset signbit in Bits
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then ofc reset later, that or test against inf and -inf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. KnownBits::eq
will return false
if the inputs cannot be equal and std::nullopt
if the may or may not be equal (in this case it cannot return true
because InfKB
is not fully known). Clearing the sign bit of Bits
won't change the result either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Bits
are -inf, won't you get known-false incorrectly? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the case the values will be:
Bits = 1 11111111 00000000000000000000000
InfKB = ? 11111111 00000000000000000000000
These may or may not be equal so std::nullopt
will be returned and no information will be added to the fpclass. I suppose we could handle this case but it will be constant folded anyway so I don't think it is really necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah are clearing the signbit.
This is a bit odd, I don't think you can ever get KnownBits::eq
to ever not be std::nullopt
with unknown bits....
Likewise if the signbit is known in Bits
, the follow up ==
will never be true...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you can ever get KnownBits::eq to ever not be std::nullopt with unknown bits....
It can never return true, but it certainly return false if any bit is know to be 1 in one value and 0 in the other.
Likewise if the signbit is known in Bits, the follow up == will never be true...
That ==
case is there to handle the case where the value is not a constant (will be folded) but it is still known that it must be inf, the only way this is possible is if the known bits are exactly ? 11111111 00000000000000000000000
.
There are examples of both of these cases in my tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I was missing it will constant fold if signbit is known.
Make some minor tweaks to AMDGPU tests to ensure they still work as intended after llvm#97762. These tests can be radically simplified after bitcast aware fpclass deduction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (but needs a rebase)
Co-authored-by: Yingwei Zheng <[email protected]>
Co-authored-by: Yingwei Zheng <[email protected]>
fd7a02e
to
a3fb277
Compare
When we encounter a bitcast from an integer type we can use the information from
KnownBits
to glean some information about the fpclass: