[ValueTracking] use KnownBits to compute fpclass from bitcast #97762

Merged. 6 commits merged into llvm:main on Aug 30, 2024.

Conversation

AlexMaclean (Member):

When we encounter a bitcast from an integer type, we can use the information from KnownBits to glean some information about the fpclass:

  • If the sign bit is known, we can transfer this information over.
  • If the float is IEEE format and enough of the bits are known, we may be able to prove it is NaN or that it can never be NaN (a sketch of this reasoning follows below).
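
A minimal standalone sketch of that reasoning, assuming a 32-bit IEEE float and example known-one/known-zero masks (plain C++, not the patch's LLVM code):

#include <cassert>
#include <cstdint>
#include <cstring>

// True when the 32-bit pattern encodes a NaN if reinterpreted as a float.
static bool isNaNBits(uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F != F; // NaN is the only value that compares unequal to itself
}

int main() {
  // Example masks, as if the source were `lshr i32 %x, 2` bitcast to float:
  // the top two bits are known zero, everything else is unknown.
  uint32_t KnownZero = 0xC0000000u; // bits known to be 0
  uint32_t KnownOne = 0x00000000u;  // bits known to be 1

  // Sign bit known zero: the float is known to be non-negative.
  assert((KnownZero >> 31) & 1);

  // Assume every unknown bit is 1: if the result is still not NaN, the value
  // can never be NaN (here the exponent can never reach all-ones).
  assert(!isNaNBits(~KnownZero));

  // Assume every unknown bit is 0: if that were already NaN, the value would
  // always be NaN (not the case in this example).
  assert(!isNaNBits(KnownOne));
  return 0;
}

In the patch itself this argument is carried out with KnownBits and APFloat; the sketch only mirrors the bit-level reasoning.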

@llvmbot (Collaborator) commented Jul 4, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Alex MacLean (AlexMaclean)

Changes

When we encounter a bitcast from an integer type, we can use the information from KnownBits to glean some information about the fpclass:

  • If the sign bit is known, we can transfer this information over.
  • If the float is IEEE format and enough of the bits are known, we may be able to prove it is NaN or that it can never be NaN.

Full diff: https://github.com/llvm/llvm-project/pull/97762.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+30)
  • (modified) llvm/test/Transforms/Attributor/nofpclass.ll (+104)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 85abf00774a02..a16c8e3d48403 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -5805,6 +5805,36 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts,
 
     break;
   }
+  case Instruction::BitCast: {
+    const Type *Ty = Op->getType();
+    const Value *Casted = Op->getOperand(0);
+    if (Ty->isVectorTy() || !Casted->getType()->isIntOrIntVectorTy())
+      break;
+
+    KnownBits Bits(Ty->getScalarSizeInBits());
+    computeKnownBits(Casted, Bits, Depth + 1, Q);
+
+    // Transfer information from the sign bit.
+    if (Bits.Zero.isSignBitSet())
+      Known.signBitMustBeZero();
+    else if (Bits.One.isSignBitSet())
+      Known.signBitMustBeOne();
+
+    if (Ty->isIEEE()) {
+      // IEEE floats are NaN when all bits of the exponent plus at least one of
+      // the fraction bits are 1. This means:
+      //   - If we assume unknown bits are 0 and the value is NaN, it will
+      //     always be NaN
+      //   - If we assume unknown bits are 1 and the value is not NaN, it can
+      //     never be NaN
+      if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
+        Known.KnownFPClasses = fcNan;
+      else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
+        Known.knownNot(fcNan);
+    }
+
+    break;
+  }
   default:
     break;
   }
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll
index 781ba636c3ab3..c5d562a436b33 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -2690,6 +2690,110 @@ entry:
   ret double %abs
 }
 
+define float @bitcast_to_float_sign_0(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[ARG]], 1
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:    ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 1
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nnan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @bitcast_to_float_nnan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[ARG]], 2
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:    ret float [[TMP2]]
+;
+  %1 = lshr i32 %arg, 2
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_sign_1(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[ARG]], -2147483648
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:    ret float [[TMP2]]
+;
+  %1 = or i32 %arg, -2147483648
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define float @bitcast_to_float_nan(i32 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[ARG]], 2139095041
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32 [[TMP1]] to float
+; CHECK-NEXT:    ret float [[TMP2]]
+;
+  %1 = or i32 %arg, 2139095041
+  %2 = bitcast i32 %1 to float
+  ret float %2
+}
+
+define double @bitcast_to_double_sign_0(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i64 [[ARG]], 1
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:    ret double [[TMP2]]
+;
+  %1 = lshr i64 %arg, 1
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
+define double @bitcast_to_double_nnan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan ninf nzero nsub nnorm) double @bitcast_to_double_nnan
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i64 [[ARG]], 2
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:    ret double [[TMP2]]
+;
+  %1 = lshr i64 %arg, 2
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
+define double @bitcast_to_double_sign_1(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) double @bitcast_to_double_sign_1
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = or i64 [[ARG]], -9223372036854775808
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:    ret double [[TMP2]]
+;
+  %1 = or i64 %arg, -9223372036854775808
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
+define double @bitcast_to_double_nan(i64 %arg) {
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-LABEL: define nofpclass(inf zero sub norm) double @bitcast_to_double_nan
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = or i64 [[ARG]], -4503599627370495
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
+; CHECK-NEXT:    ret double [[TMP2]]
+;
+  %1 = or i64 %arg, -4503599627370495
+  %2 = bitcast i64 %1 to double
+  ret double %2
+}
+
 declare i64 @_Z13get_global_idj(i32 noundef)
 
 attributes #0 = { "denormal-fp-math"="preserve-sign,preserve-sign" }

llvmbot added the clang (Clang issues not falling into any other category) label on Jul 5, 2024
; GCN-NEXT: v_cndmask_b32_e32 v1, 0, v0, vcc
; GCN-NEXT: v_mul_f32_e64 v0, -v0, v1
; GCN-NEXT: v_cmp_lt_f32_e32 vcc, 0, v0
; GCN-NEXT: s_and_b64 vcc, exec, vcc

Member:

Looks like all instructions are removed by DCE. Can you fix this test (in a separate PR) by adding a user with a side effect?

define amdgpu_kernel void @fnge_select_f32_multi_use_regression(float %.i2369, ptr addrspace(1) %dst) {
.entry:
  %i = fcmp uge float %.i2369, 0.000000e+00
  %.i2379 = select i1 %i, i32 1, i32 0
  %.i0436 = bitcast i32 %.i2379 to float
  %.i0440 = fneg float %.i0436
  %i1 = fcmp uge float %.i0436, 0.000000e+00
  %.i2495 = select i1 %i1, i32 %.i2379, i32 0
  %.i0552 = bitcast i32 %.i2495 to float
  %.i0592 = fmul float %.i0440, %.i0552
  %.i0721 = fcmp ogt float %.i0592, 0.000000e+00
  br i1 %.i0721, label %bb5, label %bb
bb:                                               ; preds = %.entry
  %i2 = call <2 x i32> @llvm.amdgcn.s.buffer.load.v2i32(<4 x i32> zeroinitializer, i32 1, i32 0)
  %i3 = shufflevector <2 x i32> %i2, <2 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %i4 = bitcast <4 x i32> %i3 to <4 x float>
  %.i0753 = extractelement <4 x float> %i4, i64 0
  store float %.i0753, ptr addrspace(1) %dst
  br label %bb5
bb5:                                              ; preds = %bb, %.entry
  ret void
}

Member Author (AlexMaclean):

#106268 slightly adjusts this test to ensure it doesn't get DCE'd away after this change.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jul 6, 2024
@dtcxzyw (Member) commented Jul 6, 2024

Can you add some tests to demonstrate that this patch will enable more optimizations in some real-world applications?

@AlexMaclean (Member Author):

> Can you add some tests to demonstrate that this patch will enable more optimizations in some real-world applications?

I can extend the existing test cases to make them more elaborate/real-looking, but I'm guessing that would not qualify as "real-world". This patch is motivated by an internal benchmark where there were some cases where this helped, though even that case is in some sense artificial. Is this a necessary criterion for landing this change? I believe we already handle float-to-int in KnownBits, and adding the inverse in KnownFPClass seems like a correct and reasonable extension of the logic, even if there are not many cases where it is used.

@dtcxzyw (Member) commented Jul 7, 2024

> This patch is motivated by an internal benchmark where there were some cases where this helped, though even that case is in some sense artificial.

It is OK to optimize some patterns in proxy apps/benchmarks :)

> Is this a necessary criterion for landing this change?

As recursively computing known bits/fpclass is expensive, it is not worth adding additional complexity unless it is useful for real-world applications.

> I believe we already handle float-to-int in KnownBits

I posted #86409 because I found it enabled more optimizations in my IR dataset (https://github.com/dtcxzyw/llvm-opt-benchmark/pull/451/files).

> and adding the inverse in KnownFPClass seems like a correct and reasonable extension of the logic, even if there are not many cases where it is used.

Don't do this. It is a code-coverage trap. We need to strike a balance between optimizations, compilation time, and maintainability.

For more details, see https://llvm.org/docs/InstCombineContributorGuide.html#real-world-usefulness.

      if (APFloat(Ty->getFltSemantics(), Bits.One).isNaN())
        Known.KnownFPClasses = fcNan;
      else if (!APFloat(Ty->getFltSemantics(), ~Bits.Zero).isNaN())
        Known.knownNot(fcNan);

Contributor:

Is there a reason you can't also do inf/normal/subnormal?

Contributor:

inf is simpler, but normal/subnormal would require knowing that at least one bit of the mantissa is set and that the exponent is not the maximum, which is trickier.
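
For the simpler inf case, a standalone sketch (plain C++ with example masks, not the patch's code): setting the sign bit aside, a 32-bit IEEE value is +/-inf exactly when all exponent bits are known one and all mantissa bits are known zero.

#include <cassert>
#include <cstdint>

int main() {
  const uint32_t ExpMask = 0x7F800000u; // exponent bits of a 32-bit float
  const uint32_t ManMask = 0x007FFFFFu; // mantissa bits

  // Example masks, as if the source integer were OR'd with 0x7F800000 and
  // then AND'd with 0xFF800000: exponent known one, mantissa known zero,
  // sign bit unknown.
  uint32_t KnownOne = 0x7F800000u;
  uint32_t KnownZero = 0x007FFFFFu;

  bool KnownInf = ((KnownOne & ExpMask) == ExpMask) && // exponent all ones
                  ((KnownZero & ManMask) == ManMask);  // mantissa all zeros
  assert(KnownInf); // must be +inf or -inf; only the sign is still unknown
  return 0;
}

Subnormal would additionally need an all-zero exponent plus at least one mantissa bit known one, and normal needs the exponent known to be neither all zeros nor all ones, which is indeed harder to establish from partial known bits.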

Member Author (AlexMaclean):

Okay, I've added inf and also zero since those are both relatively simple, but I've left normal / subnormal to the side for now.

@arsenm (Contributor) commented Jul 23, 2024

> Don't do this. It is a code-coverage trap. We need to strike a balance between optimizations, compilation time, and maintainability.

I think just about any of these can appear in math library code. Comprehensively modeling operations can find surprising optimizations you might not have manually noticed.

arsenm added the floating-point (Floating-point math) label on Jul 23, 2024

@dtcxzyw (Member) commented Aug 22, 2024

Reverse ping :)

@dtcxzyw (Member) left a review:

LGTM.

Comment on lines 5956 to 5957:

      InfKB = InfKB.intersectWith(KnownBits::makeConstant(
          APFloat::getInf(Ty->getFltSemantics(), true).bitcastToAPInt()));

Suggested change:
-      InfKB = InfKB.intersectWith(KnownBits::makeConstant(
-          APFloat::getInf(Ty->getFltSemantics(), true).bitcastToAPInt()));
+      InfKB.Zero.clearSignBit();

Comment on lines 5969 to 5970:

      ZeroKB = ZeroKB.intersectWith(KnownBits::makeConstant(
          APFloat::getZero(Ty->getFltSemantics(), true).bitcastToAPInt()));

Suggested change:
-      ZeroKB = ZeroKB.intersectWith(KnownBits::makeConstant(
-          APFloat::getZero(Ty->getFltSemantics(), true).bitcastToAPInt()));
+      ZeroKB.Zero.clearSignBit();

@dtcxzyw (Member) commented Aug 27, 2024

Please update these failed tests:

Failed Tests (2):
  LLVM :: CodeGen/AMDGPU/anyext.ll
  LLVM :: CodeGen/AMDGPU/fneg-modifier-casting.ll

AlexMaclean added a commit that referenced this pull request Aug 27, 2024
Make some minor tweaks to AMDGPU tests to ensure they still work as intended after #97762. These tests can be radically simplified after bitcast-aware fpclass deduction.
@AlexMaclean (Member Author):

@arsenm, @goldsteinn when you have a minute could you take another look at this? I think I've addressed all the issues you've raised.

      auto InfKB = KnownBits::makeConstant(
          APFloat::getInf(Ty->getFltSemantics()).bitcastToAPInt());
      InfKB.Zero.clearSignBit();
      if (const auto InfResult = KnownBits::eq(Bits, InfKB)) {

Contributor:

I think you need to unset the sign bit in Bits as well.

Contributor:

Then of course reset it later; that, or test against both inf and -inf.

Contributor:

Likewise below

Member Author (AlexMaclean):

I don't think so. KnownBits::eq will return false if the inputs cannot be equal and std::nullopt if they may or may not be equal (in this case it cannot return true because InfKB is not fully known). Clearing the sign bit of Bits won't change the result either way.

Contributor:

If Bits is -inf, won't you incorrectly get a known-false result? Or am I missing something?

Member Author (AlexMaclean):

If that is the case, the values will be:

  Bits  = 1 11111111 00000000000000000000000
  InfKB = ? 11111111 00000000000000000000000

These may or may not be equal, so std::nullopt will be returned and no information will be added to the fpclass. I suppose we could handle this case, but it will be constant-folded anyway, so I don't think it is really necessary.
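
The three-valued behaviour described here can be sketched standalone (a simplified mirror of the KnownBits::eq semantics under discussion, with assumed 32-bit masks, not the LLVM implementation):

#include <cassert>
#include <cstdint>
#include <optional>

struct KB {
  uint32_t One;  // bits known to be 1
  uint32_t Zero; // bits known to be 0
};

// Returns true/false only when equality or inequality is provable, and
// std::nullopt when the values may or may not be equal.
static std::optional<bool> knownEq(KB A, KB B) {
  // A bit known 1 on one side and known 0 on the other proves inequality.
  if ((A.One & B.Zero) != 0 || (A.Zero & B.One) != 0)
    return false;
  // With no conflicting bit, equality is provable only if every bit of both
  // values is known (each bit is either known 0 or known 1).
  if ((A.One | A.Zero) == 0xFFFFFFFFu && (B.One | B.Zero) == 0xFFFFFFFFu)
    return true;
  return std::nullopt;
}

int main() {
  // Bits  = 1 11111111 00000000000000000000000  (-inf, fully known)
  KB Bits = {0xFF800000u, 0x007FFFFFu};
  // InfKB = ? 11111111 00000000000000000000000  (inf, sign bit unknown)
  KB InfKB = {0x7F800000u, 0x007FFFFFu};

  // No conflicting bit, and InfKB is not fully known, so the comparison is
  // inconclusive and no fpclass information is added.
  assert(!knownEq(Bits, InfKB).has_value());
  return 0;
}

With a fully known -inf on the left, as above, the comparison still comes back inconclusive; as noted, that case is left to constant folding.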

Contributor:

Oh yeah, you are clearing the sign bit.

This is a bit odd; I don't think you can ever get KnownBits::eq to return anything other than std::nullopt with unknown bits...

Likewise, if the sign bit is known in Bits, the follow-up == will never be true...

Member Author (AlexMaclean):

> I don't think you can ever get KnownBits::eq to return anything other than std::nullopt with unknown bits...

It can never return true, but it can certainly return false if any bit is known to be 1 in one value and 0 in the other.

> Likewise, if the sign bit is known in Bits, the follow-up == will never be true...

That == case is there to handle the case where the value is not a constant (a constant will be folded) but it is still known that it must be inf; the only way this is possible is if the known bits are exactly ? 11111111 00000000000000000000000.

There are examples of both of these cases in my tests.

Contributor:

I see, I was missing that it will constant-fold if the sign bit is known.

5c4lar pushed a commit to 5c4lar/llvm-project that referenced this pull request Aug 29, 2024
Make some minor tweaks to AMDGPU tests to ensure they still work as intended after llvm#97762. These tests can be radically simplified after bitcast-aware fpclass deduction.

@goldsteinn (Contributor) left a review:

LGTM (but needs a rebase)

AlexMaclean merged commit 369d814 into llvm:main on Aug 30, 2024. 8 checks passed.
Labels: backend:AMDGPU, clang (Clang issues not falling into any other category), floating-point (Floating-point math), llvm:analysis, llvm:transforms

5 participants