Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InstCombine of is.fpclass pessimizes codegen for comparison with fast math flags #104597

Open
jayfoad opened this issue Aug 16, 2024 · 9 comments
Labels

Comments

@jayfoad
Copy link
Contributor

jayfoad commented Aug 16, 2024

Test case:

define i1 @f(float %x) {
  %y = fmul nnan float%x, 3.0
  %i = bitcast float %y to i32
  %c = icmp ne i32 %i, 0
  ret i1 %c
}

AMD64 codegen:

$ llc -mtriple=amd64 < fpc.ll
...
	mulss	.LCPI0_0(%rip), %xmm0
	movd	%xmm0, %eax
	testl	%eax, %eax
	setne	%al

AMD64 codegen after InstCombine:

$ opt -S -passes=instcombine < fpc.ll | llc -mtriple=amd64
...
	mulss	.LCPI0_0(%rip), %xmm0
	movd	%xmm0, %eax
	movl	%eax, %ecx
	negl	%ecx
	seto	%cl
	andl	$2147483647, %eax               # imm = 0x7FFFFFFF
	cmpl	$2139095040, %eax               # imm = 0x7F800000
	sete	%dl
	orb	%cl, %dl
	leal	-1(%rax), %ecx
	cmpl	$8388607, %ecx                  # imm = 0x7FFFFF
	setb	%cl
	orb	%dl, %cl
	addl	$-8388608, %eax                 # imm = 0xFF800000
	cmpl	$2130706432, %eax               # imm = 0x7F000000
	setb	%al
	orb	%cl, %al

Here's what happens:

  1. InstCombine recognizes the icmp-of-bitcast and turns it into llvm.is.fpclass(%y, 0x3bf)
  2. InstCombine recognizes that %y cannot be nan and turns this into llvm.is.fpclass(%y, 0x3bc) (see
    // Clear test bits we know must be false from the source value.
    ). This seems dubious to me - it could just as well set these bits, or leave them alone. I guess clearing them works as a kind of canonicalization so that we could CSE two calls to llvm.is.fpclass that differ only in the "nan" bits if the argument is known not to be nan. Really it would be nice if we could set these bits to a "don't care" value.
  3. CodeGen does not recognize this as a special case and has to emit a long sequence of comparisons. This could be improved if the "not nan" information is still available at this point.
@jayfoad
Copy link
Contributor Author

jayfoad commented Aug 16, 2024

@arsenm arsenm added the floating-point Floating-point math label Aug 16, 2024
@dtcxzyw
Copy link
Member

dtcxzyw commented Aug 16, 2024

. This seems dubious to me - it could just as well set these bits, or leave them alone. I guess clearing them works as a kind of canonicalization so that we could CSE two calls to llvm.is.fpclass that differ only in the "nan" bits if the argument is known not to be nan. Really it would be nice if we could set these bits to a "don't care" value.

Can we use nofpclass for these don't care values?

@arsenm
Copy link
Contributor

arsenm commented Aug 16, 2024

I'm not sure it's useful to turn icmp into class. The only plus I see is it would get you some FP value tracking, maybe? But if we look through bitcasts for that, it's the same?

@dtcxzyw
Copy link
Member

dtcxzyw commented Aug 18, 2024

I'm not sure it's useful to turn icmp into class. The only plus I see is it would get you some FP value tracking, maybe? But if we look through bitcasts for that, it's the same?

It is used to improve fp-related context-sensitive analysis. And we lower is.fpclass to FCLASS on RISC-V, which saves 10+ cycles on some corner cases.

@arsenm
Copy link
Contributor

arsenm commented Aug 19, 2024

I'm not sure it's useful to turn icmp into class. The only plus I see is it would get you some FP value tracking, maybe? But if we look through bitcasts for that, it's the same?

It is used to improve fp-related context-sensitive analysis. And we lower is.fpclass to FCLASS on RISC-V, which saves 10+ cycles on some corner cases.

But can't we just recognize the same FP context analysis in terms of icmp? Isn't that what #97762 is doing?

Plus we are definitely missing codegen handling to swap between the two forms

@dtcxzyw
Copy link
Member

dtcxzyw commented Aug 21, 2024

But can't we just recognize the same FP context analysis in terms of icmp?

I choose to turn icmp into fpclass because Clang codegen emits fpclass for isnan/isinf math libcalls: https://godbolt.org/z/GTadKxrjd. We need to choose either icmp or fpclass as the canonical form.

Isn't that what #97762 is doing?

As we have already canonicalized most of icmp patterns into fpclass, this patch is not beneficial for real-world cases.

Plus we are definitely missing codegen handling to swap between the two forms

I suggest using nofpclass to represent these don't case values. Then we transform fpclass into target-preferred sequences in CGP/SDAG.

For the original issue, InstCombine should canonicalize is.fpclass(X, ~zero) into is.fpclass(nofpclass(nan) X, ~(zero|nan)).

@arsenm
Copy link
Contributor

arsenm commented Aug 21, 2024

But can't we just recognize the same FP context analysis in terms of icmp?

I choose to turn icmp into fpclass because Clang codegen emits fpclass for isnan/isinf math libcalls: https://godbolt.org/z/GTadKxrjd. We need to choose either icmp or fpclass as the canonical form.

Clang is not emitting the canonical form. Instcombine turns the isinf/isnan cases into fcmp

Isn't that what #97762 is doing?

As we have already canonicalized most of icmp patterns into fpclass, this patch is not beneficial for real-world cases.

Disagree, these patterns appear in math library code and I don't think most of them have been handled.

Plus we are definitely missing codegen handling to swap between the two forms

I suggest using nofpclass to represent these don't case values. Then we transform fpclass into target-preferred sequences in CGP/SDAG.

For the original issue, InstCombine should canonicalize is.fpclass(X, ~zero) into is.fpclass(nofpclass(nan) X, ~(zero|nan)).

If this case was a comparison to 0, we would use fcmp. Here it's sensitive to the sign of 0 and it only checks +0.

I think the chain of preference should be fcmp -> icmp -> is.fpclass. This case is tricky because of -0 though

@dtcxzyw
Copy link
Member

dtcxzyw commented Aug 21, 2024

Clang is not emitting the canonical form. Instcombine turns the isinf/isnan cases into fcmp

glibc uses macros for these libcalls. And clang codegen emits is.fpclass.

/* Return nonzero value if X is not +-Inf or NaN.  */
# if (__GNUC_PREREQ (4,4) && !defined __SUPPORT_SNAN__) \
     || __glibc_clang_prereq (2,8)
#  define isfinite(x) __builtin_isfinite (x)
# else
#  define isfinite(x) __MATH_TG ((x), __finite, (x))
# endif

/* Return nonzero value if X is neither zero, subnormal, Inf, nor NaN.  */
# if (__GNUC_PREREQ (4,4) && !defined __SUPPORT_SNAN__) \
     || __glibc_clang_prereq (2,8)
#  define isnormal(x) __builtin_isnormal (x)
# else
#  define isnormal(x) (fpclassify (x) == FP_NORMAL)
# endif

/* Return nonzero value if X is a NaN.  We could use `fpclassify' but
   we already have this functions `__isnan' and it is faster.  */
# if (__GNUC_PREREQ (4,4) && !defined __SUPPORT_SNAN__) \
     || __glibc_clang_prereq (2,8)
#  define isnan(x) __builtin_isnan (x)
# else
#  define isnan(x) __MATH_TG ((x), __isnan, (x))
# endif

/* Return nonzero value if X is positive or negative infinity.  */
# if __HAVE_DISTINCT_FLOAT128 && !__GNUC_PREREQ (7,0) \
     && !defined __SUPPORT_SNAN__ && !defined __cplusplus
   /* Since __builtin_isinf_sign is broken for float128 before GCC 7.0,
      use the helper function, __isinff128, with older compilers.  This is
      only provided for C mode, because in C++ mode, GCC has no support
      for __builtin_types_compatible_p (and when in C++ mode, this macro is
      not used anyway, because libstdc++ headers undefine it).  */
#  define isinf(x) \
    (__builtin_types_compatible_p (__typeof (x), _Float128) \
     ? __isinff128 (x) : __builtin_isinf_sign (x))
# elif (__GNUC_PREREQ (4,4) && !defined __SUPPORT_SNAN__) \
       || __glibc_clang_prereq (3,7)
#  define isinf(x) __builtin_isinf_sign (x)
# else
#  define isinf(x) __MATH_TG ((x), __isinf, (x))
# endif

case Builtin::BI__builtin_isnan: {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
Value *V = EmitScalarExpr(E->getArg(0));
if (Value *Result = tryUseTestFPKind(*this, BuiltinID, V))
return RValue::get(Result);
return RValue::get(
Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcNan),
ConvertType(E->getType())));
}
case Builtin::BI__builtin_issignaling: {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
Value *V = EmitScalarExpr(E->getArg(0));
return RValue::get(
Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcSNan),
ConvertType(E->getType())));
}
case Builtin::BI__builtin_isinf: {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
Value *V = EmitScalarExpr(E->getArg(0));
if (Value *Result = tryUseTestFPKind(*this, BuiltinID, V))
return RValue::get(Result);
return RValue::get(
Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcInf),
ConvertType(E->getType())));
}
case Builtin::BIfinite:
case Builtin::BI__finite:
case Builtin::BIfinitef:
case Builtin::BI__finitef:
case Builtin::BIfinitel:
case Builtin::BI__finitel:
case Builtin::BI__builtin_isfinite: {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
Value *V = EmitScalarExpr(E->getArg(0));
if (Value *Result = tryUseTestFPKind(*this, BuiltinID, V))
return RValue::get(Result);
return RValue::get(
Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcFinite),
ConvertType(E->getType())));
}
case Builtin::BI__builtin_isnormal: {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
Value *V = EmitScalarExpr(E->getArg(0));
return RValue::get(
Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcNormal),
ConvertType(E->getType())));
}
case Builtin::BI__builtin_issubnormal: {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
Value *V = EmitScalarExpr(E->getArg(0));
return RValue::get(
Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcSubnormal),
ConvertType(E->getType())));
}
case Builtin::BI__builtin_iszero: {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
Value *V = EmitScalarExpr(E->getArg(0));
return RValue::get(
Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcZero),
ConvertType(E->getType())));
}

@arsenm
Copy link
Contributor

arsenm commented Aug 22, 2024

Clang is not emitting the canonical form. Instcombine turns the isinf/isnan cases into fcmp

glibc uses macros for these libcalls. And clang codegen emits is.fpclass.

Yes, but instcombine will immediately turn it back into fcmp ord/uno or fcmp+fabs with inf. Clang could save instcombine some work by emitting that directly (as it used to)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants