InstCombine of is.fpclass pessimizes codegen for comparison with fast math flags #104597

jayfoad · 2024-08-16T14:36:14Z

Test case:

define i1 @f(float %x) {
  %y = fmul nnan float%x, 3.0
  %i = bitcast float %y to i32
  %c = icmp ne i32 %i, 0
  ret i1 %c
}

AMD64 codegen:

$ llc -mtriple=amd64 < fpc.ll
...
	mulss	.LCPI0_0(%rip), %xmm0
	movd	%xmm0, %eax
	testl	%eax, %eax
	setne	%al

AMD64 codegen after InstCombine:

$ opt -S -passes=instcombine < fpc.ll | llc -mtriple=amd64
...
	mulss	.LCPI0_0(%rip), %xmm0
	movd	%xmm0, %eax
	movl	%eax, %ecx
	negl	%ecx
	seto	%cl
	andl	$2147483647, %eax               # imm = 0x7FFFFFFF
	cmpl	$2139095040, %eax               # imm = 0x7F800000
	sete	%dl
	orb	%cl, %dl
	leal	-1(%rax), %ecx
	cmpl	$8388607, %ecx                  # imm = 0x7FFFFF
	setb	%cl
	orb	%dl, %cl
	addl	$-8388608, %eax                 # imm = 0xFF800000
	cmpl	$2130706432, %eax               # imm = 0x7F000000
	setb	%al
	orb	%cl, %al

Here's what happens:

InstCombine recognizes the icmp-of-bitcast and turns it into llvm.is.fpclass(%y, 0x3bf)
InstCombine recognizes that %y cannot be nan and turns this into llvm.is.fpclass(%y, 0x3bc) (see

llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

Line 1036 in 14d57e2

// Clear test bits we know must be false from the source value.

). This seems dubious to me - it could just as well set these bits, or leave them alone. I guess clearing them works as a kind of canonicalization so that we could CSE two calls to llvm.is.fpclass that differ only in the "nan" bits if the argument is known not to be nan. Really it would be nice if we could set these bits to a "don't care" value.
CodeGen does not recognize this as a special case and has to emit a long sequence of comparisons. This could be improved if the "not nan" information is still available at this point.

The text was updated successfully, but these errors were encountered:

jayfoad · 2024-08-16T14:36:26Z

@arsenm

dtcxzyw · 2024-08-16T14:44:33Z

. This seems dubious to me - it could just as well set these bits, or leave them alone. I guess clearing them works as a kind of canonicalization so that we could CSE two calls to llvm.is.fpclass that differ only in the "nan" bits if the argument is known not to be nan. Really it would be nice if we could set these bits to a "don't care" value.

Can we use nofpclass for these don't care values?

arsenm · 2024-08-16T16:59:48Z

I'm not sure it's useful to turn icmp into class. The only plus I see is it would get you some FP value tracking, maybe? But if we look through bitcasts for that, it's the same?

dtcxzyw · 2024-08-18T11:23:05Z

I'm not sure it's useful to turn icmp into class. The only plus I see is it would get you some FP value tracking, maybe? But if we look through bitcasts for that, it's the same?

It is used to improve fp-related context-sensitive analysis. And we lower is.fpclass to FCLASS on RISC-V, which saves 10+ cycles on some corner cases.

arsenm · 2024-08-19T14:11:43Z

I'm not sure it's useful to turn icmp into class. The only plus I see is it would get you some FP value tracking, maybe? But if we look through bitcasts for that, it's the same?

It is used to improve fp-related context-sensitive analysis. And we lower is.fpclass to FCLASS on RISC-V, which saves 10+ cycles on some corner cases.

But can't we just recognize the same FP context analysis in terms of icmp? Isn't that what #97762 is doing?

Plus we are definitely missing codegen handling to swap between the two forms

dtcxzyw · 2024-08-21T13:56:57Z

But can't we just recognize the same FP context analysis in terms of icmp?

I choose to turn icmp into fpclass because Clang codegen emits fpclass for isnan/isinf math libcalls: https://godbolt.org/z/GTadKxrjd. We need to choose either icmp or fpclass as the canonical form.

Isn't that what #97762 is doing?

As we have already canonicalized most of icmp patterns into fpclass, this patch is not beneficial for real-world cases.

Plus we are definitely missing codegen handling to swap between the two forms

I suggest using nofpclass to represent these don't case values. Then we transform fpclass into target-preferred sequences in CGP/SDAG.

For the original issue, InstCombine should canonicalize is.fpclass(X, ~zero) into is.fpclass(nofpclass(nan) X, ~(zero|nan)).

arsenm · 2024-08-21T18:27:42Z

But can't we just recognize the same FP context analysis in terms of icmp?

I choose to turn icmp into fpclass because Clang codegen emits fpclass for isnan/isinf math libcalls: https://godbolt.org/z/GTadKxrjd. We need to choose either icmp or fpclass as the canonical form.

Clang is not emitting the canonical form. Instcombine turns the isinf/isnan cases into fcmp

Isn't that what #97762 is doing?

As we have already canonicalized most of icmp patterns into fpclass, this patch is not beneficial for real-world cases.

Disagree, these patterns appear in math library code and I don't think most of them have been handled.

Plus we are definitely missing codegen handling to swap between the two forms

I suggest using nofpclass to represent these don't case values. Then we transform fpclass into target-preferred sequences in CGP/SDAG.

For the original issue, InstCombine should canonicalize is.fpclass(X, ~zero) into is.fpclass(nofpclass(nan) X, ~(zero|nan)).

If this case was a comparison to 0, we would use fcmp. Here it's sensitive to the sign of 0 and it only checks +0.

I think the chain of preference should be fcmp -> icmp -> is.fpclass. This case is tricky because of -0 though

dtcxzyw · 2024-08-21T18:52:08Z

Clang is not emitting the canonical form. Instcombine turns the isinf/isnan cases into fcmp

glibc uses macros for these libcalls. And clang codegen emits is.fpclass.

/* Return nonzero value if X is not +-Inf or NaN.  */
# if (__GNUC_PREREQ (4,4) && !defined __SUPPORT_SNAN__) \
     || __glibc_clang_prereq (2,8)
#  define isfinite(x) __builtin_isfinite (x)
# else
#  define isfinite(x) __MATH_TG ((x), __finite, (x))
# endif

/* Return nonzero value if X is neither zero, subnormal, Inf, nor NaN.  */
# if (__GNUC_PREREQ (4,4) && !defined __SUPPORT_SNAN__) \
     || __glibc_clang_prereq (2,8)
#  define isnormal(x) __builtin_isnormal (x)
# else
#  define isnormal(x) (fpclassify (x) == FP_NORMAL)
# endif

/* Return nonzero value if X is a NaN.  We could use `fpclassify' but
   we already have this functions `__isnan' and it is faster.  */
# if (__GNUC_PREREQ (4,4) && !defined __SUPPORT_SNAN__) \
     || __glibc_clang_prereq (2,8)
#  define isnan(x) __builtin_isnan (x)
# else
#  define isnan(x) __MATH_TG ((x), __isnan, (x))
# endif

/* Return nonzero value if X is positive or negative infinity.  */
# if __HAVE_DISTINCT_FLOAT128 && !__GNUC_PREREQ (7,0) \
     && !defined __SUPPORT_SNAN__ && !defined __cplusplus
   /* Since __builtin_isinf_sign is broken for float128 before GCC 7.0,
      use the helper function, __isinff128, with older compilers.  This is
      only provided for C mode, because in C++ mode, GCC has no support
      for __builtin_types_compatible_p (and when in C++ mode, this macro is
      not used anyway, because libstdc++ headers undefine it).  */
#  define isinf(x) \
    (__builtin_types_compatible_p (__typeof (x), _Float128) \
     ? __isinff128 (x) : __builtin_isinf_sign (x))
# elif (__GNUC_PREREQ (4,4) && !defined __SUPPORT_SNAN__) \
       || __glibc_clang_prereq (3,7)
#  define isinf(x) __builtin_isinf_sign (x)
# else
#  define isinf(x) __MATH_TG ((x), __isinf, (x))
# endif

llvm-project/clang/lib/CodeGen/CGBuiltin.cpp

Lines 3687 to 3753 in 89c556c

    
           case Builtin::BI__builtin_isnan: { 
        
             CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E); 
        
             Value *V = EmitScalarExpr(E->getArg(0)); 
        
             if (Value *Result = tryUseTestFPKind(*this, BuiltinID, V)) 
        
               return RValue::get(Result); 
        
             return RValue::get( 
        
                 Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcNan), 
        
                                    ConvertType(E->getType()))); 
        
           } 
        
           case Builtin::BI__builtin_issignaling: { 
        
             CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E); 
        
             Value *V = EmitScalarExpr(E->getArg(0)); 
        
             return RValue::get( 
        
                 Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcSNan), 
        
                                    ConvertType(E->getType()))); 
        
           } 
        
           case Builtin::BI__builtin_isinf: { 
        
             CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E); 
        
             Value *V = EmitScalarExpr(E->getArg(0)); 
        
             if (Value *Result = tryUseTestFPKind(*this, BuiltinID, V)) 
        
               return RValue::get(Result); 
        
             return RValue::get( 
        
                 Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcInf), 
        
                                    ConvertType(E->getType()))); 
        
           } 
        
           case Builtin::BIfinite: 
        
           case Builtin::BI__finite: 
        
           case Builtin::BIfinitef: 
        
           case Builtin::BI__finitef: 
        
           case Builtin::BIfinitel: 
        
           case Builtin::BI__finitel: 
        
           case Builtin::BI__builtin_isfinite: { 
        
             CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E); 
        
             Value *V = EmitScalarExpr(E->getArg(0)); 
        
             if (Value *Result = tryUseTestFPKind(*this, BuiltinID, V)) 
        
               return RValue::get(Result); 
        
             return RValue::get( 
        
                 Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcFinite), 
        
                                    ConvertType(E->getType()))); 
        
           } 
        
           case Builtin::BI__builtin_isnormal: { 
        
             CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E); 
        
             Value *V = EmitScalarExpr(E->getArg(0)); 
        
             return RValue::get( 
        
                 Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcNormal), 
        
                                    ConvertType(E->getType()))); 
        
           } 
        
           case Builtin::BI__builtin_issubnormal: { 
        
             CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E); 
        
             Value *V = EmitScalarExpr(E->getArg(0)); 
        
             return RValue::get( 
        
                 Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcSubnormal), 
        
                                    ConvertType(E->getType()))); 
        
           } 
        
           case Builtin::BI__builtin_iszero: { 
        
             CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E); 
        
             Value *V = EmitScalarExpr(E->getArg(0)); 
        
             return RValue::get( 
        
                 Builder.CreateZExt(Builder.createIsFPClass(V, FPClassTest::fcZero), 
        
                                    ConvertType(E->getType()))); 
        
           }

arsenm · 2024-08-22T18:08:19Z

Clang is not emitting the canonical form. Instcombine turns the isinf/isnan cases into fcmp

glibc uses macros for these libcalls. And clang codegen emits is.fpclass.

Yes, but instcombine will immediately turn it back into fcmp ord/uno or fcmp+fabs with inf. Clang could save instcombine some work by emitting that directly (as it used to)

github-actions bot added the new issue label Aug 16, 2024

EugeneZelenko added llvm:instcombine and removed new issue labels Aug 16, 2024

arsenm added the floating-point Floating-point math label Aug 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

InstCombine of is.fpclass pessimizes codegen for comparison with fast math flags #104597

InstCombine of is.fpclass pessimizes codegen for comparison with fast math flags #104597

jayfoad commented Aug 16, 2024

jayfoad commented Aug 16, 2024

dtcxzyw commented Aug 16, 2024

arsenm commented Aug 16, 2024

dtcxzyw commented Aug 18, 2024

arsenm commented Aug 19, 2024 •

edited

Loading

dtcxzyw commented Aug 21, 2024

arsenm commented Aug 21, 2024

dtcxzyw commented Aug 21, 2024

arsenm commented Aug 22, 2024

InstCombine of is.fpclass pessimizes codegen for comparison with fast math flags #104597

InstCombine of is.fpclass pessimizes codegen for comparison with fast math flags #104597

Comments

jayfoad commented Aug 16, 2024

jayfoad commented Aug 16, 2024

dtcxzyw commented Aug 16, 2024

arsenm commented Aug 16, 2024

dtcxzyw commented Aug 18, 2024

arsenm commented Aug 19, 2024 • edited Loading

dtcxzyw commented Aug 21, 2024

arsenm commented Aug 21, 2024

dtcxzyw commented Aug 21, 2024

arsenm commented Aug 22, 2024

arsenm commented Aug 19, 2024 •

edited

Loading