
Compile time evaluation doesn't match runtime evaluation with -ffp-contract=on #98197

Closed
bfraboni opened this issue Jul 9, 2024 · 15 comments
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" constexpr Anything related to constant evaluation floating-point Floating-point math question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@bfraboni
Copy link

bfraboni commented Jul 9, 2024

Hey LLVM team,

I found out a while ago that with -ffp-contract=on, the compile-time evaluation does not match the runtime evaluation for some expressions, but I identified this too late to get any traction. Here are some godbolt examples for repro:

I think the expected behavior is that these two evaluations always match, and it seems that the compile-time evaluation always uses fma, even when it is not forced with -mfma. I still don't get the exact difference between -ffp-contract and -mfma, but it might be something to look into.

Thank you 🙏

@AaronBallman AaronBallman added clang:frontend Language frontend issues, e.g. anything involving "Sema" floating-point Floating-point math constexpr Anything related to constant evaluation and removed new issue labels Jul 11, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 11, 2024

@llvm/issue-subscribers-clang-frontend

Author: Basile Fraboni (bfraboni)


@arsenm
Copy link
Contributor

arsenm commented Jul 11, 2024

it seems that the compile-time evaluation always uses fma, even when it is not forced with -mfma.

Correct.

The runtime lowering is target-dependent, so -mfma happens to make x86 use a native fma instruction.

@AaronBallman
Copy link
Collaborator

Note, in C++, it is a recommended practice but not a normative requirement for floating-point operations to have the same behavior at compile time and runtime: https://eel.is/c++draft/expr.const#15

However, in C, it's a semantic requirement (see C23 6.6p17: "The semantic rules for the evaluation of a constant expression are the same as for nonconstant expressions.") C23 has constexpr objects and globals in C have always had to be initialized with an arithmetic constant expression, so this matters for both languages.

So I think this is somewhere between "bug" and "feature request", but I think Clang should aim to implement the recommended practice whenever possible (though I think C's requirement is overreaching). However, there are a lot of floating-point flags that change the behavior and I'm not certain we're equipped to handle the combinatorial explosion that comes from trying to support them all in constant expressions.

CC @jcranmer-intel @hubert-reinterpretcast @tbaederr @zahiraam

@jcranmer-intel
Copy link
Contributor

However, in C, it's a semantic requirement (see C23 6.6p17: "The semantic rules for the evaluation of a constant expression are the same as for nonconstant expressions.") C23 has constexpr objects and globals in C have always had to be initialized with an arithmetic constant expression, so this matters for both languages.

6.5p8: "Otherwise, whether or how expressions are contracted is implementation-defined." Arguably, we can define, as implementation behavior, that we don't contract in constant expressions. This is a bit of a malicious reading of the standard, but it is perhaps enough cover for our failure to execute consistently at compile time versus runtime.

So I think this is somewhere between "bug" and "feature request", but I think Clang should aim to implement the recommended practice whenever possible (though I think C's requirement is overreaching). However, there are a lot of floating-point flags that change the behavior and I'm not certain we're equipped to handle the combinatorial explosion that comes from trying to support them all in constant expressions.

Beyond the combinatorial issue with all the different floating-point modes, there's also the fun fact that contract can do things other than FMA, which many people tend to miss. I can virtually guarantee that people adding new combines in the backend that take advantage of contraction will neglect to implement the same combines in the frontend, and, to be frank, whether contraction happens in practice relies to some degree on serendipity: whether or not optimizations nudge the instructions into the right position. (Also, C limits contraction to within a single expression, which is not how it's implemented in the backend!)

@bfraboni
Copy link
Author

Note, in C++, it is a recommended practice but not a normative requirement for floating-point operations to have the same behavior at compile time and runtime

@AaronBallman does that mean that in C++ I can't trust any constexpr float operation to produce the same result as its runtime counterpart? That sounds very wrong...

The runtime lowering is target dependent so -mfma happens to make x86 use a native fma instruction

@arsenm so the flag -ffp-contract=on does not actually enable the fma instruction, but only fma expression contraction? If we want the specific instruction, it needs to be enabled with -mfma? The manual is not super clear about this: it states that the flag enables the use of FMA, but does not say whether that means the specific instruction or just expression-level fma: https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffp-contract

@jcranmer-intel
Copy link
Contributor

@arsenm so the flag -ffp-contract=on does not actually enable the fma instruction, but only fma expression contraction? If we want the specific instruction, it needs to be enabled with -mfma? The manual is not super clear about this: it states that the flag enables the use of FMA, but does not say whether that means the specific instruction or just expression-level fma: https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffp-contract

The FMA instructions on x86 aren't available in all hardware. You need the -mfma flag (or an appropriate -mcpu flag, etc.) to say that you're okay limiting compilation to only those CPUs that support the FMA instructions.

@bfraboni
Copy link
Author

bfraboni commented Jul 11, 2024

The documentation page states the following about -ffp-contract=on:

Specify when the compiler is permitted to form fused floating-point operations, such as fused multiply-add (FMA).

I guess what I'm trying to understand is the difference between -ffp-contract=on FMA and -mfma FMA.
The latter, as you said @jcranmer-intel, is simply using the hardware-specific instruction for FMA; I get that.
But what is the former actually doing? What is FMA without hardware support? Is it just grouping expressions into fma form without executing a true fma instruction, or is it using a software version of fma with better rounding?

@jcranmer-intel
Copy link
Contributor

-ffp-contract=on specifies that the compiler is permitted (but not required) to turn a floating-point expression (a * b) + c into fma(a, b, c).

Within the compiler, if it sees a (a * b) + c expression, and -ffp-contract=on is enabled, it turns around and asks the target "is it faster for you to execute fma(a, b, c) or (a * b) + c?" Where there is no hardware FMA instruction, the answer is usually the latter; but where there is one, it is almost always the former. Based on the answer to the question, it chooses whether or not to transform (a * b) + c into fma(a, b, c).

Note that it is possible to make an fma operation even without hardware FMA support (__builtin_fma(a, b, c) is the way you do this in clang), and this will get lowered to a library function named fma.

In short, there are three decisions going on here:

  • Am I allowed to transform (a * b) + c -> fma(a, b, c)? (this is controlled by -ffp-contract=on)
  • Do I lower fma(a, b, c) to a function call or a hardware instruction? (this is what -mfma decides)
  • Is it faster to transform (a * b) + c -> fma(a, b, c)? (almost always implied by the answer to the previous question)

@bfraboni
Copy link
Author

Thank you @jcranmer-intel for the detailed answer, that makes more sense now!

@bfraboni
Copy link
Author

Back to the issue: I'm still concerned about this backend vs. frontend discrepancy. I recently applied a "make everything constexpr that I can" policy for performance reasons, but knowing that the operations are not guaranteed to match makes me doubt its robustness now.

I agree with @AaronBallman that clang should comply with the recommended practice here. It is not advertised or warned anywhere that constexpr can return results inconsistent with runtime floating point, and I think operations should respect the flags they are compiled with; otherwise constexpr fp becomes a lot less useful.

There may be another good explanation, but even when I only use constexpr I can get inconsistent compile-time results when evaluating the exact same line of code inside or outside a function: https://godbolt.org/z/vTd5PWexz
So there is definitely something odd with -ffp-contract=on and constexpr evaluation. Could you please double-check that there isn't simply a bug where fma is used where it shouldn't be?

@bfraboni
Copy link
Author

bfraboni commented Jul 11, 2024

it seems that the compile-time evaluation always uses fma, even when it is not forced with -mfma.

Correct.

Not consistently, @arsenm; see the godbolt repro above, the compile-time eval doesn't match all the time ☝️

@arsenm
Copy link
Contributor

arsenm commented Jul 12, 2024

Not consistently, @arsenm; see the godbolt repro above, the compile-time eval doesn't match all the time ☝️

This is the behavior for cases where llvm.fmuladd is emitted, which is always treated as FMA. I don't know what clang is doing in the constexpr evaluation case.

@jcranmer-intel
Copy link
Contributor

I agree with @AaronBallman saying that clang should be complying with recommendations here. It is not advertised / warned anywhere that constexpr can return inconsistent results with floating points and I think operations should try to respect the flags they are compiled with, otherwise it makes constexpr fp a lot less useful.

Fast-math flags (which include -ffp-contract=on) are an explicit signal to the compiler from the user that the user values speed over consistency in the numerical results: it gives license to the optimizer to rearrange floating-point expressions in a way that doesn't preserve exact fp results. It is effectively impossible to make the front-end always give the same answer that the optimizer gives. If you want consistent results between the constexpr evaluation and the optimizer, then stick with precise floating-point compilation modes and don't use any fast-math flags.

@bfraboni
Copy link
Author

I get it now that it is not simple for the frontend and the backend to give the same answer all the time.

However, I still have no clue why the frontend does not always give the same answer for the same constexpr line of code, see : https://godbolt.org/z/vTd5PWexz

@bfraboni
Copy link
Author

OK, I found it: the function call inside the printf is not required to produce a constexpr result, so it is probably evaluated through a different code path. If I assign the result to a constexpr variable first, I get the same result. That's tricky.

Thanks for all your insights @arsenm @jcranmer-intel @AaronBallman, I understand much better how things work now.

Closing this one!

@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Jul 12, 2024