
Floating point on .NET

Alexander Kopetsch edited this page Sep 4, 2018 · 2 revisions

Robust Floating-Point Arithmetic on .NET

Overview

Our aim is to implement the robust geometric primitives of Shewchuk in C# for use on .NET 4. (The public domain C source that implements this algorithm is available as predicates.c).

The main reference for the discussion here is the following paper (which I'll refer to as 'the Shewchuk paper'):

Jonathan Richard Shewchuk, Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates, Discrete & Computational Geometry 18:305-363, 1997. Also Technical Report CMU-CS-96-140, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, May 1996. PostScript (676k, 53 pages).

Shewchuk implements robust floating-point arithmetic by an exact adaptive-precision algorithm. The correctness of the implementation depends on how the IEEE 754 arithmetic is realised on the target machine. Our goal of implementing this algorithm in C# for use on .NET 4 will depend both on the ECMA (and ISO) specification of the CLI runtime, and on the particular implementation of the CLI specification provided by the .NET 4 runtime.

We find that the ECMA CLI specification allows a runtime implementation so much leeway in the realisation of floating-point arithmetic, that the Shewchuk algorithm cannot be (easily) implemented to work correctly on all conforming implementations of the ECMA CLI specification. However, the .NET 4 runtime does default to appropriate settings for the x86 floating-point unit, allowing an implementation of the Shewchuk algorithm to work correctly for the 64-bit floating-point type under the default .NET runtime environment.

From this perspective, we provide here a discussion of floating-point arithmetic in .NET, with references and details of some tests that exhibit the edge-case behaviour.

Concerns for Implementation on .NET

Precision of the Floating-Point Unit

This is our main concern in this discussion. We need to understand how to make a correct implementation of the robust arithmetic for the .NET runtime. As we see below, correctness of the implementation depends on the floating-point characteristics of the implementation.

The Shewchuk paper offers this warning (p. 2):

These algorithms [...] work under the assumption that hardware arithmetic is performed in radix two with exact rounding. This assumption holds on processors compliant with the IEEE 754 floating-point standard.

The concerns are discussed in more detail in Section 5 (Caveats) of the Shewchuk paper.

Even floating-point units that use binary arithmetic with exact rounding, including those that conform to the IEEE 754 standard, can have subtle properties that undermine the assumptions of the algorithms. The most common such difficulty is the presence of extended precision internal floating-point registers, such as those on the Intel 80486 and Pentium processors. While such registers usually improve the stability of floating-point calculations, they cause the methods described herein for determining the roundoff of an operation to fail. There are several possible workarounds for this problem. In C, it is possible to designate variables as volatile, implying that they must be stored to memory. This ensures that the variable is rounded to a p-bit significand before it is used in another operation. Forcing intermediate values to be stored to memory and reloaded can slow down the algorithms significantly, and there is a worse consequence. Even a volatile variable could be doubly rounded, being rounded once to the internal extended precision format, then rounded again to single or double precision when it is stored to memory. The result after double rounding is not always the same as it would be if it had been correctly rounded to the final precision, and Priest [...] describes a case wherein the roundoff error produced by double rounding may not be expressible in p bits. [...] A better solution is to configure one's processor to round internally to double precision.

Shewchuk's web page for the predicates gives this further warning:

Warning: This code will not work correctly on a processor that uses extended precision internal floating-point registers, such as the Intel 80486 and Pentium, unless the processor is configured to round internally to double precision. See the Predicates on PCs page for details.

In the predicates.c code, there is this additional comment on using the volatile keyword:

/* On some machines, the exact arithmetic routines might be defeated by the  */
/*   use of internal extended precision floating-point registers.  Sometimes */
/*   this problem can be fixed by defining certain values to be volatile,    */
/*   thus forcing them to be stored to memory and rounded off.  This isn't   */
/*   a great solution, though, as it slows the arithmetic down.              */

While this option can be emulated in the .NET code, and is explicitly described in the CLI specification, it does not, as stated in the Caveats section of the Shewchuk paper, lead to a correct implementation of the primitives. This is due to the double-rounding issue, which we also explore in the Tests section below, whereby a calculation done at higher precision (and rounded), and then stored into lower precision (and rounded a second time), might have a different value from the computation done at the lower precision (and rounded only once).

Compiler Optimization

An additional concern would be aggressive (and incorrect) optimisation that the compiler (either the C# compiler or the .NET JIT compiler) might do to simplify arithmetic in a way that does not respect the floating-point behaviour. For example, a compiler might be tempted to simplify the expression x = a+b-a to just x = b. This would be incorrect due to our expectation of the floating-point rounding.

ECMA CLI Specification

The reference for this section is the ECMA-335 Common Language Infrastructure (CLI) specification, 5th edition / December 2010.

The relevant part of the specification is on page 81 of Partition I:

Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either float32 or float64, but its value can be represented internally with additional range and/or precision. The size of the internal floating-point representation is implementation-dependent, can vary, and shall have precision at least as great as that of the variable or expression being represented. An implicit widening conversion to the internal representation from float32 or float64 is performed when those types are loaded from storage. The internal representation is typically the native size for the hardware, or as required for efficient implementation of an operation. The internal representation shall have the following characteristics:

  • The internal representation shall have precision and range greater than or equal to the nominal type.

  • Conversions to and from the internal representation shall preserve value.

[Note: This implies that an implicit widening conversion from float32 (or float64) to the internal representation, followed by an explicit conversion from the internal representation to float32 (or float64), will result in a value that is identical to the original float32 (or float64) value. end note]

[Rationale: This design allows the CLI to choose a platform-specific high-performance representation for floating-point numbers until they are placed in storage locations. For example, it might be able to leave floating-point variables in hardware registers that provide more precision than a user has requested. At the same time, CIL generators can force operations to respect language-specific rules for representations through the use of conversion instructions. end rationale]

When a floating-point value whose internal representation has greater range and/or precision than its nominal type is put in a storage location, it is automatically coerced to the type of the storage location. This can involve a loss of precision or the creation of an out-of-range value (NaN, +infinity, or -infinity). However, the value might be retained in the internal representation for future use, if it is reloaded from the storage location without having been modified. It is the responsibility of the compiler to ensure that the retained value is still valid at the time of a subsequent load, taking into account the effects of aliasing and other execution threads (see memory model (§12.6)). This freedom to carry extra precision is not permitted, however, following the execution of an explicit conversion (conv.r4 or conv.r8), at which time the internal representation must be exactly representable in the associated type.

[Note: To detect values that cannot be converted to a particular storage type, a conversion instruction (conv.r4, or conv.r8) can be used, followed by a check for a non-finite value using ckfinite. Underflow can be detected by converting to a particular storage type, comparing to zero before and after the conversion. end note]

[Note: The use of an internal representation that is wider than float32 or float64 can cause differences in computational results when a developer makes seemingly unrelated modifications to their code, the result of which can be that a value is spilled from the internal representation (e.g., in a register) to a location on the stack. end note]

This allows floating-point arithmetic operations to be computed at a higher precision than the nominal precision of the floating-point type. In particular, this means that arithmetic with a Float64 type may be computed at the 80-bit (64-bit significand) extended precision FPU setting, defeating our robust arithmetic implementation.

However, we find that the .NET runtime sets the FPU to 64-bit (53-bit significand) precision, presumably during initialization. When run in another process host (e.g. Excel) the setting might change.

Rounding

The rounding mode defined in IEC 60559:1989 shall be set by the CLI to "round to the nearest number," and neither the CIL nor the class library provide a mechanism for modifying this setting.

(Note that IEC 60559:1989 is essentially the same standard as IEEE 754-1985.)

This definition of the rounding for CLI runtime does not specify whether the rounding mode is Round to nearest, ties to even or Round to nearest, ties away from zero. But the x86 FPU control state is set to Ties to Even in practice. See the Tests section below.

Also note that Intel's floating-point library targeting the SSE2 instructions doesn't support roundTiesToAway, which is also not required (for binary formats) by the IEEE 754-2008 standard that it implements.

Intel IA32 (x86) Architecture FPU

Wikipedia reference for floating-point standards: http://en.wikipedia.org/wiki/IEEE_754

The x86 FPU allows a number of control flags to be set, affecting the FPU behaviour. Among these are two of particular importance to us:

  1. Rounding behaviour
  2. Precision

Extended exponent range

While the precision control setting of the Intel x86 FPU gives us the proper rounding behaviour, there is still one aspect in which the Intel x86 FPU does not implement true 64-bit floating-point arithmetic. The FPU allows an extended range for the exponents, and hence might not overflow or underflow in some computations that would otherwise overflow or underflow.

More useful information about different VC++ compiler versions, and the x64 case, can be found here: http://www.altdevblogaday.com/2012/03/22/intermediate-floating-point-precision/.

The Shewchuk predicates do not take overflow/underflow into account, so the extended exponent range might lead to cases where code running on the x87 FPU works, but fails when running on x64.

Floating-Point in the .NET Framework 4

The default setting for the FPU precision control is suggested by this discussion on .NET and Mono floating point (hardly an authoritative reference - can we find a better one?): http://mono.1490590.n4.nabble.com/More-NET-and-mono-floating-point-inconsistencies-td1505336.html

[...] both Mono and MS.NET configure the FPU to round each arithmetic operations to 64-bits.

Tests

The RobustGeometry.Test project contains the FpuControl class to enable tests of the settings active in the CLR, and allows exploring the issues raised above.

Rounding Mode

The option Round to nearest, ties to even is active when the CLR is running. Is this explicitly set, or does it just happen to be the default?

Double Rounding

The CLI specification ensures that an explicit cast will perform a rounding to the right precision. However, double rounding (once to the extended precision, then explicitly to the native precision) does not always result in the same answer. Some tests in the code exhibit this issue.

This means that explicit casting cannot ensure that the floating-point arithmetic works exactly as expected. Adding such casts to the code would just impose a performance penalty with unclear benefit.

A few more .NET floating-point notes

  • Machine epsilon vs. System.Double.Epsilon. This must have been a mistake in the original .NET library implementation. System.Double.Epsilon represents the smallest double larger than 0, a value that does not seem useful. Many numerical algorithms depend on the machine epsilon, which is the smallest double that, when added to 1.0, gives a double not equal to 1.0. When converting code to C#, never use System.Double.Epsilon.

  • Math.Round rounds to even.

  • Checked integer arithmetic is a compiler flag that is on by default for VB.NET projects, and off by default for C# projects.

Conclusion - Towards Robust Arithmetic on .NET

Although the CLI specification allows double arithmetic to be evaluated in the 80-bit x87 FPU mode, in practice (and according to some comments) the x86 FPU is set to 64-bit mode (53-bit significand precision). If the CPU is running in this mode, the floating-point implementation and the CLI specification are compatible with the Shewchuk robust arithmetic routines. It might add additional safety to add explicit FPU control mode checks guarding the geometric primitive routines, but this might interfere with portability of the library.