What uncommon floating-point sizes exist in C++ compilers?

谎友^ 2021-01-14 17:59

The C++14 draft standard seems rather quiet about the specific requirements for float, double and long double, although these sizes seem to be common:

3 answers
  •  庸人自扰
    2021-01-14 18:29

    If you're only asking about size in bits, odd-sized types exist only on some older platforms whose bytes are not 8 bits (or another power of 2) wide, such as the Unisys ClearPath Dorado servers with a 36-bit float and a 72-bit double. That line is still in active development; the last version was released in 2018. Mainframes and servers live very long lives, so you can still see PDP-10 and other such architectures in use today, with modern compiler support

    If you care about the formats, there are many standard-compliant 32-, 64- and 128-bit floating-point formats that aren't IEEE-754 binary formats, such as the hex and decimal floating-point types on IBM z, Cray formats and VAX formats. In fact IBM z is one of the very rare modern platforms with decimal float hardware, although GCC and some other compilers offer built-in software support for decimal float. IBM also uses the special double-double format, which is still the default for long double on PowerPC

    There are also some other non-standard 24-bit floats in a few modern C/C++ compilers for microcontrollers

    Here's a summary of most of the available floating-point formats. See also "Do any real-world CPUs not use IEEE 754?". For more information, continue to the next section


    Types in C++ are generally mapped to hardware types for performance reasons, so the floating-point types will be whatever is available on the CPU, if it has an FPU at all. On modern computers IEEE-754 is the dominant hardware format. The C++ standard does not mandate IEEE-754, but the minimum range and precision it inherits from C (at least 6 decimal digits for float and 10 for double) are met by IEEE-754 single precision for float and need something wider for double, so in practice float and double map to single and double precision respectively
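To see what a particular implementation actually provides, the properties discussed here can be read from `std::numeric_limits`. The following is a minimal sketch; the `FloatInfo` struct and the `describe` helper are hypothetical names introduced only for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Hypothetical summary record; the field names are illustrative only.
struct FloatInfo {
    std::size_t storage_bits;  // sizeof(T) * CHAR_BIT, including any padding
    int radix;                 // 2 for binary formats, 16 for IBM hex float
    int mantissa_digits;       // significand width, counted in radix digits
    int decimal_digits;        // decimal digits representable without change
    bool iec559;               // true if the type claims IEEE-754 conformance
};

// Collect the basic properties of a floating-point type T at compile time.
template <typename T>
constexpr FloatInfo describe() {
    using L = std::numeric_limits<T>;
    return {sizeof(T) * CHAR_BIT, L::radix, L::digits, L::digits10, L::is_iec559};
}
```

For example, `describe<long double>().storage_bits` includes padding, so it can be 96 or 128 on x86 even when only 80 bits carry information, and on a machine with 9-bit bytes `CHAR_BIT` makes the 36-bit and 72-bit sizes visible.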

    Hardware support for higher-precision types is uncommon, apart from x86 and a few other rare platforms with 80-bit extended precision, so on platforms without it long double is usually mapped to the same type as double. Recently, however, long double has slowly been migrating to IEEE-754 quadruple precision in many compilers such as GCC and Clang. Since that format is implemented with the built-in software library, performance is a lot worse. Depending on whether you favor faster execution or higher precision, you are still free to choose what long double maps to. For example, on x86 GCC has the -mlong-double-64/80/128 options to select the format and the -m96bit-long-double/-m128bit-long-double options to set the padding. The format option is also available on other architectures such as S/390 and zSeries
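Because the mapping of long double depends on the target and on compiler flags, code that cares can check the significand width at compile time. This is a rough sketch, assuming a binary radix (`FLT_RADIX == 2`); the `LongDoubleKind` enum and the `classify` helper are hypothetical names, not part of any library.

```cpp
#include <cfloat>

// Hypothetical labels for the long double layouts mentioned in this answer.
enum class LongDoubleKind {
    SameAsDouble,  // 53-bit significand: IEEE-754 double precision
    X87Extended,   // 64-bit significand: x86 80-bit extended precision
    DoubleDouble,  // 106-bit significand: IBM double-double on PowerPC
    Binary128,     // 113-bit significand: IEEE-754 quadruple precision
    Other          // anything else, e.g. hex float or an exotic format
};

// Classify a binary format by its significand width in bits.
constexpr LongDoubleKind classify(int mantissa_bits) {
    return mantissa_bits == 53  ? LongDoubleKind::SameAsDouble
         : mantissa_bits == 64  ? LongDoubleKind::X87Extended
         : mantissa_bits == 106 ? LongDoubleKind::DoubleDouble
         : mantissa_bits == 113 ? LongDoubleKind::Binary128
                                : LongDoubleKind::Other;
}

// The layout selected for this translation unit by the target and flags.
constexpr LongDoubleKind current_long_double = classify(LDBL_MANT_DIG);
```

Building the same file with different long double options should change `current_long_double`, which makes it easy to guard code with a `static_assert` when a specific precision is required.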

    PowerPC, on the other hand, by default uses a completely different 128-bit long double format that is implemented with double-double arithmetic and has the same range as IEEE-754 double precision. Its precision is slightly lower than quadruple precision, but it is a lot faster because it can use the hardware double arithmetic. As above, you can choose between the two formats with the -mabi=ibmlongdouble and -mabi=ieeelongdouble options. The same trick is also used on some platforms where only a 32-bit float is supported, to get near-double precision
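The idea behind double-double arithmetic can be sketched in a few lines: a value is stored as an unevaluated sum of two doubles, and an error-free addition (Knuth's TwoSum) recovers the rounding error that a plain addition would throw away. This is a simplified illustration in C++14, not the actual PowerPC runtime code; the `DoubleDouble` type and the function names are hypothetical.

```cpp
// Hypothetical helper type: the value represented is hi + lo,
// where lo holds the bits that did not fit into hi.
struct DoubleDouble {
    double hi;
    double lo;
};

// Knuth's TwoSum: s is the rounded sum and e the exact rounding error,
// so a + b == s + e holds exactly (barring overflow).
// Note: value-unsafe optimizations such as -ffast-math may cancel e away.
constexpr DoubleDouble two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Simplified double-double addition: add the high parts exactly,
// fold in the low parts, then renormalize into a (hi, lo) pair.
constexpr DoubleDouble dd_add(DoubleDouble x, DoubleDouble y) {
    DoubleDouble s = two_sum(x.hi, y.hi);
    double lo = s.lo + x.lo + y.lo;
    return two_sum(s.hi, lo);
}
```

With plain doubles, `1.0 + 1e-20` rounds back to `1.0`. With `dd_add({1.0, 0.0}, {1e-20, 0.0})` the small term is kept in `lo`, which is where the extra precision comes from, while the exponent range stays that of double.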

    IBM z mainframes traditionally use the IBM hex float formats and still use them today, but they also support IEEE-754 binary and decimal floating-point types

    The format of floating-point numbers can be either base 16 S/390® hexadecimal format, base 2 IEEE-754 binary format, or base 10 IEEE-754 decimal format. The formats are based on three operand lengths for hexadecimal and binary: short (32 bits), long (64 bits), and extended (128 bits). The formats are also based on three operand lengths for decimal: _Decimal32 (32 bits), _Decimal64 (64 bits), and _Decimal128 (128 bits).

    Floating-point numbers

    Other architectures, such as VAX or Cray, may have other floating-point formats. However, since those machines are still in use, their newer hardware versions also include IEEE-754 support, just as IBM did with its mainframes

    On modern platforms without an FPU, the floating-point types are usually IEEE-754 single and double precision for better interoperability and library support. On 8-bit microcontrollers, however, even single precision is too costly, so some compilers support a non-standard mode in which float is a 24-bit type. For example, the XC8 compiler uses a 24-bit floating-point format that is a truncated form of the 32-bit format, and NXP's MRK uses a different 24-bit float format
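To illustrate what "a truncated form of the 32-bit format" can mean, the sketch below drops the low 8 bits of an IEEE-754 single-precision value, leaving the sign, the 8-bit exponent and 15 fraction bits. This is an assumption-laden illustration: it assumes float is IEEE-754 single precision, the function names are hypothetical, and the exact layout and rounding behavior of any real compiler's 24-bit mode may differ.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");

// Keep the sign, the exponent and the top 15 fraction bits;
// the low 8 fraction bits are simply cut off (no rounding).
std::uint32_t to_float24_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
    return bits >> 8;                     // result fits in 24 bits
}

// Expand a 24-bit pattern back to a float by zero-filling the dropped bits.
float from_float24_bits(std::uint32_t f24) {
    std::uint32_t bits = (f24 & 0xFFFFFFu) << 8;
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

Values such as 1.5f survive the round trip unchanged, while 0.1f comes back slightly smaller because its low fraction bits are lost. The range is the same as for the 32-bit format; only precision is reduced.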

    Due to the rise of graphics and AI applications that need a narrower floating-point type, 16-bit formats such as IEEE-754 binary16 and Google's bfloat16 have also been introduced on many platforms, and compilers have some limited support for them, such as __fp16 in GCC
