What is the difference between quiet NaN and signaling NaN?

淺唱寂寞╮ posted on 2019-11-26 18:48:42

When an operation results in a quiet NaN, there is no indication that anything is unusual until the program checks the result and sees a NaN. That is, computation continues without any signal from the floating point unit (FPU), or from the library if floating-point is implemented in software. A signaling NaN will produce a signal, usually in the form of an exception from the FPU. Whether the exception is raised depends on the state of the FPU.

C++11 adds a few language controls over the floating-point environment and provides standardized ways to create and test for NaNs. However, whether the controls are implemented is not well standardized and floating-point exceptions are not typically caught the same way as standard C++ exceptions.
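
As a minimal sketch of the standardized ways to create and test for NaNs, the following uses std::numeric_limits and std::isnan, and cross-checks the result against the property that a NaN never compares equal to itself. The helper names example_quiet_nan and is_nan_checked are made up for this illustration, and the comparison trick assumes the compiler is not run with options such as -ffast-math.

```cpp
#include <cassert>
#include <cmath>   // std::isnan
#include <limits>  // std::numeric_limits

// Hypothetical helper: create a quiet NaN the C++11 way.
double example_quiet_nan() {
    static_assert(std::numeric_limits<double>::has_quiet_NaN, "needs quiet NaN support");
    return std::numeric_limits<double>::quiet_NaN();
}

// Hypothetical helper: test for NaN with std::isnan and cross-check
// with the self-comparison property (a NaN is never equal to itself).
bool is_nan_checked(double x) {
    bool by_library = std::isnan(x);
    bool by_comparison = (x != x);
    assert(by_library == by_comparison);
    return by_library;
}
```

Calling is_nan_checked(example_quiet_nan()) is expected to return true, while an infinity or any finite value gives false.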

In POSIX/Unix systems, floating point exceptions are typically caught using a handler for SIGFPE.

What do qNaNs and sNaNs look like experimentally?

Let's first learn how to identify if we have an sNaN or a qNaN.

I'll be using C++ in this answer instead of C because it offers the convenient std::numeric_limits::quiet_NaN and std::numeric_limits::signaling_NaN, for which I could not find a convenient equivalent in C.

I could not however find a function to classify if a NaN is sNaN or qNaN, so let's just print out the NaN raw bytes:

main.cpp

#include <cassert>
#include <cstdint> // std::uint32_t
#include <cstring> // std::memcpy
#include <cmath> // nanf, isnan
#include <iostream>
#include <limits> // std::numeric_limits

#pragma STDC FENV_ACCESS ON

void print_float(float f) {
    std::uint32_t i;
    std::memcpy(&i, &f, sizeof f);
    std::cout << std::hex << i << std::endl;
}

int main() {
    static_assert(std::numeric_limits<float>::has_quiet_NaN, "");
    static_assert(std::numeric_limits<float>::has_signaling_NaN, "");
    static_assert(std::numeric_limits<float>::has_infinity, "");

    // Generate them.
    float qnan = std::numeric_limits<float>::quiet_NaN();
    float snan = std::numeric_limits<float>::signaling_NaN();
    float inf = std::numeric_limits<float>::infinity();
    float nan0 = std::nanf("0");
    float nan1 = std::nanf("1");
    float nan2 = std::nanf("2");
    float div_0_0 = 0.0f / 0.0f;
    float sqrt_negative = std::sqrt(-1.0f);

    // Print their bytes.
    std::cout << "qnan "; print_float(qnan);
    std::cout << "snan "; print_float(snan);
    std::cout << " inf "; print_float(inf);
    std::cout << "-inf "; print_float(-inf);
    std::cout << "nan0 "; print_float(nan0);
    std::cout << "nan1 "; print_float(nan1);
    std::cout << "nan2 "; print_float(nan2);
    std::cout << " 0/0 "; print_float(div_0_0);
    std::cout << "sqrt "; print_float(sqrt_negative);

    // Assert if they are NaN or not.
    assert(std::isnan(qnan));
    assert(std::isnan(snan));
    assert(!std::isnan(inf));
    assert(!std::isnan(-inf));
    assert(std::isnan(nan0));
    assert(std::isnan(nan1));
    assert(std::isnan(nan2));
    assert(std::isnan(div_0_0));
    assert(std::isnan(sqrt_negative));
}

Compile and run:

g++ -ggdb3 -O3 -std=c++11 -Wall -Wextra -pedantic -o main.out main.cpp
./main.out

output on my x86_64 machine:

qnan 7fc00000
snan 7fa00000
 inf 7f800000
-inf ff800000
nan0 7fc00000
nan1 7fc00001
nan2 7fc00002
 0/0 ffc00000
sqrt ffc00000

We can also execute the program on aarch64 with QEMU user mode:

aarch64-linux-gnu-g++ -ggdb3 -O3 -std=c++11 -Wall -Wextra -pedantic -o main.out main.cpp
qemu-aarch64 -L /usr/aarch64-linux-gnu/ main.out

and that produces the exact same output, suggesting that multiple archs closely implement IEEE 754.

At this point, if you are not familiar with the structure of IEEE 754 floating point numbers, have a look at: What is a subnormal floating point number?

In binary some of the values above are:

     31
     |
     | 30    23 22                    0
     | |      | |                     |
-----+-+------+-+---------------------+
qnan 0 11111111 10000000000000000000000
snan 0 11111111 01000000000000000000000
 inf 0 11111111 00000000000000000000000
-inf 1 11111111 00000000000000000000000
-----+-+------+-+---------------------+
     | |      | |                     |
     | +------+ +---------------------+
     |    |               |
     |    v               v
     | exponent        fraction
     |
     v
     sign
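
The field boundaries in the diagram can be sketched in code as follows. This is only an illustration, assuming a 32-bit IEEE 754 binary32 float; the FloatFields struct and the decompose function are names invented here, not part of any library.

```cpp
#include <cassert>
#include <cmath>    // std::isnan
#include <cstdint>
#include <cstring>  // std::memcpy

// Hypothetical container for the three fields shown in the diagram.
struct FloatFields {
    std::uint32_t sign;      // bit 31
    std::uint32_t exponent;  // bits 30..23
    std::uint32_t fraction;  // bits 22..0
};

// Split a binary32 float into sign, exponent and fraction.
FloatFields decompose(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    FloatFields fields = {bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
    // Cross-check with the library: a NaN has exponent 0xFF and a non-zero fraction.
    assert((fields.exponent == 0xFFu && fields.fraction != 0) == std::isnan(f));
    return fields;
}
```

For example, decompose(INFINITY) yields sign 0, exponent 0xFF and fraction 0, matching the inf row above.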

From this experiment we observe that:

  • qNaN and sNaN seem to be differentiated only by bit 22: 1 means quiet, and 0 means signaling

  • infinities are also quite similar with exponent == 0xFF, but they have fraction == 0.

    For this reason, sNaNs must set at least one other fraction bit to 1 (bit 21 in the value above), otherwise it would not be possible to distinguish sNaN from positive infinity!

  • nanf() produces several different NaNs, so there must be multiple possible encodings:

    7fc00000
    7fc00001
    7fc00002
    

    Since nan0 is the same as std::numeric_limits<float>::quiet_NaN(), we deduce that they are all different quiet NaNs.

    The C11 N1570 standard draft confirms that nanf() generates quiet NaNs, because nanf is defined in terms of strtof and 7.22.1.3 "The strtod, strtof, and strtold functions" says:

    A character sequence NAN or NAN(n-char-sequence_opt) is interpreted as a quiet NaN, if supported in the return type, else like a subject sequence part that does not have the expected form; the meaning of the n-char sequence is implementation-defined.
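
Since the mapping from the nanf() argument to the bits is implementation-defined, the sketch below sets the low fraction bits directly instead of relying on nanf(). It assumes the binary32 layout observed above (quiet NaN base pattern 7fc00000, payload in the 22 bits below the quiet bit); make_quiet_nan and payload_of are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>    // std::isnan
#include <cstdint>
#include <cstring>  // std::memcpy

// Hypothetical helper: build a positive quiet NaN whose low 22 fraction
// bits carry `payload`.
float make_quiet_nan(std::uint32_t payload) {
    assert(payload <= 0x3FFFFFu);  // must fit below the quiet bit (bit 22)
    std::uint32_t bits = 0x7FC00000u | payload;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: read the low 22 fraction bits back out of a NaN.
std::uint32_t payload_of(float nan) {
    assert(std::isnan(nan));
    std::uint32_t bits;
    std::memcpy(&bits, &nan, sizeof bits);
    return bits & 0x3FFFFFu;
}
```

With these helpers, make_quiet_nan(1) and make_quiet_nan(2) reproduce the 7fc00001 and 7fc00002 patterns printed above.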

What do qNaNs and sNaNs look like in the manuals?

IEEE 754 2008 recommends that (TODO mandatory or optional?):

  • anything with exponent == 0xFF and fraction != 0 is a NaN
  • and that the highest fraction bit differentiates qNaN from sNaN

but it does not seem to say which bit is preferred to differentiate infinity from NaN.

6.2.1 "NaN encodings in binary formats" says:

This subclause further specifies the encodings of NaNs as bit strings when they are the results of operations. When encoded, all NaNs have a sign bit and a pattern of bits necessary to identify the encoding as a NaN and which determines its kind (sNaN vs. qNaN). The remaining bits, which are in the trailing significand field, encode the payload, which might be diagnostic information (see above).

All binary NaN bit strings have all the bits of the biased exponent field E set to 1 (see 3.4). A quiet NaN bit string should be encoded with the first bit (d1) of the trailing significand field T being 1. A signaling NaN bit string should be encoded with the first bit of the trailing significand field being 0. If the first bit of the trailing significand field is 0, some other bit of the trailing significand field must be non-zero to distinguish the NaN from infinity. In the preferred encoding just described, a signaling NaN shall be quieted by setting d1 to 1, leaving the remaining bits of T unchanged. For binary formats, the payload is encoded in the p−2 least significant bits of the trailing significand field
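
The quieting rule in that paragraph can be sketched on raw binary32 bit patterns. This is an illustration only, working on integers so that no floating-point hardware is involved; the constant and function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kExponentMask = 0x7F800000u;  // bits 30..23
constexpr std::uint32_t kFractionMask = 0x007FFFFFu;  // bits 22..0
constexpr std::uint32_t kQuietBit     = 0x00400000u;  // d1, i.e. bit 22

// True if the pattern has an all-ones exponent and a non-zero fraction.
bool is_nan_bits(std::uint32_t bits) {
    return (bits & kExponentMask) == kExponentMask && (bits & kFractionMask) != 0;
}

// Quiet a NaN as the preferred encoding describes: set d1 and leave the
// remaining trailing significand bits unchanged.
std::uint32_t quiet_bits(std::uint32_t bits) {
    assert(is_nan_bits(bits));
    return bits | kQuietBit;
}
```

Applied to the sNaN pattern 7fa00000 from the experiment, quiet_bits returns 7fe00000: the quiet bit is now set and bit 21 is preserved.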

The Intel 64 and IA-32 Architectures Software Developer’s Manual - Volume 1 Basic Architecture - 253665-056US September 2015 4.8.3.4 "NaNs" confirms that x86 follows IEEE 754 by distinguishing qNaN and sNaN by the highest fraction bit:

The IA-32 architecture defines two classes of NaNs: quiet NaNs (QNaNs) and signaling NaNs (SNaNs). A QNaN is a NaN with the most significant fraction bit set; an SNaN is a NaN with the most significant fraction bit clear.

and so does the ARM Architecture Reference Manual - ARMv8, for ARMv8-A architecture profile - DDI 0487C.a A1.4.3 "Single-precision floating-point format":

fraction != 0: The value is a NaN, and is either a quiet NaN or a signaling NaN. The two types of NaN are distinguished by their most significant fraction bit, bit[22]:

  • bit[22] == 0: The NaN is a signaling NaN. The sign bit can take any value, and the remaining fraction bits can take any value except all zeros.
  • bit[22] == 1: The NaN is a quiet NaN. The sign bit and remaining fraction bits can take any value.
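
Given that both manuals agree on bit 22, the missing classification function can be sketched by hand. This is a minimal sketch that assumes the x86 / ARMv8 binary32 convention quoted above; NanKind, classify_nan_bits and classify_nan are names made up for this example, not a standard API.

```cpp
#include <cassert>
#include <cmath>    // std::isnan
#include <cstdint>
#include <cstring>  // std::memcpy

enum class NanKind { NotANaN, Quiet, Signaling };

// Classify a raw binary32 bit pattern.
NanKind classify_nan_bits(std::uint32_t bits) {
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    const std::uint32_t fraction = bits & 0x7FFFFFu;
    if (exponent != 0xFFu || fraction == 0) {
        return NanKind::NotANaN;  // finite value or infinity
    }
    // Bit 22 is the most significant fraction bit: 1 quiet, 0 signaling.
    return (fraction & 0x400000u) ? NanKind::Quiet : NanKind::Signaling;
}

// Same classification starting from a float.
NanKind classify_nan(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    NanKind kind = classify_nan_bits(bits);
    assert((kind != NanKind::NotANaN) == std::isnan(f));
    return kind;
}
```

The bit-pattern overload is useful when a value must not pass through floating-point registers at all, since some ABIs may quiet an sNaN while moving it around.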

How are qNaNs and sNaNs generated?

One major difference between qNaNs and sNaNs is that:

  • qNaN is generated by regular built-in (software or hardware) arithmetic operations with weird values
  • sNaN is never generated by built-in operations, it can only be explicitly added by programmers, e.g. with std::numeric_limits::signaling_NaN

I could not find clear IEEE 754 or C11 quotes for that, but neither can I find any built-in operation that generates sNaNs ;-)

The Intel manual clearly states this principle however at 4.8.3.4 "NaNs":

SNaNs are typically used to trap or invoke an exception handler. They must be inserted by software; that is, the processor never generates an SNaN as a result of a floating-point operation.

This can be seen from our example where both:

float div_0_0 = 0.0f / 0.0f;
float sqrt_negative = std::sqrt(-1.0f);

produce exactly the same bits as std::numeric_limits<float>::quiet_NaN().
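
To check at run time that an invalid built-in operation yields a quiet NaN, here is a hedged sketch. It uses volatile operands so that the compiler cannot fold 0.0f / 0.0f at compile time, and it assumes a platform where the exception flags are observable through fetestexcept (for example x86_64 with glibc). InvalidOpResult and divide_zero_by_zero are names invented for the example.

```cpp
#include <cassert>
#include <cfenv>    // std::feclearexcept, std::fetestexcept
#include <cmath>    // std::isnan
#include <cstdint>
#include <cstring>  // std::memcpy

// Hypothetical result record for the experiment.
struct InvalidOpResult {
    std::uint32_t bits;   // raw bits of the result
    bool raised_invalid;  // whether FE_INVALID was set by the division
};

InvalidOpResult divide_zero_by_zero() {
    volatile float zero = 0.0f;  // volatile: force a run-time division
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile float result = zero / zero;
    bool raised = std::fetestexcept(FE_INVALID) != 0;

    float copy = result;
    assert(std::isnan(copy));
    std::uint32_t bits;
    std::memcpy(&bits, &copy, sizeof bits);
    return InvalidOpResult{bits, raised};
}
```

On the architectures discussed here, the returned bits are expected to have bit 22 set (a quiet NaN), and raised_invalid is expected to be true because generating the NaN is itself an invalid operation.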

Both of those operations compile to a single x86 assembly instruction that generates the qNaN directly in the hardware (TODO confirm with GDB).

What do qNaNs and sNaNs do differently?

Now that we know what qNaNs and sNaNs look like, and how to manipulate them, we are finally ready to try and make sNaNs do their thing and blow some programs up!

So without further ado:

blow_up.cpp

#include <cassert>
#include <cfenv>
#include <cmath> // isnan
#include <iostream>
#include <limits> // std::numeric_limits
#include <unistd.h>

#pragma STDC FENV_ACCESS ON

int main() {
    float snan = std::numeric_limits<float>::signaling_NaN();
    float qnan = std::numeric_limits<float>::quiet_NaN();
    float f;

    // No exceptions.
    assert(std::fetestexcept(FE_ALL_EXCEPT) == 0);

    // Still no exceptions because qNaN.
    f = qnan + 1.0f;
    assert(std::isnan(f));
    if (std::fetestexcept(FE_ALL_EXCEPT) == FE_INVALID)
        std::cout << "FE_ALL_EXCEPT qnan + 1.0f" << std::endl;

    // Now we can get an exception because sNaN, but signals are disabled.
    f = snan + 1.0f;
    assert(std::isnan(f));
    if (std::fetestexcept(FE_ALL_EXCEPT) == FE_INVALID)
        std::cout << "FE_ALL_EXCEPT snan + 1.0f" << std::endl;
    feclearexcept(FE_ALL_EXCEPT);

    // And now we enable signals and blow up with SIGFPE! >:-)
    feenableexcept(FE_INVALID);
    f = qnan + 1.0f;
    std::cout << "feenableexcept qnan + 1.0f" << std::endl;
    f = snan + 1.0f;
    std::cout << "feenableexcept snan + 1.0f" << std::endl;
}

Compile, run and get the exit status:

g++ -ggdb3 -O0 -Wall -Wextra -pthread -std=c++11 -pedantic-errors -o blow_up.out blow_up.cpp -lm -lrt
./blow_up.out
echo $?

Output:

FE_ALL_EXCEPT snan + 1.0f
feenableexcept qnan + 1.0f
Floating point exception (core dumped)
136

Note that this behaviour only happens with -O0 in GCC 8.2: with -O3, GCC pre-calculates and optimizes all our sNaN operations away! I'm not sure if there is a standard compliant way of preventing that.
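
One common workaround, offered here as a sketch rather than a standard-guaranteed solution, is to route the operands through volatile variables so that the addition has to happen at run time even with optimizations enabled. The function names below are hypothetical, and the expected behaviour assumes a target such as x86_64 (SSE) or AArch64, where moving a float around does not quiet an sNaN.

```cpp
#include <cassert>
#include <cfenv>   // std::feclearexcept, std::fetestexcept
#include <limits>  // std::numeric_limits

// Returns the exception flags raised by computing x + 1.0f at run time.
int flags_raised_by_adding_one(float x) {
    volatile float input = x;  // volatile read below cannot be constant-folded
    std::feclearexcept(FE_ALL_EXCEPT);
    assert(std::fetestexcept(FE_ALL_EXCEPT) == 0);
    volatile float sum = input + 1.0f;
    (void)sum;
    return std::fetestexcept(FE_ALL_EXCEPT);
}

// Expected to return true on the targets assumed above: only the sNaN
// operand raises FE_INVALID.
bool snan_signals_but_qnan_does_not() {
    int s = flags_raised_by_adding_one(std::numeric_limits<float>::signaling_NaN());
    int q = flags_raised_by_adding_one(std::numeric_limits<float>::quiet_NaN());
    return (s & FE_INVALID) != 0 && q == 0;
}
```

GCC also has a -fsignaling-nans option intended for code that relies on sNaN behaviour; it is not used in this sketch.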

So we deduce from the blow_up.cpp example that:

  • snan + 1.0 causes FE_INVALID, but qnan + 1.0 does not

  • Linux only generates a signal if it is enabled with feenableexcept.

    This is a glibc extension, I could not find any way to do that in any standard.

When the signal happens, it is because the CPU hardware itself raises an exception, which the Linux kernel handles and then reports to the application through the signal.

The outcome is that bash prints Floating point exception (core dumped), and the exit status is 136, which corresponds to signal 136 - 128 == 8, which according to:

man 7 signal

is SIGFPE.
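
The exit status arithmetic can be written down as a tiny helper. This sketch assumes the usual shell convention that a process killed by signal N is reported as 128 + N; signal_from_exit_status is a name made up for the example.

```cpp
#include <cassert>
#include <csignal>  // SIGFPE

// Returns the signal number encoded in a shell exit status, or 0 if the
// status does not indicate termination by a signal.
int signal_from_exit_status(int status) {
    assert(status >= 0 && status <= 255);
    return status > 128 ? status - 128 : 0;
}
```

With the status from the run above, signal_from_exit_status(136) compares equal to SIGFPE on Linux.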

Note that SIGFPE is the same signal that we get if we try to divide an integer by 0:

int main() {
    int i = 1 / 0;
}

although for integers:

  • dividing anything by zero raises the signal, since there is no infinity representation in integers
  • the signal happens by default, without the need for feenableexcept

How to handle the SIGFPE?

If you just create a handler that returns normally, it leads to an infinite loop, because after the handler returns, the division happens again! This can be verified with GDB.

The only way is to use setjmp and longjmp to jump somewhere else as shown at: C handle signal SIGFPE and continue execution
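
A minimal sketch of that approach follows, using the POSIX variants sigsetjmp and siglongjmp so that the signal mask is restored after the jump. To keep the example behaving the same on every POSIX system, it uses raise(SIGFPE) as a stand-in for a real trapping operation; that substitution, and all function names, are assumptions of this sketch.

```cpp
#include <cassert>
#include <setjmp.h>  // sigjmp_buf, sigsetjmp, siglongjmp (POSIX)
#include <signal.h>  // sigaction, raise, SIGFPE

static sigjmp_buf recovery_point;

// Handler: jump to the recovery point instead of returning, because
// returning would re-execute the faulting instruction.
extern "C" void on_sigfpe(int) {
    siglongjmp(recovery_point, 1);
}

// Stand-in for an operation that traps (assumption of this sketch).
void raise_sigfpe() {
    raise(SIGFPE);
}

// Runs `risky` and reports whether it was interrupted by SIGFPE.
bool interrupted_by_sigfpe(void (*risky)()) {
    struct sigaction action = {};
    struct sigaction previous = {};
    action.sa_handler = on_sigfpe;
    sigemptyset(&action.sa_mask);
    int rc = sigaction(SIGFPE, &action, &previous);
    assert(rc == 0);
    (void)rc;

    bool interrupted;
    // Second argument 1: save the signal mask so siglongjmp restores it.
    if (sigsetjmp(recovery_point, 1) == 0) {
        risky();
        interrupted = false;
    } else {
        interrupted = true;  // reached via siglongjmp from the handler
    }
    sigaction(SIGFPE, &previous, nullptr);  // restore the old disposition
    return interrupted;
}
```

Calling interrupted_by_sigfpe(raise_sigfpe) returns true and the program keeps running, whereas a handler that simply returned after a real hardware trap would loop as described above.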

What are some real world applications of sNaNs?

Quite honestly, I still haven't found a really useful use case for sNaNs. This has been asked at: Usefulness of signaling NaN?

sNaNs feel particularly useless because feenableexcept already lets us detect the initial invalid operations (such as 0.0f/0.0f) that generate qNaNs. It appears that an sNaN merely raises errors for additional operations on which a qNaN stays silent, e.g. snan + 1.0f raises but qnan + 1.0f does not.

E.g.:

main.c

#define _GNU_SOURCE
#include <fenv.h>
#include <stdio.h>

int main(int argc, char **argv) {
    (void)argv;
    float f0 = 0.0;

    if (argc == 1) {
        feenableexcept(FE_INVALID);
    }
    float f1 = 0.0 / f0;
    printf("f1 %f\n", f1);

    feenableexcept(FE_INVALID);
    float f2 = f1 + 1.0;
    printf("f2 %f\n", f2);
}

compile:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c -lm

then:

./main.out

gives:

Floating point exception (core dumped)

and:

./main.out  1

gives:

f1 -nan
f2 -nan

See also: How to trace a NaN in C++

What are the signal flags and how are they manipulated?

Everything is implemented in the CPU hardware.

The flags live in some register, and so does the bit that says if an exception / signal should be raised.

Those registers are accessible from userland on most archs.

This part of the glibc 2.29 code is actually very easy to understand!

For example, fetestexcept is implemented for x86_64 at sysdeps/x86_64/fpu/ftestexcept.c:

#include <fenv.h>

int
fetestexcept (int excepts)
{
  int temp;
  unsigned int mxscr;

  /* Get current exceptions.  */
  __asm__ ("fnstsw %0\n"
       "stmxcsr %1" : "=m" (*&temp), "=m" (*&mxscr));

  return (temp | mxscr) & excepts & FE_ALL_EXCEPT;
}
libm_hidden_def (fetestexcept)

so we immediately see that the instructions used are fnstsw, which stores the x87 FPU status word, and stmxcsr, which stands for "Store MXCSR Register State".

And feenableexcept is implemented at sysdeps/x86_64/fpu/feenablxcpt.c:

#include <fenv.h>

int
feenableexcept (int excepts)
{
  unsigned short int new_exc, old_exc;
  unsigned int new;

  excepts &= FE_ALL_EXCEPT;

  /* Get the current control word of the x87 FPU.  */
  __asm__ ("fstcw %0" : "=m" (*&new_exc));

  old_exc = (~new_exc) & FE_ALL_EXCEPT;

  new_exc &= ~excepts;
  __asm__ ("fldcw %0" : : "m" (*&new_exc));

  /* And now the same for the SSE MXCSR register.  */
  __asm__ ("stmxcsr %0" : "=m" (*&new));

  /* The SSE exception masks are shifted by 7 bits.  */
  new &= ~(excepts << 7);
  __asm__ ("ldmxcsr %0" : : "m" (*&new));

  return old_exc;
}
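
The MXCSR part of that function is plain bit arithmetic, which the following sketch reproduces without touching the real register. It assumes the x86 layout in which the six exception flags occupy bits 0..5, the corresponding mask bits sit 7 positions higher, and the power-on MXCSR value is 0x1F80 (all exceptions masked); the constant and function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMxcsrDefault = 0x1F80u;  // all exceptions masked
constexpr std::uint32_t kInvalidFlag  = 0x01u;    // x86 value of FE_INVALID

// Mirror of the MXCSR step in feenableexcept: clearing a mask bit
// enables (unmasks) the corresponding exception.
std::uint32_t unmask_exceptions(std::uint32_t mxcsr, std::uint32_t excepts) {
    assert((excepts & ~0x3Fu) == 0);  // only six exception bits exist
    return mxcsr & ~(excepts << 7);
}
```

For instance, unmask_exceptions(kMxcsrDefault, kInvalidFlag) gives 0x1F00: only the invalid-operation mask bit (bit 7) has been cleared.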

What does the C standard say about qNaN vs sNaN?

The C11 N1570 standard draft explicitly says that the standard does not differentiate between them at F.2.1 "Infinities, signed zeros, and NaNs":

1 This specification does not define the behavior of signaling NaNs. It generally uses the term NaN to denote quiet NaNs. The NAN and INFINITY macros and the nan functions in <math.h> provide designations for IEC 60559 NaNs and infinities.

Tested in Ubuntu 18.10, GCC 8.2.
