Adding two floating-point numbers

误落风尘 2020-12-06 10:04

I would like to compute the sum, rounded up, of two IEEE 754 binary64 numbers. To that end I wrote the C99 program below:

#include 
#include &         


        
2 Answers
  •  無奈伤痛
    2020-12-06 10:15

    I couldn't find any command-line options that would do what you want. However, I did find a way to rewrite your code so that, even with maximum optimizations (including architecture-specific ones), neither GCC nor Clang computes the value at compile time. Instead, both are forced to emit code that computes the value at runtime.

    C:

    #include <fenv.h>
    #include <stdio.h>
    
    #pragma STDC FENV_ACCESS ON
    
    // add with rounding up
    double __attribute__ ((noinline)) addrup (double x, double y) {
      int round = fegetround ();
      fesetround (FE_UPWARD);
      double r = x + y;
      fesetround (round);   // restore old rounding mode
      return r;
    }
    
    int main(int c, char *v[]){
      printf("%a\n", addrup (0x1.0p0, 0x1.0p-80));
    }
    

    This results in these outputs from GCC and Clang, even when using maximum and architectural optimizations:

    gcc -S -x c -march=corei7 -O3 (Godbolt GCC):

    addrup:
            push    rbx
            sub     rsp, 16
            movsd   QWORD PTR [rsp+8], xmm0
            movsd   QWORD PTR [rsp], xmm1
            call    fegetround
            mov     edi, 2048
            mov     ebx, eax
            call    fesetround
            movsd   xmm1, QWORD PTR [rsp]
            mov     edi, ebx
            movsd   xmm0, QWORD PTR [rsp+8]
            addsd   xmm0, xmm1
            movsd   QWORD PTR [rsp], xmm0
            call    fesetround
            movsd   xmm0, QWORD PTR [rsp]
            add     rsp, 16
            pop     rbx
            ret
    .LC2:
            .string "%a\n"
    main:
            sub     rsp, 8
            movsd   xmm1, QWORD PTR .LC0[rip]
            movsd   xmm0, QWORD PTR .LC1[rip]
            call    addrup
            mov     edi, OFFSET FLAT:.LC2
            mov     eax, 1
            call    printf
            xor     eax, eax
            add     rsp, 8
            ret
    .LC0:
            .long   0
            .long   988807168
    .LC1:
            .long   0
            .long   1072693248
    

    clang -S -x c -march=corei7 -O3 (Godbolt Clang):

    addrup:                                 # @addrup
            push    rbx
            sub     rsp, 16
            movsd   qword ptr [rsp], xmm1   # 8-byte Spill
            movsd   qword ptr [rsp + 8], xmm0 # 8-byte Spill
            call    fegetround
            mov     ebx, eax
            mov     edi, 2048
            call    fesetround
            movsd   xmm0, qword ptr [rsp + 8] # 8-byte Reload
            addsd   xmm0, qword ptr [rsp]   # 8-byte Folded Reload
            movsd   qword ptr [rsp + 8], xmm0 # 8-byte Spill
            mov     edi, ebx
            call    fesetround
            movsd   xmm0, qword ptr [rsp + 8] # 8-byte Reload
            add     rsp, 16
            pop     rbx
            ret
    
    .LCPI1_0:
            .quad   4607182418800017408     # double 1
    .LCPI1_1:
            .quad   4246894448610377728     # double 8.2718061255302767E-25
    main:                                   # @main
            push    rax
            movsd   xmm0, qword ptr [rip + .LCPI1_0] # xmm0 = mem[0],zero
            movsd   xmm1, qword ptr [rip + .LCPI1_1] # xmm1 = mem[0],zero
            call    addrup
            mov     edi, .L.str
            mov     al, 1
            call    printf
            xor     eax, eax
            pop     rcx
            ret
    
    .L.str:
            .asciz  "%a\n"
    

    Now for the more interesting part: why does that work?

    Well, when GCC and Clang compile code, they look for expressions whose values can be computed at compile time and replace them with the results. This is known as constant propagation (together with constant folding). If the addition lives in a separate function, constant propagation by itself does not reach it, since on its own it does not cross function boundaries.

    However, if the compilers see a function whose body they could substitute in place of the call, they may do so. This is known as function inlining. If inlining can be applied to a function, we say that the function is (surprise) inlinable.

    If a function always returns the same result for a given set of inputs and has no side effects (meaning it makes no changes to the environment), then it is considered pure.

    Now, if a function is fully inlinable (meaning that it makes no calls to external libraries, apart from a few defaults known to GCC and Clang such as libc and libm) and is pure, then the compilers will apply constant propagation to it.

    In other words, if we don't want them to propagate constants through a function call, we can do one of two things:

    • Make the function appear impure:
      • Use the filesystem
      • Depend on some unpredictable input from somewhere
      • Use the network
      • Use some syscall of some sort
      • Call something from an external library unknown to GCC and/or Clang
    • Make the function not fully inlinable
      • Call something from an external library unknown to GCC and/or Clang
      • Use __attribute__ ((noinline))

    Now, that last one is the easiest. As you may have surmised, __attribute__ ((noinline)) marks the function as non-inlinable. Since we can take advantage of this, all we have to do is make another function that does whatever computation we want, mark it with __attribute__ ((noinline)), and then call it.

    When the code is compiled, GCC and Clang respect the attribute: the call is not inlined and, by extension, constants are not propagated through it. The value is therefore computed at runtime with the appropriate rounding mode set.
