Calculate the Fibonacci number (recursive approach) at compile time (constexpr) in C++11

耶瑟儿~ 2020-12-09 06:33

I wrote a program that calculates Fibonacci numbers at compile time (constexpr) using the template metaprogramming techniques supported in C++11. The purpose of this i

4 answers
  •  南笙 (original poster)
    2020-12-09 07:08

    Adding -O1 (or higher) to GCC 4.8.1 makes fibonacci<40>() a compile-time constant, and all of the template-generated code disappears from your assembly. The following code:

    int foo()
    {
      return fibonacci<40>();
    }
    

    results in the assembly output:

    foo():
        movl    $102334155, %eax
        ret
    

    This gives the best runtime performance, because the whole computation has been folded into a single constant load.
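
    For reference, a function template of the kind discussed here might look roughly like the sketch below. The question's actual code is not shown in this thread, so the definition is an assumption reconstructed from the fibonacci<40>() call and the assembly in this answer; fib20 is a name made up for illustration. The sketch also binds the result to a constexpr variable, which requires the compiler to evaluate the call at compile time regardless of the optimization level. It uses N = 20 rather than 40 to stay well inside the constexpr evaluation limits that some compilers apply.

```cpp
// Hypothetical reconstruction: the question's actual code is not shown here.
// Primary template: each N gets its own function that calls the two below it.
template <int N>
constexpr int fibonacci()
{
    return fibonacci<N - 1>() + fibonacci<N - 2>();
}

// Base cases as explicit specializations; these end the recursion.
template <>
constexpr int fibonacci<0>()
{
    return 0;
}

template <>
constexpr int fibonacci<1>()
{
    return 1;
}

// A constexpr variable must be initialized by a constant expression,
// so this call is evaluated by the compiler at any optimization level.
constexpr int fib20 = fibonacci<20>();
static_assert(fib20 == 6765, "fibonacci<20>() should be 6765");
```

    A plain call such as return fibonacci<40>(); is only permitted, not required, to be evaluated at compile time, which is why the optimization level matters for the assembly shown in this answer.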

    However, it looks like you are building without optimizations (-O0), so you get something quite different. The assembly output for each of the 40 fibonacci functions looks basically identical (except for the 0 and 1 cases):

    int fibonacci<40>():
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rbx
        subq    $8, %rsp
        call    int fibonacci<39>()
        movl    %eax, %ebx
        call    int fibonacci<38>()
        addl    %ebx, %eax
        addq    $8, %rsp
        popq    %rbx
        popq    %rbp
        ret
    

    This is straightforward: it sets up the stack, calls the two other fibonacci functions, adds the values, tears down the stack, and returns. There is no branching and there are no comparisons.

    Now compare that with the assembly from the conventional approach:

    fibonacci(int):
        pushq   %rbp
        pushq   %rbx
        subq    $8, %rsp
        movl    %edi, %ebx
        movl    $0, %eax
        testl   %edi, %edi
        je  .L2
        movb    $1, %al
        cmpl    $1, %edi
        je  .L2
        leal    -1(%rdi), %edi
        call    fibonacci(int)
        movl    %eax, %ebp
        leal    -2(%rbx), %edi
        call    fibonacci(int)
        addl    %ebp, %eax
        .L2:
        addq    $8, %rsp
        popq    %rbx
        popq    %rbp
        ret
    

    Each time the function is called, it needs to check whether N is 0 or 1 and act accordingly. This comparison is not needed in the template version, because each value of N gets its own function and the base cases are selected at compile time. My guess is that the unoptimized template code is faster because it avoids those comparisons and therefore also avoids any branch mispredictions.
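
    To make the comparison concrete, a conventional runtime version along the following lines would give assembly of the shape shown above. The exact source that was compiled is not included in this answer, so treat this as an assumed sketch rather than the original code.

```cpp
// Conventional runtime version (assumed shape, not the original source).
// n is an ordinary argument, so the base-case checks are compiled into
// the function body and executed on every call.
int fibonacci(int n)
{
    if (n == 0) return 0;  // first comparison against the argument
    if (n == 1) return 1;  // second comparison against the argument
    return fibonacci(n - 1) + fibonacci(n - 2);
}
```

    In the template version the value of N is part of the function's identity, so the compiler picks the base-case function directly and no comparison remains in the generated code.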
