Calculate the Fibonacci number (recursive approach) at compile time (constexpr) in C++11

耶瑟儿~ 2020-12-09 06:33

I wrote a program that calculates Fibonacci numbers at compile time (constexpr) using the template metaprogramming techniques supported in C++11. The purpose of this i

4 answers
  • 2020-12-09 06:51

    Try this:

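    #include <cstddef>
    #include <type_traits>
    
    // Each fibonacci<N> is instantiated once; its value is fixed at compile time.
    template<std::size_t N>
    struct fibonacci : std::integral_constant<std::size_t, fibonacci<N-1>::value + fibonacci<N-2>::value> {};
    template<> struct fibonacci<1> : std::integral_constant<std::size_t, 1> {};
    template<> struct fibonacci<0> : std::integral_constant<std::size_t, 0> {};
    static_assert(fibonacci<40>::value == 102334155, "F(40) = 102334155");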
    

    With clang and -Os, this compiles in roughly 0.5 s and runs in zero time for N=40. Your "conventional" approach compiles in roughly 0.4 s and runs in 0.8 s. Just to check: the result is 102334155, right?

    When I tried your own constexpr solution, the compiler ran for a couple of minutes, and then I stopped it because memory was apparently full (the computer started freezing). The compiler was trying to compute the final result, and your implementation is far too inefficient to be evaluated at compile time.

    With this solution, the template instantiations for N-2 and N-1 are reused when instantiating N. So fibonacci<40> is actually known at compile time as a value, and there is nothing left to do at run time. This is a dynamic programming approach, and of course you can do the same at run time if you store all values from 0 through N-1 before computing the value at N.
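
    A minimal sketch of that run-time counterpart, assuming 64-bit unsigned results (so n is limited to 93, since F(94) no longer fits); the function name `fibonacciTable` is hypothetical and not taken from the question.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical run-time analogue of the template version: bottom-up
    // dynamic programming. table[i] holds F(i), and each entry is computed
    // exactly once from the two entries before it, just as fibonacci<N>
    // reuses the already instantiated fibonacci<N-1> and fibonacci<N-2>.
    std::uint64_t fibonacciTable(std::size_t n)
    {
        assert(n <= 93 && "F(94) does not fit in 64 bits");
        std::vector<std::uint64_t> table(n + 2);  // n + 2 so table[1] exists even for n == 0
        table[0] = 0;
        table[1] = 1;
        for (std::size_t i = 2; i <= n; ++i)
            table[i] = table[i - 1] + table[i - 2];
        return table[n];
    }
    ```

    For example, `fibonacciTable(40)` returns 102334155 after 39 additions, whereas plain recursion makes about 331 million calls for the same N.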

    With your solution, the compiler can evaluate fibonacci<N>() at compile time but is not required to, so in your build all or part of the computation is left for run time. In my test, the whole computation was attempted at compile time, which is why it never finished.
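
    The difference between "may" and "must" evaluate at compile time can be sketched with a naive recursive C++11 `constexpr` function. The question's own constexpr code is not shown, so `fibonacciNaive`, `fib20` and `fibonacciAtRunTime` are hypothetical stand-ins: only a context that requires a constant expression (a `constexpr` variable initializer or a `static_assert`) forces the compiler to finish the computation itself.

    ```cpp
    #include <cstdint>

    // Hypothetical stand-in for a naive C++11 constexpr solution: a single
    // return statement with two recursive calls, so the number of calls
    // grows exponentially with n.
    constexpr std::uint64_t fibonacciNaive(unsigned n)
    {
        return n < 2 ? n : fibonacciNaive(n - 1) + fibonacciNaive(n - 2);
    }

    // A constexpr variable must be initialized by a constant expression, so
    // the compiler has to evaluate this call itself (kept small on purpose).
    constexpr std::uint64_t fib20 = fibonacciNaive(20);
    static_assert(fib20 == 6765, "F(20) should be 6765");

    // Here n is only known at run time, so the same function is simply
    // called at run time; the compiler may, but need not, do more.
    std::uint64_t fibonacciAtRunTime(unsigned n)
    {
        return fibonacciNaive(n);
    }
    ```

    Replacing 20 with 40 in the forced case asks the compiler to carry out the same exponential recursion itself; depending on the compiler, that may hit an evaluation limit or take a very long time.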

  • 2020-12-09 06:52

    Maybe just use a more efficient algorithm?

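    #include <cstddef>
    #include <utility>
    
    // Fast doubling: given g = (F(k), F(k+1)) with k = n / 2, return (F(n), F(n+1)).
    // std::make_pair is only constexpr since C++14, so the pair constructor is used directly.
    constexpr std::pair<double, double> helper(std::size_t n, const std::pair<double, double>& g)
    {
        return n % 2
            // n = 2k+1: (F(2k+1), F(2k+2)) = (F(k+1)^2 + F(k)^2, F(k+1)^2 + 2 F(k) F(k+1))
            ? std::pair<double, double>(g.second * g.second + g.first * g.first, g.second * g.second + 2 * g.first * g.second)
            // n = 2k: (F(2k), F(2k+1)) = (2 F(k) F(k+1) - F(k)^2, F(k+1)^2 + F(k)^2)
            : std::pair<double, double>(2 * g.first * g.second - g.first * g.first, g.second * g.second + g.first * g.first);
    }
    
    // Returns (F(n), F(n+1)); the recursion depth is about log2(n).
    constexpr std::pair<double, double> fibonacciRecursive(std::size_t n)
    {
        return n < 2
            ? std::pair<double, double>(static_cast<double>(n), 1.0)
            : helper(n, fibonacciRecursive(n / 2));
    }
    
    // double keeps results exact only while they fit in its 53-bit significand.
    constexpr double fibonacci(std::size_t n)
    {
        return fibonacciRecursive(n).first;
    }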
    

    My code is based on an idea described by D. Knuth in the first part of his "The Art of Computer Programming". I can't remember the exact place in this book, but I'm sure that the algorithm was described there.
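
    The code relies on the standard fast-doubling identities, written here with $F_0 = 0$, $F_1 = 1$ and $k = \lfloor n/2 \rfloor$; `helper` receives $(F_k, F_{k+1})$ as `g.first` and `g.second`.

    ```latex
    \begin{aligned}
    F_{2k}   &= 2F_kF_{k+1} - F_k^2 \\
    F_{2k+1} &= F_{k+1}^2 + F_k^2 \\
    F_{2k+2} &= F_{2k} + F_{2k+1} = F_{k+1}^2 + 2F_kF_{k+1}
    \end{aligned}
    ```

    For even $n = 2k$, `helper` returns $(F_{2k}, F_{2k+1})$; for odd $n = 2k+1$, it returns $(F_{2k+1}, F_{2k+2})$. Each level halves $n$, so only about $\log_2 n$ calls are needed.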

  • 2020-12-09 07:03

    The reason is that your run-time solution is not optimal: for each Fibonacci number, the function is called several times with the same argument. The Fibonacci sequence has overlapping subproblems; for example, fib(6) calls fib(4), and fib(5) also calls fib(4).
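
    The overlap is easy to see by counting calls. This sketch instruments a plain recursive function; the names `fibonacciCounted` and `calls` are hypothetical, introduced only for this illustration.

    ```cpp
    #include <array>
    #include <cassert>
    #include <cstdint>

    // Hypothetical instrumentation: calls[k] counts how many times the
    // function is entered with argument k.
    std::array<unsigned long, 64> calls{};

    // Plain recursion with no caching: the same subproblems are solved repeatedly.
    std::uint64_t fibonacciCounted(unsigned n)
    {
        assert(n < calls.size());
        ++calls[n];
        return n < 2 ? n : fibonacciCounted(n - 1) + fibonacciCounted(n - 2);
    }
    ```

    For example, after `calls.fill(0)` and `fibonacciCounted(6)` (which returns 8), `calls[4]` is 2 (once from fib(6), once from fib(5)), `calls[3]` is 3, and the counts add up to 25 calls in total.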

    The template-based approach (inadvertently) uses dynamic programming: it stores the values of previously calculated numbers and so avoids repetition. fib(4) is therefore computed only once; whichever of fib(6) and fib(5) asks for it second simply reuses the stored value.

    I recommend looking up "dynamic programming fibonacci" and trying it; it should speed things up dramatically.
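
    One way to apply that advice at run time is top-down memoization: the recursion keeps its shape, but each result is stored the first time it is computed. This is a minimal sketch; `fibonacciMemo` and the choice of `std::unordered_map` as the cache are illustrative assumptions, not something the answer specifies.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <unordered_map>

    // Recursive helper: looks a value up before computing it, so each
    // F(k) is computed at most once per top-level call.
    std::uint64_t fibonacciMemo(unsigned n, std::unordered_map<unsigned, std::uint64_t>& memo)
    {
        assert(n <= 93 && "F(94) does not fit in 64 bits");
        if (n < 2)
            return n;
        auto found = memo.find(n);
        if (found != memo.end())
            return found->second;  // overlapping subproblem: reuse the stored value
        std::uint64_t value = fibonacciMemo(n - 1, memo) + fibonacciMemo(n - 2, memo);
        memo[n] = value;
        return value;
    }

    // Convenience overload that starts with an empty cache.
    std::uint64_t fibonacciMemo(unsigned n)
    {
        std::unordered_map<unsigned, std::uint64_t> memo;
        return fibonacciMemo(n, memo);
    }
    ```

    Starting from an empty cache, `fibonacciMemo(40)` performs 39 additions instead of hundreds of millions of calls.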

  • 2020-12-09 07:08

    Adding -O1 (or higher) with GCC 4.8.1 will make fibonacci<40>() a compile-time constant, and all the template-generated code will disappear from your assembly. The following code

    int foo()
    {
      return fibonacci<40>();
    }
    

    will result in the assembly output

    foo():
        movl    $102334155, %eax
        ret
    

    This gives the best runtime performance.

    However, it looks like you are building without optimizations (-O0), so you get something quite different. The assembly output for each of the 40 fibonacci functions looks basically identical (except for the 0 and 1 cases):

    int fibonacci<40>():
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rbx
        subq    $8, %rsp
        call    int fibonacci<39>()
        movl    %eax, %ebx
        call    int fibonacci<38>()
        addl    %ebx, %eax
        addq    $8, %rsp
        popq    %rbx
        popq    %rbp
        ret
    

    This is straightforward: it sets up the stack, calls the two other fibonacci functions, adds the values, tears down the stack, and returns. There is no branching and no comparison.

    Now compare that with the assembly from the conventional approach:

    fibonacci(int):
        pushq   %rbp
        pushq   %rbx
        subq    $8, %rsp
        movl    %edi, %ebx
        movl    $0, %eax
        testl   %edi, %edi
        je  .L2
        movb    $1, %al
        cmpl    $1, %edi
        je  .L2
        leal    -1(%rdi), %edi
        call    fibonacci(int)
        movl    %eax, %ebp
        leal    -2(%rbx), %edi
        call    fibonacci(int)
        addl    %ebp, %eax
        .L2:
        addq    $8, %rsp
        popq    %rbx
        popq    %rbp
        ret
    

    Each time the function is called, it needs to check whether N is 0 or 1 and act accordingly. The template version does not need this comparison, because the base cases are built into the functions themselves through template specialization. My guess is that the unoptimized template code is faster because it avoids those comparisons and therefore cannot suffer any branch mispredictions.
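
    The question's code is not reproduced here, so the following is a hypothetical reconstruction consistent with the two assembly listings: the template version ends its recursion through explicit specializations, while the run-time version has to test n on every call. The names `fibonacciTemplate`, `fibonacciRuntime` and `checkBothVersionsAgree` are assumptions chosen so both versions fit in one file.

    ```cpp
    #include <cassert>

    // Template version: each fibonacciTemplate<N>() is a separate function.
    // The base cases are explicit specializations, so no generated function
    // body compares N against 0 or 1; that choice is made at compile time.
    template <int N>
    int fibonacciTemplate()
    {
        return fibonacciTemplate<N - 1>() + fibonacciTemplate<N - 2>();
    }

    template <>
    int fibonacciTemplate<1>()
    {
        return 1;
    }

    template <>
    int fibonacciTemplate<0>()
    {
        return 0;
    }

    // Run-time version: one function that must check n on every call, which
    // is where the testl/cmpl and je instructions in its assembly come from.
    // Negative n is not handled.
    int fibonacciRuntime(int n)
    {
        if (n == 0)
            return 0;
        if (n == 1)
            return 1;
        return fibonacciRuntime(n - 1) + fibonacciRuntime(n - 2);
    }

    // Sanity check: both versions produce F(20) == 6765.
    void checkBothVersionsAgree()
    {
        assert(fibonacciTemplate<20>() == 6765);
        assert(fibonacciRuntime(20) == 6765);
    }
    ```

    At -O0, `fibonacciTemplate<40>()` expands into one small function per N, none of which contains a comparison, whereas `fibonacciRuntime` is a single function that branches on every call. With optimization enabled, the answer reports that GCC 4.8.1 folds the template call into a constant, as the assembly for `foo()` shows.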
