Calculate the Fibonacci number (recursive approach) at compile time (constexpr) in C++11

耶瑟儿～ 2020-12-09 06:33

I wrote a program that calculates Fibonacci numbers at compile time (constexpr) using the template metaprogramming techniques supported in C++11. The purpose of this i

4 answers

南笙 (original poster)

2020-12-09 07:08
Adding -O1 (or higher) to GCC 4.8.1 will make fibonacci<40>() a compile-time constant, and all the template-generated code will disappear from your assembly. The following code
```
int foo()
{
  return fibonacci<40>();
}
```
will result in the assembly output
```
foo():
    movl    $102334155, %eax
    ret
```
This gives the best runtime performance.
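If you would rather not depend on the optimization level at all, one option is to require the value in a constant expression. The sketch below swaps in a different technique from the function templates discussed in this answer: a class template with a `static constexpr` member. The names `Fib` and `value` are my own and not from the question. Because `value` must be a constant expression, the compiler has to compute it even at -O0, and each `Fib<N>` is instantiated only once.

```cpp
// Hypothetical class-template variant (Fib and value are illustrative names).
template <int N>
struct Fib
{
  // Must be a constant expression, so it is computed by the compiler.
  static constexpr int value = Fib<N - 1>::value + Fib<N - 2>::value;
};

// Base cases: explicit specializations end the chain of instantiations.
template <>
struct Fib<1> { static constexpr int value = 1; };

template <>
struct Fib<0> { static constexpr int value = 0; };

// Fails to compile if the value is not known at compile time.
static_assert(Fib<40>::value == 102334155, "Fib<40> must be a compile-time constant");

int foo()
{
  return Fib<40>::value;  // the value is read directly, no calls are needed
}
```

With this form, `foo()` should return the constant without calling anything, whether or not optimizations are enabled.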

However, it looks like you are building without optimizations (-O0), so you get something quite different. The assembly output for each of the 40 fibonacci functions looks basically identical (except for the 0 and 1 cases):
```
int fibonacci<40>():
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx
    subq    $8, %rsp
    call    int fibonacci<39>()
    movl    %eax, %ebx
    call    int fibonacci<38>()
    addl    %ebx, %eax
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    ret
```
This is straightforward: it sets up the stack, calls the two other fibonacci functions, adds the values, tears down the stack, and returns. No branching and no comparisons.
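For reference, here is a minimal sketch of function templates that would produce one function per N like the listing above. The question's code is cut off here, so this is my assumption of its shape (a `constexpr` primary template plus two explicit specializations), not a copy of it.

```cpp
// Assumed reconstruction; the question's actual code is not shown on this page.
template <int N>
constexpr int fibonacci()
{
  // Which functions to call is fixed by the template argument,
  // so no runtime test of N is needed.
  return fibonacci<N - 1>() + fibonacci<N - 2>();
}

// Base cases as explicit specializations: these end the recursion.
template <>
constexpr int fibonacci<1>() { return 1; }

template <>
constexpr int fibonacci<0>() { return 0; }
```

When `fibonacci<40>()` is called in an ordinary (non-constant) context at -O0, each instantiation is emitted as its own function, which is where the chain of `call` instructions comes from.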

Now compare that with the assembly from the conventional approach:
```
fibonacci(int):
    pushq   %rbp
    pushq   %rbx
    subq    $8, %rsp
    movl    %edi, %ebx
    movl    $0, %eax
    testl   %edi, %edi
    je  .L2
    movb    $1, %al
    cmpl    $1, %edi
    je  .L2
    leal    -1(%rdi), %edi
    call    fibonacci(int)
    movl    %eax, %ebp
    leal    -2(%rbx), %edi
    call    fibonacci(int)
    addl    %ebp, %eax
    .L2:
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    ret
```
Each time the function is called, it needs to check whether N is 0 or 1 and act appropriately. This comparison is not needed in the template version because it is built into the function via the magic of templates. My guess is that the unoptimized version of the template code is faster because it avoids those comparisons and so cannot suffer any branch mispredictions on them.
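To make the comparison concrete, here is a sketch of what the conventional runtime function might look like in source form. The name `fibonacci_runtime` is hypothetical, and I am assuming the usual two base-case checks, which would correspond to the `testl`/`je` and `cmpl`/`je` pairs in the listing above.

```cpp
// Illustrative runtime version; fibonacci_runtime is a hypothetical name.
int fibonacci_runtime(int n)
{
  if (n == 0) return 0;  // first comparison, executed on every call
  if (n == 1) return 1;  // second comparison, executed on every call
  return fibonacci_runtime(n - 1) + fibonacci_runtime(n - 2);
}
```

Here the base cases are decided by branches at run time, whereas the template version decides them by which specialization gets called.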