Visual C++ ~ Not inlining simple const function pointer calls

Submitted by 徘徊边缘 on 2019-12-01 03:38:07

The problem isn't with inlining, which the compiler does at every opportunity. The problem is that Visual C++ doesn't seem to realize that the pointer variable is actually a compile-time constant.

Test-case:

// function_pointer_resolution.cpp : Defines the entry point for the console application.
//

extern void show_int( int );

extern "C" typedef int binary_int_func( int, int );

extern "C" binary_int_func sum;
extern "C" binary_int_func* const sum_ptr = sum;

inline int call( binary_int_func* binary, int a, int b ) { return (*binary)(a, b); }

template< binary_int_func* binary >
inline int callt( int a, int b ) { return (*binary)(a, b); }

int main( void )
{
    show_int( sum(1, 2) );
    show_int( call(&sum, 3, 4) );
    show_int( callt<&sum>(5, 6) );
    show_int( (*sum_ptr)(1, 7) );
    show_int( call(sum_ptr, 3, 8) );
//  show_int( callt<sum_ptr>(5, 9) );
    return 0;
}

// sum.cpp
extern "C" int sum( int x, int y )
{
    return x + y;
}

// show_int.cpp
#include <iostream>

void show_int( int n )
{
    std::cout << n << std::endl;
}

The functions are separated into multiple compilation units to give better control over inlining. Specifically, I don't want show_int inlined, since it makes the assembly code messy.

The first whiff of trouble is that valid code (the commented line) is rejected by Visual C++. G++ has no problem with it, but Visual C++ complains "expected compile-time constant expression". This is actually a good predictor of all future behavior.
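For comparison, here is a minimal sketch of the commented-out call, assuming a C++17 compiler, where a non-type template argument may be any constant expression of the right type. The local definition of sum and the constexpr qualifier on sum_ptr are stand-ins added for illustration and are not part of the original test case. The hypothetical helper call_through_constant_pointer exists only to exercise the call. This has not been checked against Visual C++ 2010.

```cpp
#include <cassert>

// Stand-in for the sum() function from the test case (assumption: defined
// locally so that the sketch is self-contained).
int sum(int x, int y) { return x + y; }

typedef int binary_int_func(int, int);

// Assumption: constexpr instead of plain const, so that the pointer's value
// is usable in constant expressions.
constexpr binary_int_func* sum_ptr = sum;

template <binary_int_func* binary>
inline int callt(int a, int b) { return (*binary)(a, b); }

// Hypothetical helper: instantiates callt with the pointer variable itself,
// which C++17 accepts as a template argument.
inline int call_through_constant_pointer()
{
    int result = callt<sum_ptr>(5, 9);
    assert(result == 14);
    return result;
}
```

Because the target function is part of the template instantiation here, the optimizer does not have to prove anything about a variable in memory.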

With optimization enabled and normal compilation semantics (no cross-module inlining), the compiler generates:

_main   PROC                        ; COMDAT

; 18   :    show_int( sum(1, 2) );

    push    2
    push    1
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 19   :    show_int( call(&sum, 3, 4) );

    push    4
    push    3
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 20   :    show_int( callt<&sum>(5, 6) );

    push    6
    push    5
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 21   :    show_int( (*sum_ptr)(1, 7) );

    push    7
    push    1
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 22   :    show_int( call(sum_ptr, 3, 8) );

    push    8
    push    3
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int
    add esp, 60                 ; 0000003cH

; 23   :    //show_int( callt<sum_ptr>(5, 9) );
; 24   :    return 0;

    xor eax, eax

; 25   : }

    ret 0
_main   ENDP

There's already a huge difference between using sum_ptr and not using sum_ptr. Statements using sum_ptr generate an indirect function call (call DWORD PTR _sum_ptr), while all other statements generate a direct function call (call _sum), even when the source code used a function pointer.

If we now enable inlining by compiling function_pointer_resolution.cpp and sum.cpp with /GL and linking with /LTCG, we find that the compiler inlines all direct calls. Indirect calls stay as-is.

_main   PROC                        ; COMDAT

; 18   :    show_int( sum(1, 2) );

    push    3
    call    ?show_int@@YAXH@Z           ; show_int

; 19   :    show_int( call(&sum, 3, 4) );

    push    7
    call    ?show_int@@YAXH@Z           ; show_int

; 20   :    show_int( callt<&sum>(5, 6) );

    push    11                  ; 0000000bH
    call    ?show_int@@YAXH@Z           ; show_int

; 21   :    show_int( (*sum_ptr)(1, 7) );

    push    7
    push    1
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 22   :    show_int( call(sum_ptr, 3, 8) );

    push    8
    push    3
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int
    add esp, 36                 ; 00000024H

; 23   :    //show_int( callt<sum_ptr>(5, 9) );
; 24   :    return 0;

    xor eax, eax

; 25   : }

    ret 0
_main   ENDP

Bottom line: Yes, the compiler does inline calls made through a compile-time constant function pointer, as long as that function pointer is not read from a variable. This use of a function pointer got optimized:

call(&sum, 3, 4);

but this did not:

(*sum_ptr)(1, 7);

All tests run with Visual C++ 2010 Service Pack 1, compiling for x86, hosted on x64.

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86

I think that you're right in this conclusion: "... cannot inline function pointers at all".

This very simple example also breaks optimization:

static inline
int add(int x, int y)
{
    return x + y;
}

int main()
{
    int x = 3;
    int y = 2;
    auto q = add;
    int z = q(x, y);
    return z;
}

Your sample is even more complex for the compiler, so it is not surprising that it isn't inlined either.

You can try __forceinline. Nobody is going to be able to tell you exactly why it isn't inlined, though common sense says it should be: /O2 favors code speed over code size, which should encourage inlining. Strange.

This is not a real answer, but a "maybe workaround" one: STL (Stephan T. Lavavej) from Microsoft once mentioned that lambdas are easier for the compiler to inline than function pointers, so you could try that.
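A minimal sketch of what that workaround could look like, assuming the caller can be turned into a template. The names call_with and add_via_lambda are hypothetical and only serve the illustration. Whether a given Visual C++ version actually inlines the call still has to be confirmed in the generated assembly.

```cpp
#include <cassert>

// Generic caller: the callable's type is a template parameter. For a lambda,
// the function to call is determined by that type, not by a pointer value
// read at run time.
template <typename BinaryOp>
inline int call_with(BinaryOp op, int a, int b)
{
    return op(a, b);
}

// Hypothetical helper that passes a lambda instead of a function pointer.
inline int add_via_lambda(int a, int b)
{
    auto add = [](int x, int y) { return x + y; };
    int result = call_with(add, a, b);
    assert(result == a + b);
    return result;
}
```

The design point is that each lambda has its own unique type, so call_with is instantiated separately for it and the call target is known at compile time.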

As a bit of trivia, Bjarne often mentions that sort is faster than qsort because qsort takes a function pointer. But as other people have noted, gcc has no problem inlining calls through function pointers... so maybe Bjarne should try gcc :P
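To make the sort versus qsort comparison concrete, here is a small sketch of both call styles. The names compare_ints, LessInt, sort_with_qsort and sort_with_std_sort are hypothetical. The sketch only shows where the function pointer appears. It makes no claim about measured speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Comparator for qsort: it is handed over as a function pointer, so the
// library calls it indirectly unless the optimizer can see through the pointer.
int compare_ints(const void* lhs, const void* rhs)
{
    int a = *static_cast<const int*>(lhs);
    int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);
}

inline void sort_with_qsort(std::vector<int>& v)
{
    if (!v.empty())
        std::qsort(&v[0], v.size(), sizeof(int), compare_ints);
}

// Comparator for std::sort: a function object whose type is a template
// argument, so the comparison to call is known at compile time.
struct LessInt
{
    bool operator()(int a, int b) const { return a < b; }
};

inline void sort_with_std_sort(std::vector<int>& v)
{
    std::sort(v.begin(), v.end(), LessInt());
    assert(std::is_sorted(v.begin(), v.end()));
}
```

Both helpers sort in ascending order. The difference lies only in how the comparison reaches the sorting routine.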
