In what cases should I use memcpy over standard operators in C++?

梦毁少年i 2020-12-13 06:36

When can I get better performance using memcpy or how do I benefit from using it? For example:

float a[3]; float b[3];

is cod

7 answers
  • 2020-12-13 06:50

    Supposedly, as Nawaz said, the assignment version should be faster on most platforms. That's because memcpy() may copy byte by byte, while the assignment version can copy 4 bytes at a time.

    As always, you should profile your application to check that what you expect to be the bottleneck matches reality.

    Edit
    The same applies to dynamic arrays. Since you mention C++, you should use the std::copy() algorithm in that case.
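
    As a minimal sketch of the dynamic-array case (the helper names copy_with_std_copy, copy_with_memcpy and copies_agree are made up for illustration and are not from the question), the two approaches can be put side by side like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy n floats with std::copy (works for any element type).
void copy_with_std_copy(const float* src, float* dst, std::size_t n)
{
    std::copy(src, src + n, dst);
}

// Hypothetical helper: the memcpy equivalent; valid because float is trivially copyable.
void copy_with_memcpy(const float* src, float* dst, std::size_t n)
{
    assert(n == 0 || (src != nullptr && dst != nullptr));
    if (n != 0) {  // avoid passing null pointers to memcpy for empty buffers
        std::memcpy(dst, src, n * sizeof(float));
    }
}

// Copies a dynamically sized buffer both ways and reports whether both results match the source.
bool copies_agree(const std::vector<float>& src)
{
    std::vector<float> a(src.size());
    std::vector<float> b(src.size());
    copy_with_std_copy(src.data(), a.data(), src.size());
    copy_with_memcpy(src.data(), b.data(), src.size());
    return a == src && b == src;
}
```

    Both helpers produce the same bytes for float; the std::copy version simply carries no size arithmetic that can go wrong.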

    Edit
    This is the code output on Windows XP with GCC 4.5.0, compiled with the -O3 flag:

    #include <string.h>  // memcpy, size_t

    extern "C" void cpy(float* d, float* s, size_t n)
    {
        memcpy(d, s, sizeof(float)*n);
    }
    

    I wrote this function because the OP mentioned dynamic arrays too.

    Output assembly is the following:

    _cpy:
    LFB393:
        pushl   %ebp
    LCFI0:
        movl    %esp, %ebp
    LCFI1:
        pushl   %edi
    LCFI2:
        pushl   %esi
    LCFI3:
        movl    8(%ebp), %eax
        movl    12(%ebp), %esi
        movl    16(%ebp), %ecx
        sall    $2, %ecx
        movl    %eax, %edi
        rep movsb
        popl    %esi
    LCFI4:
        popl    %edi
    LCFI5:
        leave
    LCFI6:
        ret
    

    Of course, I assume all of the experts here know what rep movsb means: it copies ECX bytes from [ESI] to [EDI], one byte per iteration.

    This is the assignment version:

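    extern "C" void cpy2(float* d, float* s, size_t n)
    {
        // Note: as written, this touches indices n..1 (one past the last
        // element, never index 0). The listing below was generated from
        // exactly this loop, so it is kept unchanged here.
        while (n > 0) {
            d[n] = s[n];
            n--;
        }
    }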
    

    which yields the following code:

    _cpy2:
    LFB394:
        pushl   %ebp
    LCFI7:
        movl    %esp, %ebp
    LCFI8:
        pushl   %ebx
    LCFI9:
        movl    8(%ebp), %ebx
        movl    12(%ebp), %ecx
        movl    16(%ebp), %eax
        testl   %eax, %eax
        je  L2
        .p2align 2,,3
    L5:
        movl    (%ecx,%eax,4), %edx
        movl    %edx, (%ebx,%eax,4)
        decl    %eax
        jne L5
    L2:
        popl    %ebx
    LCFI10:
        leave
    LCFI11:
        ret
    

    Which moves 4 bytes at a time.
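
    One caveat about the loop in cpy2: it indexes with n before decrementing, so it reads and writes element n and never element 0. A hedged sketch of a variant that covers indices n-1 down to 0 (the name cpy2_fixed is made up here, and no assembly was generated for it) could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical corrected variant of cpy2: decrement first, so the loop
// visits indices n-1, n-2, ..., 0 and stays inside both buffers.
extern "C" void cpy2_fixed(float* d, const float* s, std::size_t n)
{
    assert(n == 0 || (d != nullptr && s != nullptr));
    while (n > 0) {
        --n;
        d[n] = s[n];
    }
}
```

    The element-wise assignment is unchanged, so the point about moving one 4-byte float per iteration still applies.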

  • 2020-12-13 06:58

    You can use memcpy only if the objects you're copying have no explicit constructors, and neither do their members (so-called POD, "Plain Old Data"; since C++11 the precise requirement is that the type is trivially copyable). So it is OK to call memcpy for float, but it is wrong for, e.g., std::string.

    But part of the work has already been done for you: std::copy from <algorithm> is specialized for built-in types (and possibly for every other POD type, depending on the standard library implementation). So writing std::copy(a, a + 3, b) is as fast as memcpy after compiler optimization, but less error-prone.
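
    To make the distinction concrete, here is a small sketch assuming a C++11 compiler. The copy_array helper is a made-up name for illustration; the static_asserts use std::is_trivially_copyable from <type_traits>, which is the check that decides whether memcpy is allowed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// float is trivially copyable, so a raw byte copy of it is well defined.
static_assert(std::is_trivially_copyable<float>::value,
              "float may be copied with memcpy");
// std::string is not: it must be copied through its copy assignment operator.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string must not be copied with memcpy");

// Hypothetical helper: copies a fixed-size array of any element type.
// std::copy assigns element by element, so it is correct for std::string
// and can still be lowered to memmove for types such as float.
template <typename T, std::size_t N>
void copy_array(const T (&src)[N], T (&dst)[N])
{
    assert(&src[0] != &dst[0]);  // std::copy requires distinct source and destination
    std::copy(src, src + N, dst);
}
```

    The same call works for float and for std::string, which is the "less error-prone" part: there is no byte count to get wrong and no type restriction to remember.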

  • 2020-12-13 06:58

    Use std::copy(). As the header file for g++ notes:

    This inline function will boil down to a call to @c memmove whenever possible.

    Visual Studio's is probably not much different. Go with the normal way, and optimize once you're aware of a bottleneck. In the case of a simple copy, the compiler is probably already optimizing for you.

  • 2020-12-13 06:59

    Efficiency should not be your concern.
    Write clean maintainable code.

    It bothers me that so many answers indicate that memcpy() is inefficient. It is designed to be the most efficient way of copying blocks of memory (for C programs).

    So I wrote the following as a test:

    #include <algorithm>
    #include <string.h>  // memcpy
    
    extern float a[3];
    extern float b[3];
    extern void base();
    
    int main()
    {
        base();
    
    #if defined(M1)
        a[0] = b[0];
        a[1] = b[1];
        a[2] = b[2];
    #elif defined(M2)
        memcpy(a, b, 3*sizeof(float));    
    #elif defined(M3)
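        // Note: this copies a into b, the opposite direction from M1 and M2;
        // the amount of data moved is the same.
        std::copy(&a[0], &a[3], &b[0]);
    #endif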
    
        base();
    }
    

    Then, to compare the code each variant produces:

    g++ -O3 -S xr.cpp -o s0.s
    g++ -O3 -S xr.cpp -o s1.s -DM1
    g++ -O3 -S xr.cpp -o s2.s -DM2
    g++ -O3 -S xr.cpp -o s3.s -DM3
    
    echo "=======" >  D
    diff s0.s s1.s >> D
    echo "=======" >> D
    diff s0.s s2.s >> D
    echo "=======" >> D
    diff s0.s s3.s >> D
    

    This resulted in: (comments added by hand)

    =======   // Copy by hand
    10a11,18
    >   movq    _a@GOTPCREL(%rip), %rcx
    >   movq    _b@GOTPCREL(%rip), %rdx
    >   movl    (%rdx), %eax
    >   movl    %eax, (%rcx)
    >   movl    4(%rdx), %eax
    >   movl    %eax, 4(%rcx)
    >   movl    8(%rdx), %eax
    >   movl    %eax, 8(%rcx)
    
    =======    // memcpy()
    10a11,16
    >   movq    _a@GOTPCREL(%rip), %rcx
    >   movq    _b@GOTPCREL(%rip), %rdx
    >   movq    (%rdx), %rax
    >   movq    %rax, (%rcx)
    >   movl    8(%rdx), %eax
    >   movl    %eax, 8(%rcx)
    
    =======    // std::copy()
    10a11,14
    >   movq    _a@GOTPCREL(%rip), %rsi
    >   movl    $12, %edx
    >   movq    _b@GOTPCREL(%rip), %rdi
    >   call    _memmove
    

    Added: timing results for running the above inside a loop of 1,000,000,000 iterations.

       g++ -c -O3 -DM1 X.cpp
       g++ -O3 X.o base.o -o m1
       g++ -c -O3 -DM2 X.cpp
       g++ -O3 X.o base.o -o m2
       g++ -c -O3 -DM3 X.cpp
       g++ -O3 X.o base.o -o m3
       time ./m1
    
       real 0m2.486s
       user 0m2.478s
       sys  0m0.005s
       time ./m2
    
       real 0m1.859s
       user 0m1.853s
       sys  0m0.004s
       time ./m3
    
       real 0m1.858s
       user 0m1.851s
       sys  0m0.006s
    
  • 2020-12-13 07:03

    The benefits of memcpy? Probably readability. Otherwise, you would have to either do a number of assignments or write a for loop to do the copying, neither of which is as simple and clear as just calling memcpy (as long as your types are simple and don't require construction/destruction, of course).

    Also, memcpy is generally relatively optimized for specific platforms, to the point that it won't be all that much slower than simple assignment, and may even be faster.
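
    For what it's worth, in C++11 and later there is a third option that keeps the one-line readability without memcpy: wrap the array in std::array, which supports plain assignment. A rough sketch, where Vec3 and both function names are made up for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical alias for the three-float case from the question.
using Vec3 = std::array<float, 3>;

// Whole-array copy by plain assignment; unlike float[3], std::array has operator=.
void copy_by_assignment(const Vec3& src, Vec3& dst)
{
    dst = src;
}

// Equivalent copy with memcpy; valid because float is trivially copyable.
void copy_by_memcpy(const Vec3& src, Vec3& dst)
{
    assert(&src != &dst);  // memcpy requires non-overlapping buffers
    std::memcpy(dst.data(), src.data(), sizeof(float) * src.size());
}
```

    Both functions leave dst equal to src; the assignment form just has no size expression to keep in sync with the type.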

  • 2020-12-13 07:06

    Don't go for premature micro-optimisations such as using memcpy like this. Using assignment is clearer and less error-prone and any decent compiler will generate suitably efficient code. If, and only if, you have profiled the code and found the assignments to be a significant bottleneck then you can consider some kind of micro-optimisation, but in general you should always write clear, robust code in the first instance.
