How does the internal implementation of memcpy work?

执念已碎 2021-01-01 13:10

How does the standard C function 'memcpy' work? It has to copy a (large) chunk of RAM to another area in the RAM. Since I know you cannot move straight from RAM to RAM in …

3 Answers
  • 2021-01-01 13:47

    The implementation of memcpy is highly specific to the system in which it is implemented. Implementations are often hardware-assisted.

    Memory-to-memory mov instructions are not that uncommon: they have been around since at least the PDP-11, where you could write something like this:

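        MOV FROM, R2        ; R2 = source address
        MOV TO,   R3        ; R3 = destination address
        MOV R2,   R4
        ADD LEN,  R4        ; R4 = end of source (LEN in bytes, even, nonzero)
    CP: MOV (R2)+, (R3)+    ; "(Rx)+" means "*Rx++" in C; copies one 16-bit word
        CMP R2, R4
        BNE CP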
    

    The instruction on the line labeled CP is roughly equivalent to C's

    *to++ = *from++;
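    For illustration, the PDP-11 loop can be written in C with the registers as local pointers. This is a sketch under the same assumptions the assembly makes: LEN is a byte count that is even and nonzero, and both buffers are 16-bit aligned, because MOV copies a whole word and advances each register by 2. The function name pdp11_style_copy is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of the PDP-11 loop.
   Assumes len (in bytes) is even and nonzero, and both buffers hold 16-bit words. */
static void pdp11_style_copy(uint16_t *to, const uint16_t *from, size_t len)
{
    const uint16_t *end = from + len / 2;  /* R4 = FROM + LEN; pointer math counts words */
    do {
        *to++ = *from++;                   /* CP: copy one word, post-increment both */
    } while (from != end);                 /* CMP R2, R4 / BNE CP */
}
```

    Like the assembly, the loop copies before it compares, which is why a zero length is excluded.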
    

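    Some contemporary CPUs have instructions that implement memcpy more or less directly: you load designated registers with the source address, the destination address, and the length, issue a single block-copy instruction, and let the CPU do the rest. On x86, for example, rep movsb takes the source in RSI, the destination in RDI, and the byte count in RCX.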
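    As a sketch of this style on x86-64, the hypothetical helper below puts the destination in RDI, the source in RSI, and the byte count in RCX, then issues the single instruction rep movsb. It assumes GCC or Clang extended inline assembly; on any other target it falls back to a plain byte loop so the example still compiles.

```c
#include <stddef.h>

/* Hypothetical helper: block copy via x86 "rep movsb" (GCC/Clang on x86-64),
   with a plain byte loop as the fallback everywhere else. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *d = dst;
    /* RDI = destination, RSI = source, RCX = count. The x86-64 calling
       conventions guarantee the direction flag is clear, so both pointers
       advance upward while RCX bytes are copied. */
    __asm__ volatile ("rep movsb"
                      : "+D"(d), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return dst;
}
```

    Whether rep movsb is actually the fastest choice depends on the processor; some CPUs advertise fast string moves (the ERMSB feature), while on others a vector loop wins.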

  • 2021-01-01 13:54

    It depends. In general, you can't physically copy anything larger than the largest usable register in a single cycle, but that's not really how machines work these days. In practice, you care less about what the CPU is doing and more about the characteristics of DRAM. The machine's memory hierarchy plays a decisive role in performing the copy as fast as possible (e.g., are you loading whole cache lines? How large is a DRAM row relative to the copy?). An implementation might instead use vector instructions to implement memcpy. Without reference to a specific implementation, it's effectively a byte-for-byte copy through a one-place buffer (a register).
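    To make the vector-instruction option concrete, here is a sketch that moves 16 bytes per iteration through an SSE2 register (the one-place buffer is now 16 bytes wide) and copies the remaining tail one byte at a time. It assumes an x86 compiler that defines __SSE2__ and provides <emmintrin.h>; the helper name copy_vec16 is hypothetical, and a production memcpy would also weigh alignment and cache behavior.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: 16 bytes per step through an XMM register when SSE2
   is available, then a byte loop for whatever is left. */
static void *copy_vec16(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    while (n >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s); /* unaligned 16-byte load */
        _mm_storeu_si128((__m128i *)d, chunk);               /* unaligned 16-byte store */
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    while (n--)            /* tail, or the whole copy when SSE2 is unavailable */
        *d++ = *s++;
    return dst;
}
```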

    Here's a fun article that describes one person's adventure in optimizing memcpy. The main takeaway is that a fast memcpy is always targeted at a specific architecture and environment, based on which instructions it can execute cheaply.

  • 2021-01-01 13:56

    A trivial implementation of memcpy is:

        /* s1 = destination, s2 = source (as unsigned char pointers), matching memcpy(s1, s2, n) */
        while (n--) *s1++ = *s2++;
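    Written out as a complete function, the trivial version needs byte pointers, since a void * cannot be dereferenced, and it should return the destination as the standard memcpy does. The name trivial_memcpy is hypothetical; this is a sketch of the idea, not any library's implementation.

```c
#include <stddef.h>

/* Hypothetical name. Same parameter roles as the standard memcpy(s1, s2, n):
   s1 is the destination, s2 the source; the regions must not overlap. */
void *trivial_memcpy(void *restrict s1, const void *restrict s2, size_t n)
{
    unsigned char *d = s1;          /* copy through byte pointers */
    const unsigned char *s = s2;
    while (n--)
        *d++ = *s++;
    return s1;                      /* memcpy returns its destination argument */
}
```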
    

    In practice, though, glibc uses carefully optimized implementations written in assembly, and compilers often inline memcpy calls, especially when the size is a small compile-time constant.
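    As an example of the inlining point, a fixed-size memcpy is the usual portable way to read a possibly unaligned value. With the size known at compile time, GCC and Clang with optimization enabled typically emit a single load here instead of a library call; that depends on the compiler, flags, and target, so treat it as a tendency rather than a guarantee. The helper load_u32 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any byte address.
   The constant size lets the compiler replace the call with a plain load. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```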

    On x86, the code checks (using GCC built-in functions) whether the size parameter is a compile-time literal that is a multiple of 2 or 4. If so, it uses a loop of movl instructions (each copying 4 bytes); otherwise it calls the general case.
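    A sketch of that dispatch, assuming the check is done with GCC's __builtin_constant_p (also supported by Clang). The names SKETCH_MEMCPY, copy_by_words, and copy_general are hypothetical, and the byte loop in copy_general merely stands in for glibc's assembly general case. Each 4-byte step goes through a fixed-size memcpy into a uint32_t, which compilers typically lower to one 32-bit load and store (movl on x86) while avoiding alignment and aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fast path: n is a multiple of 4, one 32-bit word per step. */
static void *copy_by_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i += 4) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w);   /* typically a single 32-bit load */
        memcpy(d + i, &w, sizeof w);   /* typically a single 32-bit store */
    }
    return dst;
}

/* Hypothetical general case: a byte loop standing in for the assembly block copy. */
static void *copy_general(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Take the word loop only when n is a compile-time constant divisible by 4. */
#define SKETCH_MEMCPY(dst, src, n)                        \
    ((__builtin_constant_p(n) && (n) % 4 == 0)            \
         ? copy_by_words((dst), (src), (n))               \
         : copy_general((dst), (src), (n)))
```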

    The general case uses a fast block copy written in assembly, built on the rep prefix and the movsl instruction.
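    The general case might look roughly like this on x86-64, assuming GCC or Clang inline assembly: rep movsl copies n / 4 four-byte doublewords, and a trailing rep movsb copies the remaining 0 to 3 bytes. The helper copy_rep_movsl is hypothetical and is not glibc's actual code, which varies by version and CPU; other targets get a byte loop.

```c
#include <stddef.h>

/* Hypothetical general-case copy: doublewords first, then the leftover bytes. */
static void *copy_rep_movsl(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *d = dst;
    size_t dwords = n / 4, tail = n % 4;
    /* RCX doublewords from [RSI] to [RDI]; RDI and RSI end up past the copied data. */
    __asm__ volatile ("rep movsl"
                      : "+D"(d), "+S"(src), "+c"(dwords)
                      :
                      : "memory");
    /* Continue from the advanced pointers with the 0 to 3 remaining bytes. */
    __asm__ volatile ("rep movsb"
                      : "+D"(d), "+S"(src), "+c"(tail)
                      :
                      : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return dst;
}
```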
