Enhanced REP MOVSB for memcpy

别跟我提以往 2020-11-22 02:04

I would like to use enhanced REP MOVSB (ERMSB) to get high bandwidth for a custom memcpy.

ERMSB was introduced with the Ivy Bridge microarchitecture.

6 answers
  •  自闭症患者
    2020-11-22 02:26

    As a general memcpy() guide:

    a) If the data being copied is tiny (less than maybe 20 bytes) and has a fixed size, let the compiler do it. Reason: The compiler can use normal mov instructions and avoid the startup overhead.

    b) If the data being copied is small (less than about 4 KiB) and is guaranteed to be aligned, use rep movsb (if ERMSB is supported) or rep movsd (if ERMSB is not supported). Reason: Using an SSE or AVX alternative has a huge amount of "startup overhead" before it copies anything.

    c) If the data being copied is small (less than about 4 KiB) and is not guaranteed to be aligned, use rep movsb. Reason: Using SSE or AVX, or using rep movsd for the bulk of it plus some rep movsb at the start or end, has too much overhead.

    d) For all other cases use something like this:

        mov edx,0
    .again:
        pushad
    .nextByte:
        pushad
        popad
        mov al,[esi]
        pushad
        popad
        mov [edi],al
        pushad
        popad
        inc esi
        pushad
        popad
        inc edi
        pushad
        popad
        loop .nextByte
        popad
        inc edx
        cmp edx,1000
        jb .again
    

    Reason: This will be so slow that it will force programmers to find an alternative that doesn't involve copying huge globs of data; and the resulting software will be significantly faster because copying large globs of data was avoided.
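
    Cases a) to c) above could be sketched in C roughly as follows. This is only a sketch under assumptions, not a tuned implementation: it assumes GCC or Clang on x86 (extended inline asm and `<cpuid.h>`), the names `has_ermsb`, `copy_rep_movsb`, `copy_rep_movsd`, `guide_memcpy` and `SMALL_COPY_LIMIT` are made up for illustration, alignment is checked at run time instead of being "guaranteed", and copies of 4 KiB or more simply go to the library `memcpy` as a placeholder, which is not what case d) proposes. ERMSB support is read from CPUID leaf 7, subleaf 0, EBX bit 9.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0   /* non-x86 or unknown compiler: fall back to memcpy */
#endif

/* "about 4 KiB" from the guide; the exact threshold is an assumption. */
#define SMALL_COPY_LIMIT 4096u

/* Returns 1 if CPUID.(EAX=7,ECX=0):EBX bit 9 (ERMSB) is set, else 0.
 * The result is cached because CPUID is slow. */
static int has_ermsb(void)
{
    static int cached = -1;
    if (cached < 0) {
#if HAVE_X86_ASM
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
        if (__get_cpuid_max(0, NULL) >= 7)
            __cpuid_count(7, 0, eax, ebx, ecx, edx);
        cached = (int)((ebx >> 9) & 1u);
#else
        cached = 0;
#endif
    }
    return cached;
}

/* Copy n bytes with a single rep movsb (DF is clear per the ABI). */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if HAVE_X86_ASM
    void *d = dst;
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n) : : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}

/* Copy n bytes as n/4 dwords (rep movsd, spelled movsl in AT&T syntax)
 * followed by the n%4 leftover bytes. */
static void *copy_rep_movsd(void *dst, const void *src, size_t n)
{
#if HAVE_X86_ASM
    void *d = dst;
    size_t dwords = n / 4, tail = n % 4;
    __asm__ volatile("rep movsl"
                     : "+D"(d), "+S"(src), "+c"(dwords) : : "memory");
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(tail) : : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}

/* Dispatch following cases b) and c) of the guide. */
static void *guide_memcpy(void *dst, const void *src, size_t n)
{
    if (n >= SMALL_COPY_LIMIT)
        return memcpy(dst, src, n);          /* placeholder, not case d) */

    int aligned = (((uintptr_t)dst | (uintptr_t)src | n) % 4u) == 0;
    if (aligned && !has_ermsb())
        return copy_rep_movsd(dst, src, n);  /* case b) without ERMSB */
    return copy_rep_movsb(dst, src, n);      /* case b) with ERMSB, case c) */
}
```

    For case a) nothing special is needed: calling plain `memcpy(dst, src, 16)` with a constant size lets the compiler expand the copy into ordinary mov instructions.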
