Enhanced REP MOVSB for memcpy


别跟我提以往 2020-11-22 02:04

I would like to use enhanced REP MOVSB (ERMSB) to get a high bandwidth for a custom memcpy.

ERMSB was introduced with the Ivy Bridge microarchitecture.
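For context, a minimal sketch of how ERMSB support might be detected at run time. It assumes GCC or Clang and their `<cpuid.h>` header; the function name `cpu_has_ermsb` is hypothetical and does not come from the post. The feature flag is reported in CPUID leaf 7, subleaf 0, EBX bit 9. On non-x86 targets the sketch simply reports no support.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (assumed toolchain) */
#endif

/* Hypothetical helper: true if the CPU advertises Enhanced REP MOVSB/STOSB.
 * The flag is CPUID.(EAX=7, ECX=0):EBX bit 9. */
static bool cpu_has_ermsb(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 is only meaningful if the CPU reports it as supported. */
    if (__get_cpuid_max(0, 0) < 7)
        return false;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return ((ebx >> 9) & 1u) != 0;
#else
    return false;    /* not an x86 target, so no REP MOVSB at all */
#endif
}
```

A custom memcpy would probably call this once at startup and cache the result rather than executing CPUID on every copy.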

6 answers

自闭症患者 (OP)

2020-11-22 02:26
As a general memcpy() guide:

a) If the data being copied is tiny (less than maybe 20 bytes) and has a fixed size, let the compiler do it. Reason: the compiler can use ordinary mov instructions and avoid the startup overhead.

b) If the data being copied is small (less than about 4 KiB) and is guaranteed to be aligned, use rep movsb (if ERMSB is supported) or rep movsd (if ERMSB is not supported). Reason: an SSE or AVX alternative has a large amount of "startup overhead" before it copies anything.

c) If the data being copied is small (less than about 4 KiB) and is not guaranteed to be aligned, use rep movsb. Reason: using SSE or AVX, or using rep movsd for the bulk of the copy plus some rep movsb at the start or end, has too much overhead.

d) For all other cases use something like this:
```
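    mov edx,0           ; outer pass counter
.again:
    pushad              ; save esi, edi and ecx so every pass recopies the same bytes
.nextByte:
    pushad              ; each pushad/popad pair is a deliberate no-op that only burns cycles
    popad
    mov al,[esi]        ; load one byte from the source
    pushad
    popad
    mov [edi],al        ; store it to the destination
    pushad
    popad
    inc esi
    pushad
    popad
    inc edi
    pushad
    popad
    loop .nextByte      ; decrement ecx and repeat until the buffer is copied
    popad               ; restore the pointers and count saved at .again
    inc edx
    cmp edx,1000        ; redo the whole copy 1000 times
    jb .again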
```
Reason: This will be so slow that it will force programmers to find an alternative that doesn't involve copying huge globs of data; and the resulting software will be significantly faster because copying large globs of data was avoided.
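
For cases (b) and (c) above, the rep movsb path could be sketched in C roughly as follows. This is a sketch under assumptions, not the answer's own code: it assumes GCC or Clang inline assembly on x86, the name `copy_rep_movsb` is hypothetical, and it falls back to the standard `memcpy` on other targets. It relies on the direction flag being clear on function entry, which the usual x86 calling conventions guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst with REP MOVSB.
 * The regions must not overlap, as with memcpy. Returns dst. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    void *d = dst;

    /* REP MOVSB copies (R|E)CX bytes from [(R|E)SI] to [(R|E)DI] and
     * advances all three registers, so they are read-write operands. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    /* Assumption: on other targets the library memcpy is the fallback. */
    return memcpy(dst, src, n);
#endif
}
```

Whether this beats the library memcpy for a given size and alignment is something to measure on the target CPU; the 4 KiB threshold in the guide is the answer's rule of thumb, not something this sketch verifies.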