Enhanced REP MOVSB for memcpy

别跟我提以往 2020-11-22 02:04

I would like to use enhanced REP MOVSB (ERMSB) to get high bandwidth for a custom memcpy.

ERMSB was introduced with the Ivy Bridge microarchitecture.
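For concreteness, here is a minimal sketch of what such a custom memcpy might look like with GCC or Clang. The names `has_erms`, `copy_rep_movsb`, and `my_memcpy` are hypothetical. The sketch assumes the `<cpuid.h>` helper `__get_cpuid_count` is available (recent GCC and Clang ship it) and that the direction flag is clear on entry, as the usual x86 calling conventions require. On non-x86 targets it simply falls back to the library `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (assumed available) */
#define HAVE_X86 1
#endif

/* Returns 1 if CPUID reports ERMS (leaf 7, subleaf 0, EBX bit 9), else 0. */
static int has_erms(void) {
#ifdef HAVE_X86
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;                 /* leaf 7 not supported */
    return (int)((ebx >> 9) & 1u);
#else
    return 0;
#endif
}

/* Copies n bytes with a single REP MOVSB; regions must not overlap. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
#ifdef HAVE_X86
    void *d = dst;
    /* RDI = destination, RSI = source, RCX = byte count. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);   /* portable fallback */
#endif
}

/* Hypothetical dispatcher: use REP MOVSB only when ERMS is reported. */
static void *my_memcpy(void *dst, const void *src, size_t n) {
    return has_erms() ? copy_rep_movsb(dst, src, n) : memcpy(dst, src, n);
}
```

A real implementation would cache the CPUID result instead of querying it on every call, and would probably only take the `rep movsb` path above some size threshold; both are left out to keep the sketch short.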

6 Answers
  •  一向 (original poster)
    2020-11-22 02:42

    There are far more efficient ways to move data. These days, compilers generate architecture-specific code for memcpy, optimized for the alignment of the data and other factors. This allows better use of non-temporal (cache-bypassing) store instructions and of XMM and other wide registers on x86.

    Hard-coding rep movsb prevents this use of intrinsics.

    Therefore, for something like memcpy, unless you are writing code that will be tied to a very specific piece of hardware, and unless you are going to take the time to write a highly optimized memcpy function in assembly (or with C-level intrinsics), you are far better off letting the compiler figure it out for you.
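To illustrate the kind of intrinsic-based copy the answer alludes to, here is a rough sketch of a copy loop that uses SSE2 non-temporal stores. The function name `copy_streaming` is hypothetical, and this is not what any particular compiler or libc actually emits. It assumes SSE2 is enabled at compile time (true by default on x86-64) and otherwise falls back to plain `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Copies n bytes using non-temporal 16-byte stores; regions must not overlap. */
static void *copy_streaming(void *dst, const void *src, size_t n) {
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy a short head so the destination becomes 16-byte aligned,
       which _mm_stream_si128 requires. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head; s += head; n -= head;
    assert(n == 0 || ((uintptr_t)d & 15) == 0);

    /* Main loop: unaligned load, non-temporal (cache-bypassing) store. */
    for (; n >= 16; d += 16, s += 16, n -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
    }
    _mm_sfence();        /* order the streaming stores before later stores */

    memcpy(d, s, n);     /* remaining tail, fewer than 16 bytes */
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}
```

Streaming stores tend to pay off only for buffers much larger than the cache, since they bypass it; for small or soon-to-be-reused data an ordinary `memcpy` call is usually the better choice, which is consistent with the answer's advice.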
