Performance of x86 rep instructions on modern (pipelined/superscalar) processors

耶瑟儿~ 2020-12-13 04:46

I've been writing in x86 assembly lately (for fun) and was wondering whether or not rep-prefixed string instructions actually have a performance edge on modern processors o

3 answers
  •  一整个雨季
    2020-12-13 05:25

    Both AMD's and Intel's optimization guides devote a lot of space to questions like this. The validity of advice in this area has a "half-life": different CPU generations behave differently. For example:

    • AMD Software Optimization Guide (Sep/2005), section 8.3, pg. 167:
      Avoid using the REP prefix when performing string operations, especially when copying blocks of memory.
    • AMD Software Optimization Guide (Apr/2011), section 9.3, pg. 148:
      Use the REP prefix judiciously when performing string operations.

    The Intel Architecture Optimization Manual gives performance comparison figures for various block copy techniques (including rep stosd) in Table 7-2, "Relative Performance of Memory Copy Routines", pg. 7-37f., for different CPUs. Again, what's fastest on one CPU might not be fastest on others.
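
    Since the answer depends on the CPU, the practical approach is to measure on the machine you care about. Below is a minimal sketch, not taken from either vendor's manual, that wraps `rep movsb` in C via GCC/Clang inline assembly and times it against the C library's `memcpy`. The names `rep_movsb_copy`, `seconds_per_copy` and `copy_matches_memcpy` are made up for illustration, and the timing loop is deliberately crude (no warm-up, no cache control, `clock()` resolution only), so treat any numbers as a rough hint.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy n bytes with `rep movsb` on x86 (GCC/Clang inline asm).
   On other targets or compilers, fall back to memcpy so the sketch still builds.
   Assumes the direction flag is clear, which the usual x86 ABIs guarantee
   on function entry. */
static void *rep_movsb_copy(void *dst, const void *src, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    void *d = dst;
    const void *s = src;
    /* rep movsb: (R|E)DI = destination, (R|E)SI = source, (R|E)CX = byte count */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
    return dst;
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Average CPU seconds per copy of n bytes (n > 0, reps > 0) over reps runs.
   Returns -1.0 if allocation fails. */
static double seconds_per_copy(copy_fn fn, size_t n, int reps)
{
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    volatile unsigned char sink = 0;   /* keeps the copies from being optimized away */
    double elapsed;

    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, n);

    clock_t t0 = clock();
    for (int i = 0; i < reps; i++) {
        fn(dst, src, n);
        sink ^= dst[n - 1];
    }
    clock_t t1 = clock();
    (void)sink;

    elapsed = (double)(t1 - t0) / CLOCKS_PER_SEC;
    free(src);
    free(dst);
    return elapsed / reps;
}

/* Sanity check: returns 1 if rep_movsb_copy and memcpy produce identical output. */
static int copy_matches_memcpy(size_t n)
{
    size_t len = n ? n : 1;            /* avoid malloc(0) */
    unsigned char *src = malloc(len);
    unsigned char *a = malloc(len);
    unsigned char *b = malloc(len);
    int ok = 0;

    if (src && a && b) {
        for (size_t i = 0; i < n; i++)
            src[i] = (unsigned char)(i * 31u + 7u);
        rep_movsb_copy(a, src, n);
        memcpy(b, src, n);
        ok = (memcmp(a, b, n) == 0);
    }
    free(src);
    free(a);
    free(b);
    return ok;
}
```

    To compare, call `seconds_per_copy(rep_movsb_copy, size, reps)` and `seconds_per_copy(memcpy, size, reps)` for a range of sizes (say 64 bytes up to several megabytes). The crossover point, if there is one, is expected to differ between CPU generations.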

    In many cases, recent x86 CPUs (those with the SSE4.2 "string" instructions) can do string operations via the SIMD unit; see this investigation.

    To follow up on all this (and to keep yourself updated when things inevitably change again), read Agner Fog's optimization guides and blog.
