Are word-aligned loads faster than unaligned loads on x64 processors?

佛祖请我去吃肉 2021-01-12 06:57

Are loads of variables that are aligned on word boundaries faster than unaligned load operations on x86/64 (Intel/AMD 64-bit) processors?

A colleague of mine argues

5 Answers
  •  梦谈多话
    2021-01-12 07:32

    Aligned loads and stores are faster. Two excerpts from the Intel Optimization Manual point this out clearly:

    3.6 OPTIMIZING MEMORY ACCESSES

    Align data, paying attention to data layout and stack alignment

    ...

    Alignment and forwarding problems are among the most common sources of large delays on processors based on Intel NetBurst microarchitecture.

    AND

    3.6.4 Alignment

    Alignment of data concerns all kinds of variables:

    • Dynamically allocated variables

    • Members of a data structure

    • Global or local variables

    • Parameters passed on the stack
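    A minimal C11 sketch of how most categories in that list can be given an explicit alignment. It assumes a hosted C11 compiler that provides `aligned_alloc` (glibc does; MSVC does not). The 16-byte value and the names `g_vec`, `struct sample`, `is_aligned` and `all_kinds_aligned` are illustrative only. Parameters passed on the stack are placed by the calling convention rather than by source code, so they are not shown.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p is a multiple of align (align must be a power of two). */
static bool is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Global variable: request 16-byte alignment explicitly. */
static alignas(16) float g_vec[4];

/* Structure member: the compiler inserts padding after 'tag'
   so that 'v' starts on a 16-byte boundary. */
struct sample {
    char tag;
    alignas(16) float v[4];
};

/* Checks one object from each category; returns true if all of them
   meet the requested 16-byte alignment. */
static bool all_kinds_aligned(void) {
    alignas(16) float local_vec[4];   /* local (automatic) variable */
    struct sample s;                  /* member of a local structure */
    /* Dynamically allocated: size (16 bytes) is a multiple of the alignment. */
    float *heap = aligned_alloc(16, 4 * sizeof(float));
    if (heap == NULL)
        return false;

    bool ok = is_aligned(g_vec, 16)
           && is_aligned(local_vec, 16)
           && is_aligned(s.v, 16)
           && offsetof(struct sample, v) == 16
           && is_aligned(heap, 16);
    free(heap);
    return ok;
}
```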

    Misaligned data access can incur significant performance penalties. This is particularly true for cache line splits.
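    To make "cache line split" concrete: an access splits a line when its first and last bytes fall in different cache lines. The sketch below assumes 64-byte lines, which is typical for current x86-64 processors but is an assumption here; `splits_cache_line` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes. */
#define CACHE_LINE 64u

/* True if an access of 'size' bytes starting at 'addr' touches two
   cache lines. Assumes 1 <= size <= CACHE_LINE. */
static bool splits_cache_line(uintptr_t addr, size_t size) {
    return (addr % CACHE_LINE) + size > CACHE_LINE;
}
```

    For example, an 8-byte load at address 60 covers bytes 60..67 and so touches two lines, while the same load at address 56 stays within one.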

    Following that part in 3.6.4, there is a nice rule for compiler developers:

    Assembly/Compiler Coding Rule 45. (H impact, H generality) Align data on natural operand size address boundaries. If the data will be accessed with vector instruction loads and stores, align the data on 16-byte boundaries.
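    Natural alignment is what rules out the split case. If the operand size is a power of two no larger than the line, every naturally aligned address keeps the whole operand inside one line. This holds both for word-sized operands and for a 16-byte vector operand at a 16-byte boundary. A small C sketch of that reasoning, again assuming 64-byte cache lines; `align_up`, `is_naturally_aligned` and `aligned_accesses_never_split` are illustrative names, not functions from the manual.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes. */
#define CACHE_LINE 64u

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align) {
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Natural alignment: the address is a multiple of the operand size (size >= 1). */
static bool is_naturally_aligned(uintptr_t addr, size_t size) {
    return addr % size == 0;
}

/* Walks every naturally aligned address in [0, limit) for an operand of
   'size' bytes (size >= 1) and returns false if any access would cross
   a cache line boundary. */
static bool aligned_accesses_never_split(size_t size, uintptr_t limit) {
    for (uintptr_t a = 0; a < limit; a += size) {
        if ((a % CACHE_LINE) + size > CACHE_LINE)
            return false;
    }
    return true;
}
```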

    This is followed by a list of alignment rules, and then another gem in 3.6.6:

    User/Source Coding Rule 6. (H impact, M generality) Pad data structures defined in the source code so that every data element is aligned to a natural operand size address boundary.
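    Compilers already insert this padding for ordinary C structures. Writing it out explicitly makes the layout visible, and it matters when a structure is packed (for example with the GCC/Clang `__attribute__((packed))` extension) or shared with another program. A minimal sketch with a hypothetical `struct record`, using C11 `_Static_assert` to fail the build if a field lands off its natural boundary:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type: explicit padding puts every field at an
   offset that is a multiple of its own size. */
struct record {
    uint8_t  tag;       /* offset 0 */
    uint8_t  pad0[3];   /* explicit padding, bytes 1..3 */
    uint32_t count;     /* offset 4, a multiple of 4 */
    uint64_t total;     /* offset 8, a multiple of 8 */
};

/* Compile-time checks that each field is naturally aligned. */
_Static_assert(offsetof(struct record, count) % sizeof(uint32_t) == 0,
               "count is misaligned");
_Static_assert(offsetof(struct record, total) % sizeof(uint64_t) == 0,
               "total is misaligned");
_Static_assert(sizeof(struct record) == 16, "unexpected extra padding");
```

    Ordering fields from largest to smallest is a common alternative that avoids most padding without explicit filler bytes.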

    Both rules are marked high impact, meaning they can change performance significantly. Beyond these excerpts, the rest of Section 3.6 gives further reasons to naturally align your data. It's well worth any developer's time to read these manuals, if only to understand the hardware they are working on.
