Relationship between SSE vectorization and memory alignment

Question

Why do we need aligned memory for SSE/AVX?

One of the answers I often get is that aligned memory loads are much faster than unaligned ones. But why is an aligned memory load so much faster than an unaligned one?

Answer 1:

This is not specific to SSE (or even to x86). On most architectures, loads and stores need to be naturally aligned. Otherwise they either (a) generate an exception, or (b) take two or more cycles plus some fix-up so that the misaligned load/store is handled transparently. On x86, (b) applies to data types smaller than 16 bytes, but (a) applies to SSE data types unless you explicitly use the misaligned versions of the load/store instructions (e.g. MOVUPS/MOVDQU), which can handle misaligned data.
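A minimal sketch of the two kinds of SSE load, assuming an x86-64 compiler with the standard Intel intrinsics header <xmmintrin.h>. _mm_load_ps is the aligned load (MOVAPS), which may fault on a misaligned address, while _mm_loadu_ps is the misaligned version (MOVUPS). The helper names is_aligned16, hsum4, sum4_aligned and sum4_unaligned are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: true if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: add the four lanes of an SSE register. */
static float hsum4(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v); /* unaligned store, so lanes needs no alignment */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Aligned load: p MUST be 16-byte aligned. With MOVAPS a misaligned
   address raises a fault (case (a) in the answer). */
static float sum4_aligned(const float *p)
{
    assert(is_aligned16(p));
    return hsum4(_mm_load_ps(p));
}

/* Misaligned load: works for any address (case (b) in the answer),
   possibly at extra cost on some CPUs. */
static float sum4_unaligned(const float *p)
{
    return hsum4(_mm_loadu_ps(p));
}
```

For example, with alignas(16) float buf[8] = {1, 2, 3, 4, 5, 6, 7, 8}, sum4_aligned(buf) is valid, but passing buf + 1 is only safe through sum4_unaligned, because buf + 1 is 4 bytes past a 16-byte boundary.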

You might wonder why we don't just use the misaligned versions of these SSE load/store instructions regardless of alignment. The answer is that they are typically much slower than their aligned counterparts: they generally behave as in (b) above, which typically makes them 2x or more slower. The exception is recent Intel CPUs such as the Core i7, where the penalty is much smaller, though not insignificant.
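One common way to get aligned loads on data whose start address you don't control is to peel a few scalar iterations until the pointer reaches a 16-byte boundary, then use aligned loads for the bulk. The sketch below is a hypothetical illustration of that idea (sum_floats is not from the original answer) and assumes SSE via <xmmintrin.h>. Note that the SIMD version adds elements in a different order than a plain scalar loop, so results can differ slightly for non-integer data.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical example: sum n floats using aligned SSE loads for the
   bulk of the array, whatever the alignment of a. */
static float sum_floats(const float *a, size_t n)
{
    size_t i = 0;
    float total = 0.0f;

    /* Prologue: scalar adds until a + i lands on a 16-byte boundary.
       If a is not even 4-byte aligned, this simply handles everything. */
    while (i < n && ((uintptr_t)(a + i) & 15u) != 0) {
        total += a[i];
        i++;
    }

    /* Main loop: a + i is now 16-byte aligned, so _mm_load_ps is safe. */
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        acc = _mm_add_ps(acc, _mm_load_ps(a + i));
    }

    /* Fold the four accumulator lanes into the running total. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total += lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Epilogue: fewer than four elements remain. */
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

The design trade-off: the prologue and epilogue cost a few scalar operations, in exchange for every vector load in the main loop being an aligned one.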

Source: https://stackoverflow.com/questions/14823482/relationship-between-sse-vectorization-and-memory-alignment

Tags

sse

simd
