Hint to compiler that it can use aligned memcpy

问题

I have a struct consisting of seven __m256 values, which is stored 32-byte aligned in memory.

typedef struct
{
        __m256 xl,xh;
        __m256 yl,yh;
        __m256 zl,zh;
        __m256i co;
} bloxset8_t;

I achieve the 32-byte alignment by using the posix_memalign() function for dynamically allocated data, or using the (aligned(32)) attribute for statically allocated data.

The alignment is fine, but when I use two pointers to such a struct, and pass them as destination and source for memcpy() then the compiler decides to use __memcpy_avx_unaligned() to copy.

How can I force clang to use the aligned avx memcpy function instead, which I assume is the faster variant?

OS: Ubuntu 16.04.3 LTS, Clang: 3.8.0-2ubuntu4.

UPDATE
The __memcpy_avx_unaligned() is invoked only when copying two or more structs. When copying just one, clang emits 14 vmovup instructions.

回答1:

__memcpy_avx_unaligned is just an internal glibc function name. It does not mean that there is a faster __memcpy_avx_aligned function. The name is just convey a hint to the glibc developers how this memcpy variant is implemented.

The other question is whether it would be faster for the C compiler to emit an inline expansion of memcpy, using four AVX2 load/store operations. The code for that would be larger than the memcpy call, but it might still be faster overall. It may be possible to help the compiler to do this using the __builtin_assume_aligned builtin.

来源：https://stackoverflow.com/questions/47231791/hint-to-compiler-that-it-can-use-aligned-memcpy

标签

glibc

memcpy

memory-alignment

avx