CLANG optimizing using SVML and its autovectorization
Question

Consider this simple function:

    #include <math.h>

    void ahoj(float *a)
    {
        for (int i=0; i<256; i++) a[i] = sin(a[i]);
    }

Try that at https://godbolt.org/z/ynQKRb, and use the following settings:

    -fveclib=SVML -mfpmath=sse -ffast-math -fno-math-errno -O3 -mavx2 -fvectorize

Select x86_64 CLANG 7.0, currently the newest. This is the most interesting part of the result:

    vmovups ymm0, ymmword ptr [rdi]
    vmovups ymm1, ymmword ptr [rdi + 32]
    vmovups ymmword ptr [rsp], ymm1 # 32-byte Spill
    vmovups ymm1, ymmword ptr