Why doesn't gcc resolve _mm256_loadu_pd as single vmovupd?

独厮守ぢ 2020-11-28 13:51

I'm writing some AVX code and I need to load from potentially unaligned memory. I'm currently loading 4 doubles, hence I would use intrinsic instruction

2 answers
  •  执笔经年
    2020-11-28 14:22

    GCC's generic tuning (the default, -mtune=generic) splits unaligned 256-bit loads into two 128-bit halves to help older processors. (Subsequent changes avoid splitting loads in generic tuning, I believe.)

    You can tune for more recent Intel CPUs using something like -mtune=intel or -mtune=skylake, and you will get a single instruction, as intended.
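
To see the effect yourself, here is a minimal sketch. The helper name sum4_unaligned is made up for illustration, and the assembly described in the comments is what affected GCC versions typically produce, so treat it as an expectation to check rather than a guarantee. Besides -mtune, the split can also be switched off directly with -mno-avx256-split-unaligned-load.

```c
#include <immintrin.h>

/*
 * Hypothetical helper (not from the question): loads 4 doubles from a
 * possibly unaligned address and returns their sum.
 *
 * Compare the generated assembly, for example:
 *   gcc -O2 -mavx -S load.c
 *       affected GCC versions with generic tuning tend to emit a 128-bit
 *       vmovupd followed by vinsertf128 for the load
 *   gcc -O2 -mavx -mtune=skylake -S load.c
 *   gcc -O2 -mavx -mno-avx256-split-unaligned-load -S load.c
 *       both should give a single 256-bit vmovupd
 *
 * The target attribute lets this file build even without -mavx on the
 * command line (GCC/Clang extension); running it still needs an AVX CPU.
 */
__attribute__((target("avx")))
double sum4_unaligned(const double *p)
{
    __m256d v = _mm256_loadu_pd(p);   /* unaligned 256-bit load */
    double out[4];
    _mm256_storeu_pd(out, v);         /* unaligned 256-bit store */
    return out[0] + out[1] + out[2] + out[3];
}
```

The result is the same either way; only the instruction sequence chosen for the load differs.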
