Is _mm_broadcast_ss faster than _mm_set1_ps?

梦毁少年i 2020-12-19 06:30

Is this code

float a = ...;
__m128 b = _mm_broadcast_ss(&a);

always faster than this code

float a = ...;
__m128 b = _mm_set1_ps(a);


        
3 answers
  •  盖世英雄少女心
    2020-12-19 07:22

    If you target the AVX instruction set, GCC will use VBROADCASTSS to implement the _mm_set1_ps intrinsic. Clang, however, will use two instructions (VMOVSS + VPSHUFD).
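To check this on your own toolchain, a minimal sketch along these lines may help. It puts each intrinsic in its own small function so the generated instructions are easy to find in the assembly. The helper names `splat_broadcast`, `splat_set1`, and `splats_agree` are hypothetical and not from the question. The `target("avx")` attribute is a GCC/Clang extension, used here only on the assumption that you want the file to build without passing `-mavx`; the code also assumes an x86 CPU with AVX support.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: broadcast through the AVX-only intrinsic,
   which takes a pointer to the float in memory. */
__attribute__((target("avx")))
static void splat_broadcast(const float *src, float out[4]) {
    __m128 v = _mm_broadcast_ss(src);
    _mm_storeu_ps(out, v);
}

/* Hypothetical helper: broadcast through the generic intrinsic,
   which takes the float by value; the compiler chooses the instructions. */
__attribute__((target("avx")))
static void splat_set1(float x, float out[4]) {
    __m128 v = _mm_set1_ps(x);
    _mm_storeu_ps(out, v);
}

/* Returns 1 when both helpers fill all four lanes with identical bits. */
static int splats_agree(float x) {
    float a[4], b[4];
    splat_broadcast(&x, a);
    splat_set1(x, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both helpers should produce the same vector, so any difference is in the generated instructions only. Compiling with something like `gcc -O2 -mavx -S` or `clang -O2 -mavx -S` and reading the two function bodies shows what your compiler version actually emits, which may differ from the behavior described above.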
