Is _mm_broadcast_ss faster than _mm_set1_ps?

梦毁少年i 2020-12-19 06:30

Is this code

float a = ...;
__m128 b = _mm_broadcast_ss(&a);

always faster than this code

float a = ...;
__m128 b = _mm_set1_ps(a);


        
3 answers
  •  自闭症患者
    2020-12-19 07:21

    _mm_broadcast_ss is likely to be faster than _mm_set1_ps. The former maps to a single instruction (VBROADCASTSS), while the latter is typically synthesized from several instructions (probably a MOVSS followed by a shuffle). However, _mm_broadcast_ss requires the AVX instruction set, whereas _mm_set1_ps needs only SSE.
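
For anyone who wants to compare the two on their own machine, here is a minimal sketch of both variants side by side. It assumes GCC or Clang on x86-64: the `target("avx")` attribute and `__builtin_cpu_supports` are compiler extensions, and the helper names (`splat_sse`, `splat_avx`, `all_lanes_equal`, `demo`) are made up for illustration. It only checks that both variants produce the same lanes. It says nothing about speed, so inspect the generated assembly (for example with `-S`) to see which instructions your compiler actually picks.

```c
#include <assert.h>
#include <immintrin.h>

/* SSE variant: takes the float by value and needs only SSE. */
static __m128 splat_sse(float a) {
    return _mm_set1_ps(a);
}

/* AVX variant: takes a pointer and broadcasts from memory.
   The target attribute (GCC/Clang extension) enables AVX for this
   one function, so the file still builds without -mavx. */
static __attribute__((target("avx"))) __m128 splat_avx(const float *p) {
    return _mm_broadcast_ss(p);
}

/* Returns 1 if all four lanes of v equal expected, 0 otherwise. */
static int all_lanes_equal(__m128 v, float expected) {
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] == expected && out[1] == expected &&
           out[2] == expected && out[3] == expected;
}

/* Checks that both variants fill every lane with a.
   The AVX variant runs only if the CPU reports AVX support. */
static void demo(float a) {
    assert(all_lanes_equal(splat_sse(a), a));
    if (__builtin_cpu_supports("avx")) {
        assert(all_lanes_equal(splat_avx(&a), a));
    }
}
```

Calling `demo(1.5f)` from `main` runs both checks. The runtime guard matters because executing VBROADCASTSS on a CPU without AVX raises an illegal-instruction fault.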
