SSE (SIMD): multiply vector by scalar

£可爱£侵袭症+ posted on 2019-12-03 15:16:48

Question


A common operation in my program is scaling a vector by a scalar (V*s, e.g. [1,2,3,4]*2 == [2,4,6,8]). Is there an SSE (or AVX) instruction to do this, other than first loading the scalar into every position of a vector (e.g. _mm_set_ps(2,2,2,2)) and then multiplying?

This is what I do now:

__m128 _scalar = _mm_set_ps(s,s,s,s);
__m128 _result = _mm_mul_ps(_vector, _scalar);

I'm looking for something like...

__m128 _result = _mm_scale_ps(_vector, s);

Answer 1:


Depending on your compiler, you may be able to improve the code generation a little by using _mm_set1_ps:

const __m128 scalar = _mm_set1_ps(s);
__m128 result = _mm_mul_ps(vector, scalar);

However, scalar constants like this should only need to be initialised once, outside any loops, so the performance cost should be irrelevant (unless the scalar value changes within the loop).
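As a minimal sketch of the hoisting described above, assuming the data is a plain array of floats of arbitrary length: the broadcast happens once before the loop, and only the load, multiply and store remain inside it. The function name scale_array and its parameters are illustrative and do not come from the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_set1_ps, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps */

/* Hypothetical helper: multiply every element of data[0..n) by s, in place. */
void scale_array(float *data, size_t n, float s)
{
    assert(data != NULL || n == 0);

    /* Broadcast s into all four lanes once, outside the loop. */
    const __m128 scalar = _mm_set1_ps(s);

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);               /* unaligned load of 4 floats */
        _mm_storeu_ps(data + i, _mm_mul_ps(v, scalar));  /* scale and store back */
    }

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; ++i)
        data[i] *= s;
}
```

The unaligned load/store intrinsics are used here so the sketch works for any pointer; if the data is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could be used instead.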

As always, you should look at the code your compiler generates, and also run your code under a decent profiler to see where the hotspots really are.




Answer 2:


There is no instruction for multiplying a vector by a scalar. There are, however, some instructions for loading the same scalar value into all positions of a vector register.

The AVX instruction set provides the _mm_broadcast_ss/_mm256_broadcast_ss/_mm256_broadcast_sd intrinsics, which load a float/double from memory and populate every element of an SSE or AVX register with it.

In the SSE3 instruction set you will find the _mm_loaddup_pd intrinsic, which populates both elements of an SSE register with the same double value.

In other versions of SSE, the best option is typically to load a scalar value using _mm_load_ss/_mm_load_sd and then copy it to all elements of a vector register with _mm_shuffle_ps/_mm_unpacklo_pd.
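The load-then-replicate approach for plain SSE/SSE2 can be sketched as follows. The helper names broadcast_ps and broadcast_pd are hypothetical, chosen only for illustration. The AVX and SSE3 intrinsics mentioned above are left out so that the sketch builds without extra target flags on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE:  _mm_load_ss, _mm_shuffle_ps */
#include <emmintrin.h>  /* SSE2: _mm_load_sd, _mm_unpacklo_pd */

/* Hypothetical helper: load one float and replicate it into all 4 lanes. */
__m128 broadcast_ps(const float *p)
{
    assert(p != NULL);
    __m128 x = _mm_load_ss(p);                             /* [*p, 0, 0, 0] */
    return _mm_shuffle_ps(x, x, _MM_SHUFFLE(0, 0, 0, 0));  /* lane 0 copied to every lane */
}

/* Hypothetical helper: load one double and replicate it into both lanes. */
__m128d broadcast_pd(const double *p)
{
    assert(p != NULL);
    __m128d x = _mm_load_sd(p);    /* [*p, 0] */
    return _mm_unpacklo_pd(x, x);  /* [*p, *p] */
}
```

In practice _mm_set1_ps and _mm_set1_pd express the same intent, and compilers commonly lower them to a similar load-and-shuffle sequence, so writing it out by hand is mainly useful for seeing what happens underneath.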




Answer 3:


I don't know of any single instruction that does what you want. Is the set operation truly a bottleneck? If you're multiplying a large vector by the same constant, the time it takes to fill an XMM/YMM register with copies of the constant (four floats for XMM, eight for YMM) should be a very small fraction of the overall time taken.

As a simple optimization, if the constant is 2, as in your example, you could replace the multiply with an add (v + v), which requires no constant at all.
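A minimal sketch of that special case, assuming an in-place array of floats; the function name double_array is hypothetical. Adding a value to itself gives the same result as multiplying it by 2 in IEEE 754 arithmetic, so no broadcast register is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: double every element of data[0..n) in place.
   v + v replaces v * 2, so no broadcast constant is needed. */
void double_array(float *data, size_t n)
{
    assert(data != NULL || n == 0);

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);          /* unaligned load of 4 floats */
        _mm_storeu_ps(data + i, _mm_add_ps(v, v));  /* v + v == 2 * v */
    }

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; ++i)
        data[i] += data[i];
}
```

Whether this is actually faster than a multiply depends on the target CPU, so it is worth measuring before relying on it.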



Source: https://stackoverflow.com/questions/9079580/sse-simd-multiply-vector-by-scalar
