accessing __m128 fields across compilers

我的未来我决定 submitted on 2019-12-06 03:56:04

To load a __m128, you can write _mm_setr_ps(1.f, 2.f, 3.f, 4.f), which is supported by GCC, ICC, MSVC and clang.

As far as I know, clang and recent versions of GCC support accessing __m128 elements by index. I don't know how to do this in ICC or MSVC. I guess _mm_extract_ps works on all four compilers, but it returns the element's bits as an int rather than a float, which makes it painful to use.

Z boson

If you want your code to work on other compilers, then don't use those GCC extensions. Use the set/load/store intrinsics. _mm_setr_ps is fine for setting constant values but should not be used in a loop. To access elements, I normally store the values to an array first and then read the array.
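As a minimal sketch of the "store to an array, then read the array" approach: the helper name get_lane is hypothetical (it is not part of any compiler's API), and it uses the unaligned store so the temporary needs no special alignment.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_storeu_ps */

/* Hypothetical helper: return element i (0..3) of v by spilling it to a
   small array first. Uses only SSE intrinsics, no compiler extensions. */
static float get_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);  /* unaligned store: tmp needs no alignment */
    return tmp[i & 3];      /* mask the index so it stays in range */
}
```

With v = _mm_setr_ps(1.f, 2.f, 3.f, 4.f), get_lane(v, 0) gives 1.f and get_lane(v, 3) gives 4.f, since _mm_setr_ps places its first argument in the lowest element.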

If you have an array a, you can load from it and store back to it with

__m128 t = _mm_loadu_ps(a);
_mm_storeu_ps(a, t);
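For illustration, here is a rough sketch of the unaligned load/store pair used over whole arrays. The function add_arrays and its parameters are made up for this example. It processes four floats per iteration and finishes any leftover elements with scalar code.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_storeu_ps, _mm_add_ps */

/* Hypothetical example: out[i] = a[i] + b[i] for i in [0, n).
   No alignment is assumed for any of the pointers. */
static void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)  /* scalar tail for the last n % 4 elements */
        out[i] = a[i] + b[i];
}
```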

If the array is 16-byte aligned, you can use an aligned load/store, which is slightly faster on newer systems but much faster on older systems.

__m128 t = _mm_load_ps(a);
_mm_store_ps(a, t);

To get 16-byte aligned memory on the stack use

__declspec(align(16)) const float a[] = ... // MSVC
__attribute__((aligned(16))) const float a[] = ... // GCC, ICC
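One way to avoid writing both spellings everywhere is to hide them behind a macro. This is only a sketch: the macro name ALIGN16 and the function sum_doubled are invented for the example, and the #if assumes that any compiler defining _MSC_VER accepts __declspec(align) while everything else accepts the GCC-style attribute.

```c
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_store_ps, _mm_add_ps */

/* Hypothetical portability macro; pick the spelling the compiler knows. */
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

/* Doubles four constants using aligned load/store and returns their sum. */
static float sum_doubled(void)
{
    ALIGN16 const float a[4] = {1.f, 2.f, 3.f, 4.f};
    ALIGN16 float out[4];
    __m128 t = _mm_load_ps(a);   /* requires a to be 16-byte aligned */
    t = _mm_add_ps(t, t);
    _mm_store_ps(out, t);        /* requires out to be 16-byte aligned */
    return out[0] + out[1] + out[2] + out[3];
}
```

If the alignment were missing, _mm_load_ps and _mm_store_ps could fault at run time, which is why the unaligned variants are the safer default when you don't control the storage.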

For 16-byte aligned dynamic arrays use:

float *a = (float*)_mm_malloc(sizeof(float)*n, 16); //MSVC, GCC, ICC, MinGW 
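Memory obtained from _mm_malloc should be released with _mm_free, not free. A small sketch of the full allocate/use/release cycle follows. The function doubled_ok is hypothetical, and the example assumes that including xmmintrin.h is enough to declare _mm_malloc and _mm_free, which holds for GCC and clang; on MSVC you may need malloc.h as well.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumed to declare _mm_malloc/_mm_free */

/* Hypothetical check: fill n floats with 0..n-1, double them using aligned
   loads/stores, and return 1 if every result is correct, 0 otherwise. */
static int doubled_ok(size_t n)
{
    float *a = (float *)_mm_malloc(sizeof(float) * n, 16);
    const __m128 two = _mm_set1_ps(2.f);
    size_t i;
    int ok = 1;

    if (a == NULL)
        return 0;
    for (i = 0; i < n; ++i)
        a[i] = (float)i;

    /* a + i stays 16-byte aligned because i advances in steps of 4 floats */
    for (i = 0; i + 4 <= n; i += 4)
        _mm_store_ps(a + i, _mm_mul_ps(_mm_load_ps(a + i), two));
    for (; i < n; ++i)  /* scalar tail */
        a[i] *= 2.f;

    for (i = 0; i < n; ++i)
        if (a[i] != 2.f * (float)i)
            ok = 0;

    _mm_free(a);  /* must pair with _mm_malloc */
    return ok;
}
```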