I have some code that operates on 4D vectors, and I'm currently trying to convert it to use SSE. I'm using both Clang and GCC on 64-bit Linux.
Operating only on vectors is
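There is no reason one should have to use intrinsics for this. The OP just wants to do a broadcast, which is as basic a SIMD operation as SIMD addition. Any decent SIMD library or extension has to support broadcasts. Agner Fog's vector class library certainly does, OpenCL does, and the GCC documentation clearly shows that GCC's vector extensions do: when one operand of a binary vector operation is a scalar, the compiler turns it into a vector with that scalar in every element. The documentation's examples:

    a = b + 1;    /* a = b + {1,1,1,1}; */
    a = 2 * b;    /* a = {2,2,2,2} * b; */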
The following code compiles just fine:

    #include <stdio.h>

    typedef float float4 __attribute__ ((vector_size (16)));

    int main(void) {
        float4 x = {1, 2, 3, 4};
        float4 y = (25.0f/216.0f)*x;   /* the scalar is broadcast to all four lanes */
        printf("%f %f %f %f\n", y[0], y[1], y[2], y[3]);
        // prints: 0.115741 0.231481 0.347222 0.462963
        return 0;
    }
You can see the results at http://coliru.stacked-crooked.com/a/de79cca2fb5d4b11
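Compare that code to the equivalent intrinsics code and it's clear which one is more readable. Not only is it more readable, it's also easier to port to, e.g., ARM NEON, because the same vector-extension source can be recompiled for another target instead of being rewritten with a different set of intrinsics. It also looks very similar to OpenCL C code.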