In the last couple of years, I\'ve been doing a lot of SIMD programming and most of the time I\'ve been relying on compiler intrinsic functions (such as the ones for SSE pro
Your best bet is probably OpenCL. I know it has mostly been hyped as a way to run code on GPUs, but OpenCL kernels can also be compiled and run on CPUs. OpenCL is basically C with a few restrictions:
and a bunch of additions. In particular vector types:
float4 x = float4(1.0f, 2.0f, 3.0f, 4.0f);
float4 y = float4(10.0f, 10.0f, 10.0f, 10.0f);
float4 z = y + x.s3210 // add the vector y with a swizzle of x that reverses the element order
On big caveat is that the code has to be cleanly sperable, OpenCL can't call out to arbitrary libraries, etc. But if your compute kernels are reasonably independent then you basically get a vector enhanced C where you don't need to use intrinsics.
Here is a quick reference/cheatsheet with all of the extensions.