Question
I'd like to optimize the following snippet using SSE instructions if possible:
/*
 * the data structure
 */
typedef struct v3d v3d;
struct v3d {
    double x;
    double y;
    double z;
} tmp = { 1.0, 2.0, 3.0 };
/*
 * the part that should be "optimized"
 */
tmp.x /= 4.0;
tmp.y /= 4.0;
tmp.z /= 4.0;
Is this possible at all?
Answer 1:
I've used SIMD extensions under Windows, but not yet under Linux. That said, you should be able to take advantage of the DIVPS SSE instruction, which divides one vector of four floats by another. Since you are using doubles, you'll want the SSE2 version, DIVPD, which works on two doubles at a time. Also, make sure to build with the -msse2 switch.
I found a page which details some SSE GCC builtins. It looks kind of old, but should be a good start.
http://ds9a.nl/gcc-simd/
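As a rough sketch of what the GCC vector extensions look like for this case: the names v2df and v3d_div_vec below are made up for illustration, and this assumes GCC or Clang targeting x86 with SSE2 enabled. The compiler is free to pick the instructions, but with -msse2 the vector divide would typically become a single DIVPD.

```c
#include <string.h>

/* Hypothetical type name: a vector of two doubles (16 bytes). GCC/Clang only. */
typedef double v2df __attribute__((vector_size(16)));

typedef struct {
    double x;
    double y;
    double z;
} v3d;

/* Hypothetical helper: divide all three components of *p by d. */
static void v3d_div_vec(v3d *p, double d)
{
    const v2df dd = { d, d };
    v2df xy;

    memcpy(&xy, p, sizeof xy);   /* load x and y, no alignment requirement */
    xy = xy / dd;                /* element-wise divide on both lanes */
    memcpy(p, &xy, sizeof xy);   /* store x and y back */

    p->z /= d;                   /* the third component is handled as a scalar */
}
```

Calling v3d_div_vec(&tmp, 4.0) on the struct from the question would divide all three members in place.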
Answer 2:
Is tmp.x *= 0.25; enough?
Note that if you do want to use SSE instructions, it's important that:
1) all memory accesses are 16-byte aligned
2) the operations are performed in a loop
3) no int <-> float or float <-> double conversions are performed
4) divisions are avoided where possible
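To make points 2) and 4) concrete, here is a minimal sketch that scales a flat array of doubles by 0.25 instead of dividing by 4.0. The function name scale_quarter is hypothetical. It uses unaligned loads and stores so it works on any buffer; if the data is known to be 16-byte aligned, _mm_load_pd and _mm_store_pd could be used instead.

```c
#include <stddef.h>
#include <emmintrin.h>

/* Hypothetical helper: multiply n doubles in place by 0.25 (same as / 4.0). */
static void scale_quarter(double *data, size_t n)
{
    const __m128d q = _mm_set1_pd(0.25);   /* { 0.25, 0.25 } */
    size_t i = 0;

    /* Main loop: two doubles per iteration. */
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i);
        _mm_storeu_pd(data + i, _mm_mul_pd(v, q));
    }

    /* Odd element left over, if any. */
    if (i < n)
        data[i] *= 0.25;
}
```

Because 4.0 is a power of two, multiplying by 0.25 gives exactly the same result as dividing by 4.0. For other divisors the reciprocal may round differently.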
Answer 3:
The intrinsic you are looking for is _mm_div_pd. Here is a working example which should be enough to steer you in the right direction:
#include <stdio.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

typedef struct
{
    double x;
    double y;
    double z;
} v3d;

/* Overlay the three doubles with two 16-byte SSE2 vectors. */
typedef union __attribute__ ((aligned(16)))
{
    v3d a;
    __m128d v[2];
} u3d;

int main(void)
{
    const __m128d vd = _mm_set1_pd(4.0);  /* { 4.0, 4.0 } */
    u3d u = { { 1.0, 2.0, 3.0 } };

    printf("v (before) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);
    u.v[0] = _mm_div_pd(u.v[0], vd);  /* x and y */
    u.v[1] = _mm_div_pd(u.v[1], vd);  /* z; the upper lane is unused padding */
    printf("v (after) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);
    return 0;
}
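If you would rather keep the plain struct and skip the union, a variant along these lines might also work. This is only a sketch: the function name v3d_div_sse2 is made up, and it uses unaligned loads so the struct needs no special alignment. The z member goes through the scalar _mm_div_sd, so no padding lane is ever read or divided.

```c
#include <emmintrin.h>

typedef struct {
    double x;
    double y;
    double z;
} v3d;

/* Hypothetical helper: divide all three components of *p by d using SSE2. */
static void v3d_div_sse2(v3d *p, double d)
{
    const __m128d vd = _mm_set1_pd(d);     /* { d, d } */
    __m128d xy = _mm_loadu_pd(&p->x);      /* x in the low lane, y in the high lane */
    __m128d z  = _mm_load_sd(&p->z);       /* z in the low lane, high lane zeroed */

    _mm_storeu_pd(&p->x, _mm_div_pd(xy, vd));  /* packed divide for x and y */
    _mm_store_sd(&p->z, _mm_div_sd(z, vd));    /* scalar divide, low lane only */
}
```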
Source: https://stackoverflow.com/questions/3826415/simd-sse-instruction-for-division-in-gcc