How do you use __m256d?
Say I want to use the Intel AVX intrinsic _mm256_add_pd on a simple Vector3 class with three 64-bit doubles.
First, I'd like to clear up a little confusion. __m256d isn't a type of register; it's a data type whose values can be loaded into an AVX register. A __m256d is no more a register than an int is a register. There are a few ways to get data in and out of an __m256d (or any other vector type):
Using a union: Yes, the union trick works. It works very well, since the union will generally have the correct alignment for its __m256d member (32 bytes). Memory from malloc might not be aligned that strictly, so for heap allocations use posix_memalign or _aligned_malloc.
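#include <immintrin.h>

class Vector3 {
public:
    // _mm256_setr_pd fills elements 0..3 in argument order, matching x, y, z.
    Vector3(double xx, double yy, double zz) { vec = _mm256_setr_pd(xx, yy, zz, 0.0); }
    Vector3(__m256d vvec) { vec = vvec; }
    Vector3 operator+(const Vector3 &other) const
    {
        return Vector3(_mm256_add_pd(vec, other.vec));
    }
    // Note: an anonymous struct inside a union is a compiler extension in C++.
    union {
        struct {
            double x, y, z;
        };
        __m256d vec; // a data field, maybe a register, maybe not
    };
};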
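The malloc caveat above can be made concrete. What follows is only a minimal sketch, assuming you want raw heap storage that a __m256d (or the Vector3 above) can safely live in. The helper names alloc_aligned32, free_aligned32 and is_aligned32 are hypothetical, made up for this example; only posix_memalign, _aligned_malloc and _aligned_free are real library calls.

```cpp
#include <stdlib.h>   // posix_memalign, free
#include <cstddef>
#include <cstdint>
#if defined(_WIN32)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: allocate `bytes` of storage aligned to 32 bytes,
// the alignment __m256d expects. Returns nullptr on failure.
inline void *alloc_aligned32(std::size_t bytes)
{
#if defined(_WIN32)
    return _aligned_malloc(bytes, 32);
#else
    void *p = nullptr;
    if (posix_memalign(&p, 32, bytes) != 0)
        return nullptr;
    return p;
#endif
}

// Matching release: memory from _aligned_malloc must go to _aligned_free,
// while memory from posix_memalign is released with plain free.
inline void free_aligned32(void *p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}

// True if p is aligned strictly enough for aligned 256-bit loads and stores.
inline bool is_aligned32(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```

The point of wrapping both calls is that allocation and release have to be paired correctly on each platform; mixing _aligned_malloc with free is a bug.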
Using intrinsics: Inside a function, it's usually easier to use intrinsics to get data in and out of a vector type.
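__m256d vec = ...;
double x, y, z; // assumed to be set elsewhere
// _mm256_set_pd takes its arguments highest element first;
// _mm256_setr_pd takes them lowest first, so x lands in element 0.
vec = _mm256_add_pd(vec, _mm256_setr_pd(x, y, z, 0.0));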
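To show the full round trip (data in, arithmetic, data out), here is a small sketch using only intrinsics. The function add3 is a hypothetical name for this example, and the AVX_FN macro is just a convenience I'm assuming so the sketch builds on GCC/Clang without -mavx; if you already compile with -mavx or /arch:AVX you can drop it.

```cpp
#include <immintrin.h>

// Convenience for this sketch only: enable AVX per function on GCC/Clang.
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Adds two 3-component vectors stored as plain doubles.
// a, b and out must each point to at least three doubles.
AVX_FN inline void add3(const double *a, const double *b, double *out)
{
    // Data in: _mm256_setr_pd fills elements 0, 1, 2, 3 in argument order.
    __m256d va = _mm256_setr_pd(a[0], a[1], a[2], 0.0);
    __m256d vb = _mm256_setr_pd(b[0], b[1], b[2], 0.0);
    __m256d sum = _mm256_add_pd(va, vb);

    // Data out: store all four elements to a temporary, then copy the three
    // we care about. _mm256_storeu_pd has no alignment requirement.
    double tmp[4];
    _mm256_storeu_pd(tmp, sum);
    out[0] = tmp[0];
    out[1] = tmp[1];
    out[2] = tmp[2];
}
```

Going through a four-element temporary avoids writing past the end of a three-element output buffer, which a direct 256-bit store into out would do.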
Using pointer casts: Casting pointers is the last resort, for a few reasons:
The pointer might not be aligned correctly.
Casting pointers can sometimes mess with the compiler's aliasing analysis.
Pointer casting bypasses a number of safety guarantees.
So I'd only use pointer casting to plow through a big array of data.
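As a rough illustration of the "big array" case, the sketch below adds one array of doubles into another by casting to __m256d pointers. It assumes both arrays are 32-byte aligned and refuses to run otherwise. The name add_arrays and the AVX_FN macro are hypothetical, introduced only for this example (the macro lets it build on GCC/Clang without -mavx).

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Convenience for this sketch only: enable AVX per function on GCC/Clang.
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Computes dst[i] += src[i] for i in [0, n).
// Returns false, leaving the data untouched, if either pointer is not
// 32-byte aligned, since dereferencing a misaligned __m256d pointer can fault.
AVX_FN inline bool add_arrays(double *dst, const double *src, std::size_t n)
{
    if (reinterpret_cast<std::uintptr_t>(dst) % 32 != 0 ||
        reinterpret_cast<std::uintptr_t>(src) % 32 != 0)
        return false;

    // The pointer casts: view each run of four doubles as one __m256d.
    __m256d *vdst = reinterpret_cast<__m256d *>(dst);
    const __m256d *vsrc = reinterpret_cast<const __m256d *>(src);

    const std::size_t blocks = n / 4;
    for (std::size_t i = 0; i < blocks; ++i)
        vdst[i] = _mm256_add_pd(vdst[i], vsrc[i]);

    // Scalar tail for the remaining n % 4 elements.
    for (std::size_t i = blocks * 4; i < n; ++i)
        dst[i] += src[i];
    return true;
}
```

If you can't guarantee alignment, replacing the casts with _mm256_loadu_pd and _mm256_storeu_pd sidesteps the first problem in the list above.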