Using __m256d registers

耶瑟儿~ 2021-02-08 01:16

How do you use __m256d?

Say I want to use the Intel AVX instruction _mm256_add_pd on a simple Vector3 class with 3-64 bit double p

1 answer
  •  轮回少年
    2021-02-08 01:56

    First I'd like to clear up a little confusion. __m256d isn't a type of register; it's a data type that can be loaded into an AVX register. A __m256d is no more a register than an int is a register. There are a few ways to get data in and out of a __m256d (or any other vector type):

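    Using a union: Yes, the union trick works. It works very well, since the union takes on the 32-byte alignment of its __m256d member, so objects on the stack or in static storage will generally be aligned correctly. Memory returned by malloc might not be (it typically guarantees only 16-byte alignment on 64-bit platforms), so use posix_memalign or _aligned_malloc for heap storage.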

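    #include <immintrin.h> // AVX intrinsics; build with -mavx or /arch:AVX

    class Vector3 {
    public:
        // _mm256_set_pd takes lanes from highest to lowest, so x ends up in lane 0.
        Vector3(double xx, double yy, double zz)
            : vec(_mm256_set_pd(0.0, zz, yy, xx)) {}
        Vector3(__m256d vvec) : vec(vvec) {}

        Vector3 operator+(const Vector3 &other) const
        {
            return Vector3(_mm256_add_pd(vec, other.vec));
        }

        union {
            struct { // anonymous struct: a widely supported compiler extension
                double x, y, z;
            };
            __m256d vec; // a data field, maybe a register, maybe not
        };
    };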
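
    The heap alignment caveat can be sketched roughly as follows. The names alloc_aligned32, free_aligned32, and is_aligned32 are hypothetical helpers made up for this illustration; they only wrap posix_memalign and _aligned_malloc, and the platform check is an assumption about where each call is available.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(_WIN32)
#include <malloc.h> // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: returns storage aligned to 32 bytes, or nullptr on failure.
inline void *alloc_aligned32(std::size_t bytes)
{
#if defined(_WIN32)
    return _aligned_malloc(bytes, 32);
#else
    void *p = nullptr;
    // posix_memalign returns 0 on success; the alignment must be a power of
    // two and a multiple of sizeof(void *).
    return posix_memalign(&p, 32, bytes) == 0 ? p : nullptr;
#endif
}

// Hypothetical helper: releases storage obtained from alloc_aligned32.
inline void free_aligned32(void *p)
{
#if defined(_WIN32)
    _aligned_free(p); // memory from _aligned_malloc must not go to free()
#else
    std::free(p);     // memory from posix_memalign is released with free()
#endif
}

// Hypothetical helper: true when p sits on a 32-byte boundary.
inline bool is_aligned32(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```

    A Vector3 could then be constructed in that storage with placement new and released with free_aligned32 after its use ends.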
    

    Using intrinsics: Inside a function, it's usually easier to use intrinsics to get data in and out of a vector type.

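    __m256d vec = ...;
    double x, y, z; // assume these hold the values to add
    // _mm256_set_pd lists lanes from highest to lowest, so x lands in lane 0,
    // matching the x, y, z layout of the union above.
    vec = _mm256_add_pd(vec, _mm256_set_pd(0.0, z, y, x));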
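
    Getting the result back out of the vector is the other half of the round trip. Here is a minimal sketch, assuming a hypothetical helper named add3 and a made-up AVX_TARGET macro; only the _mm256_* calls are real intrinsics.

```cpp
#include <immintrin.h>

// Build with AVX enabled (-mavx on GCC/Clang, /arch:AVX on MSVC). The macro
// below is a GCC/Clang convenience that enables AVX for a single function,
// so the sketch also compiles without the flag.
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for three doubles.
AVX_TARGET inline void add3(const double a[3], const double b[3], double out[3])
{
    // _mm256_setr_pd takes lanes in memory order, so a[0] goes to lane 0.
    __m256d va = _mm256_setr_pd(a[0], a[1], a[2], 0.0);
    __m256d vb = _mm256_setr_pd(b[0], b[1], b[2], 0.0);
    __m256d sum = _mm256_add_pd(va, vb);

    // Write all four lanes to a scratch buffer; the unaligned store places
    // no alignment requirement on tmp. The fourth lane is padding.
    double tmp[4];
    _mm256_storeu_pd(tmp, sum);
    out[0] = tmp[0];
    out[1] = tmp[1];
    out[2] = tmp[2];
}
```

    The scratch buffer is there because a 256-bit store always writes four doubles, which would overrun a three-element destination.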
    

    Using pointer casts: Casting pointers is the last resort for a couple of reasons.

    1. The pointer might not be aligned correctly. A __m256d needs 32-byte alignment, and dereferencing a misaligned __m256d pointer can fault.

    2. Casting pointers can sometimes mess with the compiler's aliasing analysis.

    3. Pointer casting bypasses a number of safety guarantees.

    So I'd only use pointer casting to plow through a big array of data.
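
    For the big-array case, the cast might look something like the sketch below. add_arrays and AVX_TARGET are hypothetical names used only for illustration, and the code assumes the caller guarantees that both buffers are 32-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Build with AVX enabled (-mavx on GCC/Clang, /arch:AVX on MSVC). The macro
// is a GCC/Clang convenience that enables AVX for a single function.
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Hypothetical helper: dst[i] += src[i] for n doubles.
// Both pointers must be 32-byte aligned because they are read as __m256d.
AVX_TARGET inline void add_arrays(double *dst, const double *src, std::size_t n)
{
    assert(reinterpret_cast<std::uintptr_t>(dst) % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 32 == 0);

    __m256d *vdst = reinterpret_cast<__m256d *>(dst);
    const __m256d *vsrc = reinterpret_cast<const __m256d *>(src);

    // Main loop: four doubles per iteration through the cast pointers.
    const std::size_t blocks = n / 4;
    for (std::size_t i = 0; i < blocks; ++i)
        vdst[i] = _mm256_add_pd(vdst[i], vsrc[i]);

    // Scalar tail for the remaining n % 4 elements.
    for (std::size_t i = blocks * 4; i < n; ++i)
        dst[i] += src[i];
}
```

    When alignment cannot be guaranteed, _mm256_loadu_pd and _mm256_storeu_pd on the plain double pointers do the same job without the cast.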
