My first approach to vectorization intrinsics was with SSE, where there is basically only one data type, __m128i.
Switching to Neon I found the data types a
Since the initially proposed method has undefined behaviour in C++, I have implemented something like this:
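#include <cstring>                  // std::memcpy
#include <boost/static_assert.hpp>  // BOOST_STATIC_ASSERT_MSG

// Wraps a Neon vector type T and converts to/from any other type of the same size.
template <typename T>
struct NeonVectorType {
private:
    T data;
public:
    // Copy the stored bytes out as a U (e.g. uint8x16_t -> uint32x4_t).
    template <typename U>
    operator U () {
        BOOST_STATIC_ASSERT_MSG(sizeof(U) == sizeof(T), "Trying to convert to data type of different size");
        U u;
        std::memcpy(&u, &data, sizeof u);
        return u;
    }
    // Copy the bytes of a same-sized U into the stored vector.
    template <typename U>
    NeonVectorType& operator =(const U& in) {
        BOOST_STATIC_ASSERT_MSG(sizeof(U) == sizeof(T), "Trying to copy from data type of different size");
        std::memcpy(&data, &in, sizeof data);
        return *this;
    }
};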
Then:
typedef NeonVectorType<uint8x16_t> uint_128bit_t; // suitable for uint8x16_t, uint8x8x2_t, uint32x4_t, etc.
typedef NeonVectorType<uint8x8_t> uint_64bit_t;   // suitable for uint8x8_t, uint32x2_t, etc.
The use of memcpy is discussed here (and here), and avoids breaking the strict aliasing rule. Note that compilers generally optimize the memcpy call away.
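To illustrate the memcpy idiom on its own, here is a minimal sketch that builds without Neon. The names `bit_copy` and `float_bits` are hypothetical helpers, not part of the code above, and the float example assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bytes of `from` as a value of type To.
// Unlike *reinterpret_cast<To*>(&from), copying the bytes with memcpy does
// not violate the strict aliasing rule.
template <typename To, typename From>
To bit_copy(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "Trying to convert to data type of different size");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}

// Example use: read the bit pattern of a float (assumes a 32-bit IEEE 754 float).
inline std::uint32_t float_bits(float f) {
    return bit_copy<std::uint32_t>(f);
}
```

With optimizations enabled, compilers typically turn such a fixed-size memcpy into a plain register move, which is why the wrapper above should not cost anything at run time.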
If you look at the edit history, I had implemented a custom version with combine operators for vectors of vectors (e.g. uint8x8x2_t). The problem was mentioned here. However, since those data types are declared as arrays (see guide, section 12.2.2) and therefore located in consecutive memory locations, the compiler is bound to treat the memcpy correctly.
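The layout argument can be sketched with plain structs. `Vec8`, `Vec8x2`, `Vec16` and `flatten` are stand-ins I made up so the example builds without `<arm_neon.h>`; they only mimic the shape of `uint8x8_t`, `uint8x8x2_t` (a struct holding a `val[2]` array) and `uint8x16_t`.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in types (assumptions, not the real Neon definitions).
struct Vec8   { std::uint8_t lane[8];  };  // plays the role of uint8x8_t
struct Vec8x2 { Vec8 val[2];           };  // plays the role of uint8x8x2_t
struct Vec16  { std::uint8_t lane[16]; };  // plays the role of uint8x16_t

// The two halves sit back to back in an array, so both types are 16 bytes.
static_assert(sizeof(Vec8x2) == sizeof(Vec16), "array of two 8-byte vectors must be 16 bytes");

// Copy the pair into one 16-byte vector: val[0] fills lanes 0..7, val[1] lanes 8..15.
inline Vec16 flatten(const Vec8x2& pair) {
    Vec16 out;
    std::memcpy(&out, &pair, sizeof out);
    return out;
}
```

Because `val[0]` and `val[1]` are array elements, no separate combine step is needed; a single memcpy of the whole struct preserves the lane order.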
Finally, to print the content of the variable one could use a function like this.
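One possible sketch of such a printing function follows. It is not the original function from this answer; `to_hex_bytes` and `print_bytes` are hypothetical names, and the output is simply the raw bytes in memory order, so how lanes map to bytes depends on the target's endianness.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: format the raw bytes of any trivially copyable value
// (a Neon vector, a wrapper around one, ...) as space-separated hex pairs.
template <typename T>
std::string to_hex_bytes(const T& value) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof bytes);  // same aliasing-safe copy as above
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < sizeof bytes; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Print the bytes followed by a newline.
template <typename T>
void print_bytes(const T& value) {
    std::puts(to_hex_bytes(value).c_str());
}
```

Calling `print_bytes(v)` on a 128-bit vector would then print sixteen hex pairs on one line.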