If I have an AVX register holding 4 doubles and I want to store them in reverse order in another register, is it possible to do this with a single intrinsic?
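With AVX alone you need two permutes to do this (AVX2 adds _mm256_permute4x64_pd, and _mm256_permute4x64_pd(x, 0x1B) does it in one instruction):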
_mm256_permute2f128_pd() only moves whole 128-bit chunks, and _mm256_permute_pd() cannot move elements across the 128-bit boundary. So you need both: first swap the two 128-bit halves, then swap the two doubles inside each half:
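#include <immintrin.h>

// Reverse the four doubles in an AVX register: {a,b,c,d} -> {d,c,b,a}.
inline __m256d reverse(__m256d x){
    x = _mm256_permute2f128_pd(x,x,1); // swap the 128-bit halves: {c,d,a,b}
    x = _mm256_permute_pd(x,5);        // swap the doubles within each half: {d,c,b,a}
    return x;
}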
Test:
#include <iostream>
using namespace std;
int main(){
    __m256d x = _mm256_set_pd(13,12,11,10);  // _mm256_set_pd takes the highest element first
    double out[4];
    _mm256_storeu_pd(out, x);                // portable; x.m256d_f64[i] only exists on MSVC
    cout << out[0] << " " << out[1] << " " << out[2] << " " << out[3] << endl;
    x = reverse(x);
    _mm256_storeu_pd(out, x);
    cout << out[0] << " " << out[1] << " " << out[2] << " " << out[3] << endl;
}
Output:
10 11 12 13
13 12 11 10