__m256d TRANSPOSE4 Equivalent?
Intel has included __MM_TRANPOSE4_PS to transpose a 4x4 matrix of vectors. I'm wanting to do the equivalent with __m256d. However, I can't seem to figure out how to get _mm256_shuffle_pd in the same manner. _MM_TRANSPOSE4_PS Code #define _MM_TRANSPOSE4_PS(row0, row1, row2, row3) { \ __m128 tmp3, tmp2, tmp1, tmp0; \ \ tmp0 = _mm_shuffle_ps((row0), (row1), 0x44); \ tmp2 = _mm_shuffle_ps((row0), (row1), 0xEE); \ tmp1 = _mm_shuffle_ps((row2), (row3), 0x44); \ tmp3 = _mm_shuffle_ps((row2), (row3), 0xEE); \ \ (row0) = _mm_shuffle_ps(tmp0, tmp1, 0x88); \ (row1) = _mm_shuffle_ps(tmp0, tmp1, 0xDD); \