I have a __m256d vector packed with four 64-bit floating-point values.
I need to find the horizontal maximum of the vector's elements and store the result in a double-p
The general way of doing this for a vector v1 = [A, B, C, D] is:
1. Permute v1 to v2 = [C, D, A, B] (swap the 0th and 2nd elements, and the 1st and 3rd ones).
2. Take the max: v3 = max(v1, v2). You now have [max(A,C), max(B,D), max(A,C), max(B,D)].
3. Permute v3 to v4, swapping the 0th and 1st elements, and the 2nd and 3rd ones.
4. Take the max again: v5 = max(v3, v4). Now v5 contains the horizontal max in all of its components.

Specifically for AVX, the maximums can be done with _mm256_max_pd. The second permutation stays inside each 128-bit lane, so _mm256_permute_pd with mask 0x5 handles it. The first permutation crosses the two 128-bit lanes, which _mm256_permute_pd cannot do; _mm256_permute2f128_pd(v1, v1, 0x01) swaps the two halves instead.
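The four steps above can be sketched in C roughly as follows. This is only a sketch: the names hmax_pd256, hmax4 and HMAX_TARGET are made up for illustration, and the target attribute is a GCC/Clang convenience that is assumed here so the code builds even without -mavx. Because the first swap crosses 128-bit lanes, the sketch uses _mm256_permute2f128_pd for it and keeps _mm256_permute_pd for the in-lane swap.

```c
#include <immintrin.h>

/* Assumption: on GCC/Clang without -mavx, enable AVX per function. */
#if defined(__GNUC__) && !defined(__AVX__)
#define HMAX_TARGET __attribute__((target("avx")))
#else
#define HMAX_TARGET
#endif

/* Hypothetical helper: horizontal max of v1 = [A, B, C, D]. */
HMAX_TARGET
static double hmax_pd256(__m256d v1)
{
    /* v2 = [C, D, A, B]: swap the two 128-bit halves (crosses lanes). */
    __m256d v2 = _mm256_permute2f128_pd(v1, v1, 0x01);
    /* v3 = [max(A,C), max(B,D), max(A,C), max(B,D)] */
    __m256d v3 = _mm256_max_pd(v1, v2);
    /* v4: swap the two elements inside each 128-bit half (mask 0b0101). */
    __m256d v4 = _mm256_permute_pd(v3, 0x5);
    /* v5 holds the horizontal max in every element. */
    __m256d v5 = _mm256_max_pd(v3, v4);
    /* Return element 0 as a scalar double. */
    return _mm_cvtsd_f64(_mm256_castpd256_pd128(v5));
}

/* Hypothetical wrapper: same reduction on four doubles in memory. */
HMAX_TARGET
static double hmax4(const double *p)
{
    return hmax_pd256(_mm256_loadu_pd(p)); /* unaligned load is fine here */
}
```

If you already have the __m256d in a register, call hmax_pd256 directly; hmax4 is only there to make the sketch easy to try on a plain array.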
Hope that helps.