I want to do a reduction on an array using OpenMP and SIMD. I read that a reduction in OpenMP is equivalent to:
inline float sum_scalar_openmp2(const float
I guess the answer to your question is No. I don't think there is a better way of doing reduction with more complicated operators in OpenMP.
Assuming that the array is 16 bit aligned, number of openmp threads is 4, one might expect the performance gain to be 12x - 16x by OpenMP + SIMD. In realistic, it might not produce enough performance gain because