reduction with OpenMP with SSE/AVX

温柔的废话 2020-12-16 06:57

I want to do a reduction on an array using OpenMP and SIMD. I read that a reduction in OpenMP is equivalent to:

inline float sum_scalar_openmp2(const float          


        
1 Answer
  •  离开以前
    2020-12-16 07:24

    I think the answer to your question is no: I am not aware of a better way to do a reduction with more complicated operators in OpenMP.
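For reference, a minimal sketch of the manual pattern such a reduction usually follows: each thread accumulates into a private variable, and the partial results are combined once at the end. The function name `sum_lanes_openmp` is hypothetical. The `float[4]` accumulator is an assumption chosen for portability; it only stands in for one 128-bit SSE register and is not taken from the question's code.

```c
#include <stddef.h>

/* Hypothetical helper: sums n floats with a 4-lane accumulator per thread.
   The float[4] array stands in for one SSE register (an assumption made
   for portability; a __m128 accumulator would play the same role). */
float sum_lanes_openmp(const float *a, size_t n)
{
    float total = 0.0f;
    size_t n4 = n - (n % 4);          /* largest multiple of 4 <= n */

    #pragma omp parallel
    {
        /* Declared inside the parallel region, so private to each thread. */
        float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};

        #pragma omp for nowait
        for (long i = 0; i < (long)n4; i += 4) {
            for (int k = 0; k < 4; k++)
                lanes[k] += a[i + k];
        }

        /* Combine this thread's partial result into the shared total. */
        #pragma omp critical
        total += lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }

    /* Scalar tail for the last n % 4 elements. */
    for (size_t i = n4; i < n; i++)
        total += a[i];

    return total;
}
```

Built without OpenMP support, the pragmas are ignored and the function still returns the correct sum on a single thread.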

    Assuming that the array is 16-byte aligned and that four OpenMP threads are used, one might expect a speedup of 12x to 16x from OpenMP + SIMD (4 threads times 4 floats per 128-bit SSE register). In practice, the gain may be much smaller because:

    1. There is overhead in creating the OpenMP threads.
    2. The code performs only one addition per load, so the CPU has very little computation to do for each value it fetches. It likely spends most of its time loading data, which makes the loop memory-bandwidth bound.
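To make the second point concrete, here is a minimal sketch of a plain sum using OpenMP's built-in `reduction` clause together with `simd` (a combined construct available since OpenMP 4.0). The function name `sum_openmp_simd` is hypothetical. Each iteration loads one 4-byte float and performs a single addition, so adding threads or wider vectors helps only until memory bandwidth is saturated.

```c
#include <stddef.h>

/* Hypothetical helper: plain float sum using OpenMP's built-in reduction. */
float sum_openmp_simd(const float *a, size_t n)
{
    float sum = 0.0f;

    /* Threads split the iterations; within each thread the compiler is
       asked to vectorize. Each iteration loads 4 bytes and does 1 add. */
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += a[i];

    return sum;
}
```

Timing this for an array that fits in cache and for one that is much larger than the last-level cache is one way to check whether the loop is bandwidth bound on a given machine.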
