Optimizing a Numeric Program with SIMD
Question

I am trying to optimize the performance of the following naive program without changing the algorithm:

    void naive(int n, const int *a, const int *b, int *c)
    // a and b are two arrays of the given size n
    {
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n - k; ++i)
                c[k] += a[i + k] * b[i];
    }

My idea is as follows: first, I use OpenMP for the outer loop. For the inner loop, since its trip count is imbalanced, I check n - k to decide whether to use AVX2 SIMD intrinsics or a simple reduction. And finally, I find that it
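
The approach described above (OpenMP on the outer loop, AVX2 or a scalar reduction on the inner loop depending on n - k) might look roughly like the sketch below. This is not the asker's actual code: the names `correlate`, `dot_i32`, and `SIMD_MIN_LEN`, the threshold value 32, and the `schedule(dynamic, 16)` clause are all assumptions chosen for illustration. The AVX2 path is guarded by `__AVX2__`, so the file still builds (scalar only) without `-mavx2`, and the pragma is ignored without `-fopenmp`.

```c
#include <stddef.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical cutoff: inner loops shorter than this stay scalar. */
enum { SIMD_MIN_LEN = 32 };

/* Sum of x[i] * y[i] for i in [0, len). Assumes the products and the
   running sum fit in a 32-bit int, as the original loop does. */
static int dot_i32(const int *x, const int *y, int len)
{
    int sum = 0;
    int i = 0;
#ifdef __AVX2__
    if (len >= SIMD_MIN_LEN) {
        __m256i acc = _mm256_setzero_si256();
        /* Process 8 ints per iteration with unaligned loads. */
        for (; i + 8 <= len; i += 8) {
            __m256i vx = _mm256_loadu_si256((const __m256i *)(x + i));
            __m256i vy = _mm256_loadu_si256((const __m256i *)(y + i));
            acc = _mm256_add_epi32(acc, _mm256_mullo_epi32(vx, vy));
        }
        /* Horizontal sum of the 8 partial sums. */
        int lanes[8];
        _mm256_storeu_si256((__m256i *)lanes, acc);
        for (int j = 0; j < 8; ++j)
            sum += lanes[j];
    }
#endif
    /* Scalar path: short rows, and the tail left over by the SIMD loop. */
    for (; i < len; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Same result as naive(): c[k] += sum over i of a[i + k] * b[i]. */
void correlate(int n, const int *a, const int *b, int *c)
{
    /* Rows get shorter as k grows, so dynamic scheduling is assumed here
       to even out the work across threads. */
    #pragma omp parallel for schedule(dynamic, 16)
    for (int k = 0; k < n; k++)
        c[k] += dot_i32(a + k, b, n - k);
}
```

Each c[k] is written by exactly one outer iteration, so the parallel loop needs no synchronization. A possible build line, assuming GCC or Clang, is `cc -O2 -mavx2 -fopenmp file.c`.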