Existence of “simd reduction(:)” In GCC and MSVC?

╄→гoц情女王★ 提交于 2019-12-04 12:41:07

GCC definitely can vectorize. Suppose you have file reduc.c with contents:

int foo(int *x, int N)
  {
    int acc, i;

    for( i = 0; i < N; ++i )
      {
        acc += x[i];
      }

    return acc;
  }

Compile it (I used gcc 4.7.2) with command line:

$ gcc -O3 -S reduc.c -ftree-vectorize -msse2

Now you can see vectorized loop in assembler.

Also you may switch on verbose vectorizer output say with

$ gcc -O3 -S reduc.c -ftree-vectorize -msse2 -ftree-vectorizer-verbose=1

Now you will get console report:

Analyzing loop at reduc.c:5
Vectorizing loop at reduc.c:5
5: LOOP VECTORIZED.
reduc.c:1: note: vectorized 1 loops in function.

Look at the official docs to better understand cases where GCC can and cannot vectorize.

For Visual Studio 2012: With options /O1 /O2/GL, to report vectorization use /Qvec-report:(1/2)

int s = 0; 
for ( int i = 0; i < 1000; ++i ) 
{ 
s += A[i]; // vectorizable 
}

In the case of reductions over "float" or "double" types, vectorization requires that the /fp:fast switch is thrown. This is because vectorizing the reduction operation depends upon "floating point reassociation". Reassociation is only allowed when /fp:fast is thrown

Ref(associated doc;p12) http://blogs.msdn.com/b/nativeconcurrency/archive/2012/07/10/auto-vectorizer-in-visual-studio-11-cookbook.aspx

gcc requires -ffast-math to enable this optimization (as mentioned in the reference given above), regardless of use of #pragma omp simd reduction. icc is becoming less reliant on pragma for this optimization (except that /fp:fast is needed in absence of pragma), but the extra ivdep and simd pragmas in the original post are undesirable. icc may do bad things when given a pragma simd which doesn't include all relevant reduction, firstprivate, and lastprivate clauses (and gcc may break with -ffast-math, particularly in combination with -march or -mavx). msvc 2012/2013 are very limited in auto-vectorization. There are no simd reductions, no vectorization within OpenMP parallel regions, no vectorization of conditionals, and no advantage is taken of __restrict in vectorizations (there is some run-time check to vectorize less efficiently but safely without __restrict).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!