Intel C++ compiler (ICC) seems to ignore SSE/AVX settings

Submitted by 我只是一个虾纸丫 on 2019-11-29 11:25:31

Two points:

(1) It appears you are using Intel intrinsics in your code. g++ and icpc do not necessarily implement the same intrinsics, although most of them overlap. Check which header files need to be included (g++ may need a hint, such as the matching -m option, before it defines the intrinsics for you). Does g++ give an error message when it fails?

(2) The compiler flags do not guarantee that the corresponding instructions will be generated. From icpc --help: -msse3 May generate Intel(R) SSE3, SSE2, and SSE instructions

These flags are usually just hints to the compiler. You may want to look at -xHost and -fast.
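Both points can be illustrated with a minimal sketch. It assumes the usual predefined macros (__SSE2__, __SSE3__, __AVX__, __AVX2__), which g++ and icpc set according to the target options they were given; the function names enabled_isa_macros and add_arrays are hypothetical and not taken from the question. Note that the macros only report what the compiler was allowed to target, not which instructions it actually emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#if defined(__AVX__)
#include <immintrin.h>  // AVX intrinsics, only pulled in when AVX is enabled
#endif

// Hypothetical helper: lists the instruction sets this translation unit
// was compiled for, based on the compiler's predefined macros.
inline std::string enabled_isa_macros() {
    std::string s;
#if defined(__SSE2__)
    s += "SSE2 ";
#endif
#if defined(__SSE3__)
    s += "SSE3 ";
#endif
#if defined(__AVX__)
    s += "AVX ";
#endif
#if defined(__AVX2__)
    s += "AVX2 ";
#endif
    if (s.empty()) s = "none";
    return s;
}

// Hypothetical kernel: out[i] = a[i] + b[i].
// Uses 8-wide AVX intrinsics when available, otherwise plain scalar code.
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned loads, no alignment assumed
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i) {  // scalar tail, or the whole loop without AVX
        out[i] = a[i] + b[i];
    }
}
```

Printing enabled_isa_macros() from a build made with each compiler and flag set is a quick way to confirm that the options reached the compiler at all, before looking for a speedup.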

"It seems no matter what options I try it compiles but does not make optimal use of the AVX code."

How have you checked this? You may not see a 4x speedup if there are other bottlenecks (such as memory bandwidth).
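One way to check is to time the kernel several times and compare the best run across builds. The following is a minimal sketch using std::chrono; best_seconds and sum_of_squares are hypothetical names, and the kernel is a stand-in rather than the code from the question.

```cpp
#include <chrono>
#include <limits>
#include <vector>

// Hypothetical timing helper: runs `kernel` `repeats` times and returns the
// fastest wall-clock time in seconds. Taking the minimum reduces noise from
// cold caches and background processes. Returns infinity if repeats <= 0.
template <typename F>
double best_seconds(F&& kernel, int repeats) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        kernel();
        auto t1 = std::chrono::steady_clock::now();
        double s = std::chrono::duration<double>(t1 - t0).count();
        if (s < best) best = s;
    }
    return best;
}

// Stand-in kernel: one multiply and one add per element loaded, so on large
// arrays it is likely limited by memory bandwidth rather than arithmetic.
inline float sum_of_squares(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) acc += x * x;
    return acc;
}
```

Running the same measurement on an array that fits in cache and on one that does not can hint at whether memory bandwidth is the limit: if the vectorized build only wins on the small array, the arithmetic is probably not the bottleneck. Make sure the result of the kernel is used (stored or printed), otherwise the compiler may remove the call.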

EDIT (based on question edits):

It looks like icc scalar is faster than gcc scalar -- it is possible that icc is vectorizing the scalar code. If this is the case, I would not expect a 4x speedup from icc when manually coding the vectorization.

As for the difference between icc at 5.782332s and gcc at 3.509130s (for nvec 5000000): this is unexpected. Based on the information I have, I cannot tell why the runtimes of the two compilers differ. I would recommend looking at the emitted code (http://www.delorie.com/djgpp/v2faq/faq8_20.html) from both compilers. Also, make sure that your measurements are reproducible (e.g. check memory layout on multi-socket machines, hot/cold caches, background processes, etc.).
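To compare the emitted code, it helps to isolate the hot loop in its own source file and ask each compiler for assembly with -S. The sketch below assumes a hypothetical file name kernel.cpp and a stand-in kernel scale_add; the command lines in the comments are examples, and the optimization and target options should match the ones used for the timed builds.

```cpp
#include <cstddef>

// Hypothetical file kernel.cpp. Example commands to emit assembly:
//
//   g++  -O2 -mavx -S kernel.cpp -o kernel_gcc.s
//   icpc -O2 -xAVX -S kernel.cpp -o kernel_icc.s
//
// What to look for in the .s files:
//   - packed AVX: instructions such as vmulps / vaddps on %ymm registers
//   - packed SSE: mulps / addps on %xmm registers
//   - scalar:     mulss / addss (or vmulss / vaddss) on %xmm registers
//
// Stand-in kernel: y[i] += a * x[i]
void scale_add(float* y, const float* x, float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

If the "scalar" build from icc already contains packed instructions in this loop, the compiler has auto-vectorized it, which would explain a small gap between the scalar and hand-vectorized versions.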
