May compiler optimizations be inhibited by multi-threading?

一整个雨季 · 2020-12-09 09:25

It has happened to me a few times that I parallelized portions of a program with OpenMP, only to notice that in the end, despite the good scalability, most of the foreseen speed-up was missing.

4 Answers
  •  野趣味 (OP) · 2020-12-09 10:08

    That's a good question, even if it's rather broad, and I'm looking forward to hearing from the experts. I think @JimCownie had a good comment about this in the following discussion: Reasons for omp_set_num_threads(1) slower than no openmp

    Auto-vectorization and auto-parallelization, I think, are often a problem. If you turn on auto-parallelization in MSVC 2012 (auto-vectorization is on by default), the two seem not to mix well together. Using OpenMP seems to disable MSVC's auto-vectorization. The same may be true for GCC with OpenMP and auto-vectorization, but I'm not sure.

    I don't really trust auto-vectorization in the compiler anyway. One reason is that I'm not sure it does loop unrolling to eliminate loop-carried dependencies as well as hand-written scalar code does. For this reason I try to do these things myself: I do the vectorization myself (using Agner Fog's vector class) and I unroll the loops myself. By doing this by hand I feel more confident that I maximize all the parallelism: TLP (e.g. with OpenMP), ILP (e.g. by removing data dependencies with loop unrolling), and SIMD (with explicit SSE/AVX code).
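    A rough sketch of what that hand-done combination can look like, using a dot product as a made-up example and raw SSE intrinsics in place of Agner Fog's vector class so the snippet stays self-contained; all names here are illustrative, not from the original answer:

```cpp
// TLP:  OpenMP splits the array across threads.
// ILP:  two SIMD accumulators per thread break the single add chain
//       (the loop-carried dependency), so the adds can overlap.
// SIMD: explicit SSE intrinsics instead of relying on auto-vectorization.
#include <immintrin.h>
#include <omp.h>
#include <cstddef>

float dot(const float* x, const float* y, std::size_t n) {
    float total = 0.0f;
    #pragma omp parallel reduction(+:total)
    {
        // Carve out this thread's contiguous range of 8-float blocks.
        const std::size_t blocks = n / 8;
        const int nt  = omp_get_num_threads();
        const int tid = omp_get_thread_num();
        const std::size_t begin = blocks *  tid      / nt;
        const std::size_t end   = blocks * (tid + 1) / nt;

        // Two independent accumulators -> two parallel dependency chains.
        __m128 acc0 = _mm_setzero_ps();
        __m128 acc1 = _mm_setzero_ps();
        for (std::size_t b = begin; b < end; ++b) {
            const std::size_t i = b * 8;
            acc0 = _mm_add_ps(acc0, _mm_mul_ps(_mm_loadu_ps(x + i),
                                               _mm_loadu_ps(y + i)));
            acc1 = _mm_add_ps(acc1, _mm_mul_ps(_mm_loadu_ps(x + i + 4),
                                               _mm_loadu_ps(y + i + 4)));
        }
        // Horizontal sum of the lanes into this thread's contribution.
        float lanes[4];
        _mm_storeu_ps(lanes, _mm_add_ps(acc0, acc1));
        total += lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
    // Scalar tail for the remaining n % 8 elements.
    for (std::size_t i = (n / 8) * 8; i < n; ++i)
        total += x[i] * y[i];
    return total;
}
```

    The two accumulators are the point of the hand unrolling: with a single running sum every add waits on the previous one, whereas two independent chains let the CPU overlap them.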
