May compiler optimizations be inhibited by multi-threading?

一整个雨季 · 2020-12-09 09:25

It has happened to me a few times that I parallelized portions of a program with OpenMP, only to notice that in the end, despite the good scalability, most of the foreseen speed-up was missing.

4 Answers
  •  野趣味 (OP) · 2020-12-09 10:08

    That's a good question, even if it's rather broad, and I'm looking forward to hearing from the experts. I think @JimCownie had a good comment about this in the following discussion: Reasons for omp_set_num_threads(1) slower than no openmp

    Auto-vectorization and auto-parallelization, I think, are often a problem. If you turn on auto-parallelization in MSVC 2012 (auto-vectorization is on by default), the two seem not to mix well together. Using OpenMP seems to disable MSVC's auto-vectorization. The same may be true for GCC with OpenMP and auto-vectorization, but I'm not sure.

    I don't really trust auto-vectorization in the compiler anyway. One reason is that I'm not sure it does loop unrolling to eliminate loop-carried dependencies as well as hand-written scalar code does. For this reason I try to do these things myself: I do the vectorization myself (using Agner Fog's vector class) and I unroll the loops myself. By doing this by hand I feel more confident that I maximize all the parallelism: TLP (e.g. with OpenMP), ILP (e.g. by removing data dependencies with loop unrolling), and SIMD (with explicit SSE/AVX code).
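    A rough sketch of what that hand-done combination can look like, using a dot product as a made-up example and raw SSE intrinsics in place of Agner Fog's vector class so the snippet stays self-contained; all names here are illustrative, not from the original answer:

```cpp
// TLP:  OpenMP splits the array across threads.
// ILP:  two SIMD accumulators per thread break the single add chain
//       (the loop-carried dependency), so the adds can overlap.
// SIMD: explicit SSE intrinsics instead of relying on auto-vectorization.
#include <immintrin.h>
#include <omp.h>
#include <cstddef>

float dot(const float* x, const float* y, std::size_t n) {
    float total = 0.0f;
    #pragma omp parallel reduction(+:total)
    {
        // Carve out this thread's contiguous range of 8-float blocks.
        const std::size_t blocks = n / 8;
        const int nt  = omp_get_num_threads();
        const int tid = omp_get_thread_num();
        const std::size_t begin = blocks *  tid      / nt;
        const std::size_t end   = blocks * (tid + 1) / nt;

        // Two independent accumulators -> two parallel dependency chains.
        __m128 acc0 = _mm_setzero_ps();
        __m128 acc1 = _mm_setzero_ps();
        for (std::size_t b = begin; b < end; ++b) {
            const std::size_t i = b * 8;
            acc0 = _mm_add_ps(acc0, _mm_mul_ps(_mm_loadu_ps(x + i),
                                               _mm_loadu_ps(y + i)));
            acc1 = _mm_add_ps(acc1, _mm_mul_ps(_mm_loadu_ps(x + i + 4),
                                               _mm_loadu_ps(y + i + 4)));
        }
        // Horizontal sum of the lanes into this thread's contribution.
        float lanes[4];
        _mm_storeu_ps(lanes, _mm_add_ps(acc0, acc1));
        total += lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
    // Scalar tail for the remaining n % 8 elements.
    for (std::size_t i = (n / 8) * 8; i < n; ++i)
        total += x[i] * y[i];
    return total;
}
```

    The two accumulators are the point of the hand unrolling: with a single running sum every add waits on the previous one, whereas two independent chains let the CPU overlap them.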
