May compiler optimizations be inhibited by multi-threading?
Question
A few times I have parallelized portions of programs with OpenMP, only to notice that, despite good scalability, most of the expected speed-up was lost to the poor performance of the single-threaded case (compared to the serial version). The usual explanation that appears on the web for this behavior is that the code generated by compilers may be worse in the multi-threaded case. However, I have not been able to find a reference anywhere that explains why the assembly