Poor performance due to hyper-threading with OpenMP: how to bind threads to cores
Question I am developing code for large dense matrix multiplication. When I profile the code, it sometimes reaches about 75% of the peak FLOPS of my four-core system and at other times only about 36%. The efficiency does not change within a single execution: a run either starts at 75% and stays at that efficiency, or starts at 36% and stays at that efficiency. I have traced the problem to hyper-threading and the fact that I set the number of threads to four instead of the default eight. When I