I believe everyone agrees with the title of this post. Can someone point me to the reason? Is there any reference on this, such as a book? I have tried to find one but had no luck.
As Mystical explained, this is most likely due to OpenMP overhead. I have tried to work around it with, for example:
#pragma omp parallel for if(nthreads>1)
I thought this would only incur the OpenMP overhead when nthreads>1. However, at least in Visual Studio 2012, it still has significant overhead. Therefore, to properly compare the single-threaded and multi-threaded versions of a function, I define two versions of each function: one with and one without the OpenMP pragmas.
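A minimal sketch of the two-version approach in C. The kernel (`scale_serial` / `scale_omp`, which scale an array in place) and the dispatcher `scale` are hypothetical names chosen for illustration, not the actual functions from this post. The pragma-free version never touches the OpenMP runtime, while the OpenMP version keeps the `if(nthreads>1)` clause. The dispatcher is one possible way to route the one-thread case to the serial code so that it carries no OpenMP overhead at all.

```c
/* Hypothetical kernel used only for illustration: a[i] *= k. */

/* Serial version: no OpenMP pragma, so no OpenMP runtime calls. */
void scale_serial(double *a, int n, double k)
{
    for (int i = 0; i < n; i++)
        a[i] *= k;
}

/* OpenMP version: when nthreads <= 1 the if clause makes the region run
   on a single thread, but the region may still be entered through the
   OpenMP runtime, which is where the remaining overhead can come from. */
void scale_omp(double *a, int n, double k, int nthreads)
{
    int i; /* signed int loop variable, as OpenMP 2.0 (used by MSVC) requires */
#pragma omp parallel for if(nthreads > 1)
    for (i = 0; i < n; i++)
        a[i] *= k;
}

/* Hypothetical dispatcher: skip the OpenMP version entirely for one thread. */
void scale(double *a, int n, double k, int nthreads)
{
    if (nthreads > 1)
        scale_omp(a, n, k, nthreads);
    else
        scale_serial(a, n, k);
}
```

For benchmarking, time `scale_serial` against `scale_omp` directly. Compiled without OpenMP support (e.g. without `/openmp` or `-fopenmp`), the pragma is ignored and both versions behave as plain serial loops.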