Why are 50 threads faster than 4?

    DWORD WINAPI MyThreadFunction(LPVOID lpParam)
    {
        volatile auto x = 1;
        for (auto i = 0; i < 800000000 / MAX_THREADS; ++i) {
            x += i / 3;
        }
        return 0;
    }

This function is run in MAX_THREADS threads.

I have run the tests on Intel Core 2 Duo, Windows 7, MS Visual Studio 2012 using Concurrency Visualizer with MAX_THREADS=4 and MAX_THREADS=50. test1 (4 threads) completed in 7.1 seconds, but test2 (50 threads) completed in 5.8 seconds, while test1 has more context switches than test2.

I have run the same tests on Intel Core i5, Mac OS 10.7.5 and got the same results.

I decided to benchmark this
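
For anyone who wants to reproduce the measurement outside Visual Studio, here is a rough portable sketch of the same experiment. It is not the original test harness: it swaps the Win32 thread API for std::thread and times the run with std::chrono::steady_clock. The names worker and run_test are hypothetical, and the accumulator is an unsigned 64-bit value (an assumption, so the running sum cannot overflow) rather than the int that `volatile auto x = 1` deduces.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Same loop shape as MyThreadFunction. Assumption: a 64-bit unsigned
// accumulator is used here instead of the original int.
void worker(std::int64_t iterations) {
    volatile std::uint64_t x = 1;
    for (std::int64_t i = 0; i < iterations; ++i) {
        x = x + static_cast<std::uint64_t>(i / 3);
    }
}

// Splits total_iterations evenly across num_threads threads, waits for
// all of them, and returns the elapsed wall-clock time in seconds.
// Thread creation is included in the measured interval.
double run_test(int num_threads, std::int64_t total_iterations) {
    const std::int64_t per_thread = total_iterations / num_threads;
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    const auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker, per_thread);
    }
    for (auto& th : threads) {
        th.join();
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

Calling run_test(4, 800000000) and run_test(50, 800000000) corresponds to test1 and test2 above. With GCC or Clang on Linux this typically needs the -pthread flag, and timings will vary with the machine, compiler, and optimization level.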