DWORD WINAPI MyThreadFunction(LPVOID lpParam) {
volatile auto x = 1;
for (auto i = 0; i < 800000000 / MAX_THREADS;
I did some tests a while ago on Windows, (Vista 64 Ultimate), on a 4/8 core i7. I used similar 'counting' code, submitted as tasks to a threadpool with varying numbers of threads, but always with the same total amount of work. The threads in the pool were given a low priority so that all the tasks got queued up before the threads, and timing, started. Obviously, the box was otherwise idle, (~1% CPU used up on services etc).
8 tests,
400 tasks,
counting to 10000000,
using 8 threads:
Ticks: 2199
Ticks: 2184
Ticks: 2215
Ticks: 2153
Ticks: 2200
Ticks: 2215
Ticks: 2200
Ticks: 2230
Average: 2199 ms
8 tests,
400 tasks,
counting to 10000000,
using 32 threads:
Ticks: 2137
Ticks: 2121
Ticks: 2153
Ticks: 2138
Ticks: 2137
Ticks: 2121
Ticks: 2153
Ticks: 2137
Average: 2137 ms
8 tests,
400 tasks,
counting to 10000000,
using 128 threads:
Ticks: 2168
Ticks: 2106
Ticks: 2184
Ticks: 2106
Ticks: 2137
Ticks: 2122
Ticks: 2106
Ticks: 2137
Average: 2133 ms
8 tests,
400 tasks,
counting to 10000000,
using 400 threads:
Ticks: 2137
Ticks: 2153
Ticks: 2059
Ticks: 2153
Ticks: 2168
Ticks: 2122
Ticks: 2168
Ticks: 2138
Average: 2137 ms
With tasks that take a long time, and with very little cache to swap out on a context-change, the number of threads used makes hardly any difference to the overall run time.