Why are 50 threads faster than 4?

野趣味 2020-12-29 06:06
DWORD WINAPI MyThreadFunction(LPVOID lpParam) {
    volatile auto x = 1;
    for (auto i = 0; i < 800000000 / MAX_THREADS;         


        
5 Answers
  •  臣服心动
    2020-12-29 06:26

    The problem you encounter is tightly bound to the way you are subdividing the workload of your process. To make efficient use of a multicore system on a multitasking OS, you must ensure that every core has work remaining for as long as possible during your process's lifetime.

    Consider the situation where your 4-thread process executes on 4 cores and, because of the system load, one of the cores manages to finish 50% faster than the others: for the remaining process time, your CPU can only allocate 3/4 of its processing power to your process, since there are only 3 threads remaining. In the same CPU load scenario, but with many more threads, the workload is split into many more subtasks, which can be distributed more finely between the cores, all other things being equal (*).
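
    The effect described above can be estimated with a small model. This is only a sketch under stated assumptions: "50% faster" is read as one core running at 1.5x the speed of the other three, the work is split into equal pieces, and each piece goes to whichever core becomes free first. The function `makespan` and its parameters are hypothetical names introduced for illustration, not part of the original program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: returns the time at which the last core finishes when
// `total_work` is cut into `pieces` equal parts and every part is handed to
// the core that becomes free first. speeds[i] is the relative speed of core i
// (assumption: 1.5 means "50% faster").
double makespan(const std::vector<double>& speeds, int pieces,
                double total_work = 1.0) {
    assert(!speeds.empty() && pieces > 0);

    std::vector<double> free_at(speeds.size(), 0.0);  // when each core is idle again
    const double piece = total_work / pieces;

    for (int p = 0; p < pieces; ++p) {
        // The earliest-free core takes the next piece.
        auto it = std::min_element(free_at.begin(), free_at.end());
        const std::size_t core = static_cast<std::size_t>(it - free_at.begin());
        *it += piece / speeds[core];
    }
    // The process is done only when the slowest core is done.
    return *std::max_element(free_at.begin(), free_at.end());
}
```

    Under these assumptions, `makespan({1.5, 1.0, 1.0, 1.0}, 4)` gives 0.25, because the fast core sits idle once its single piece is done, while `makespan({1.5, 1.0, 1.0, 1.0}, 50)` gives about 0.227, close to the ideal 1/4.5 ≈ 0.222 that a perfectly even distribution would reach.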

    This example illustrates that the timing difference is not actually due to the number of threads, but rather to the way the work has been divided, which is much more resilient to an uneven availability of cores in the latter case. The same program built with only 4 threads, but with the work abstracted into a series of small tasks that threads pull as soon as they become available, would certainly produce similar or even better results on average, even though there would be the overhead of managing the task queue.
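
    A minimal sketch of that 4-threads-with-small-tasks design follows. It uses portable `std::thread` rather than the `CreateThread` API from the question, and the "queue" is just an atomic counter handing out index ranges. The name `run_chunked`, the chunk size, and the busy loop body are assumptions for illustration, since the loop body in the question is truncated.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: executes `total` iterations of stand-in busy work on
// `workers` threads. Work is claimed in chunks of `chunk` iterations, so a
// thread that finishes early simply claims the next chunk.
// Returns the number of iterations actually executed.
std::uint64_t run_chunked(unsigned workers, std::uint64_t total,
                          std::uint64_t chunk) {
    assert(chunk > 0);

    std::atomic<std::uint64_t> next{0};  // start index of the next unclaimed chunk
    std::atomic<std::uint64_t> done{0};  // iterations completed so far

    auto worker = [&] {
        for (;;) {
            const std::uint64_t begin = next.fetch_add(chunk);
            if (begin >= total) return;  // nothing left to claim
            const std::uint64_t end = std::min(begin + chunk, total);

            volatile unsigned sink = 1;  // stand-in for the real loop body
            for (std::uint64_t i = begin; i < end; ++i) sink = sink + 1;

            done.fetch_add(end - begin);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return done.load();
}
```

    For instance, `run_chunked(4, 800000000, 1000000)` would spread the question's 800000000 iterations over 4 threads in 800 chunks. A smaller chunk balances the load better but touches the shared counter more often, which is the queue-management overhead mentioned above.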

    The finer granularity of a process task set gives it better flexibility.


    (*) On a highly loaded system, the many-threads approach might not be as beneficial: the unused core would actually be allocated to other OS processes, hence lightening the load on the three other cores still possibly used by your process.
