Question
I read in several OpenMP tutorials that you should not generate more tasks than there are threads. For example: "Do not start more tasks than there are available threads, which means available in the enclosing parallel region."
Assume that we want to traverse a binary tree, that the subtrees of a node can be traversed in parallel, and that our machine has four cores. Following the advice above, we generate two tasks at the root, one for the left and one for the right subtree. Within both tasks, we generate two nested tasks, again one for each subtree. Now, we have four tasks, so we do not split them further.
However, if the four subtrees are not of the same size, some cores will have to wait. Would it not be better for load balancing to continue splitting somewhat further and generate, say, 16 tasks? Even if we have only four cores?
Is it generally good advice not to generate more tasks than there are threads, or is this nonsense?
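To put numbers on the load-balancing argument above, here is a small C simulation rather than OpenMP code. Everything in it is an assumption for illustration: a hypothetical skewed tree in which each node sends three quarters of its descendants to the left subtree, one unit of work per node, and an idealized scheduler that hands each cut-off subtree (one task) to whichever of the four workers is currently least loaded. The names collect_chunks and makespan are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNKS 1024
#define MAX_WORKERS 64

/* Hypothetical skewed tree: a subtree with n nodes consists of its root plus
   3/4 of the remaining n - 1 nodes on the left and the rest on the right.
   Each subtree reached at depth == cutoff becomes one task ("chunk"); the few
   nodes above the cutoff are ignored. */
static size_t collect_chunks(long n, int depth, int cutoff,
                             long chunks[], size_t count) {
    if (n <= 0)
        return count;
    if (depth == cutoff) {
        assert(count < MAX_CHUNKS);
        chunks[count++] = n;  /* work of the whole subtree = its node count */
        return count;
    }
    long left = 3 * (n - 1) / 4;
    long right = (n - 1) - left;
    count = collect_chunks(left, depth + 1, cutoff, chunks, count);
    return collect_chunks(right, depth + 1, cutoff, chunks, count);
}

/* Idealized dynamic scheduling: chunks are handed out in creation order, each
   to the currently least-loaded worker. Returns the finishing time of the
   slowest worker and stores the number of chunks in *num_chunks. */
long makespan(long tree_size, int cutoff, int workers, size_t *num_chunks) {
    long chunks[MAX_CHUNKS];
    long load[MAX_WORKERS] = {0};
    assert(workers > 0 && workers <= MAX_WORKERS);

    size_t count = collect_chunks(tree_size, 0, cutoff, chunks, 0);
    for (size_t i = 0; i < count; i++) {
        int best = 0;
        for (int w = 1; w < workers; w++)
            if (load[w] < load[best])
                best = w;
        load[best] += chunks[i];
    }

    long slowest = 0;
    for (int w = 0; w < workers; w++)
        if (load[w] > slowest)
            slowest = load[w];
    if (num_chunks != NULL)
        *num_chunks = count;
    return slowest;
}
```

In this model, a 1000-node tree cut off at depth 2 yields 4 tasks and the slowest worker finishes after 561 units; cutting off at depth 4 yields 16 tasks and 314 units, against an ideal of 250. Splitting further helps, but the largest single subtree still bounds the result.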
Answer 1:
You can use more tasks than threads; that is the main purpose of tasks. And yes, using more tasks than threads will help with load balancing.
The referenced article seems rather critical to me, and while there is some merit to his criticism, I do not agree with the negativity. I am not going to go into every point he makes against OpenMP tasks. Yes, there are ways to shoot yourself in the foot with OpenMP tasks, as there are with any parallelism paradigm, or with C/C++/Fortran in general.
One thing I would worry about when implementing tree algorithms with tasks is task creation overhead. While tasks are more lightweight than threads, they are not free. If your leaf nodes do only a few instructions of work and you create a task for every one of them, the overhead will be huge.
You can use the if clause, #pragma omp task if(depth < threshold), for this, but that still leaves a little overhead in the deeper layers. To avoid it entirely, implement two versions of the traversal function: a serial one that only ever calls itself, and a task version that, depending on the depth, either spawns tasks that call the task version again or calls the serial one. The serial version, free of OpenMP constructs, may also optimize better. Choose the threshold such that minimum task duration >> task overhead, while the task count stays high enough to avoid load imbalance (how high depends on how much the subtrees vary).
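A minimal sketch of both variants described above, assuming a hypothetical binary tree whose nodes hold a long value and a traversal that sums them. sum_if is the single-function version with the if clause; sum_serial and sum_task are the two-version approach. The threshold is a depth cutoff chosen by the caller, and all names, including the build_tree helper, are illustrative. The pragmas need an OpenMP-enabled compiler (for example -fopenmp with GCC); without one they are ignored and the code runs serially with the same result.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node for this sketch. */
typedef struct node {
    long value;
    struct node *left, *right;
} node;

/* Serial version: contains no OpenMP constructs and only calls itself. */
static long sum_serial(const node *n) {
    if (n == NULL)
        return 0;
    return n->value + sum_serial(n->left) + sum_serial(n->right);
}

/* Task version: spawns one task per child while depth < threshold and hands
   deeper subtrees to the serial version. left/right must be shared because
   local variables are firstprivate in a task by default; the taskwait keeps
   them alive until both child tasks have finished. */
static long sum_task(const node *n, int depth, int threshold) {
    if (n == NULL)
        return 0;
    if (depth >= threshold)
        return sum_serial(n);
    long left = 0, right = 0;
    #pragma omp task shared(left)
    left = sum_task(n->left, depth + 1, threshold);
    #pragma omp task shared(right)
    right = sum_task(n->right, depth + 1, threshold);
    #pragma omp taskwait
    return n->value + left + right;
}

/* Single-function variant using the if clause: below the threshold the tasks
   are undeferred (run immediately by the encountering thread), but they are
   still created, so some overhead remains in the deep levels. */
static long sum_if(const node *n, int depth, int threshold) {
    if (n == NULL)
        return 0;
    long left = 0, right = 0;
    #pragma omp task shared(left) if(depth < threshold)
    left = sum_if(n->left, depth + 1, threshold);
    #pragma omp task shared(right) if(depth < threshold)
    right = sum_if(n->right, depth + 1, threshold);
    #pragma omp taskwait
    return n->value + left + right;
}

/* One thread starts the recursion; the whole team executes the tasks. */
long tree_sum(const node *root, int threshold, int use_if_clause) {
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = use_if_clause ? sum_if(root, 0, threshold)
                          : sum_task(root, 0, threshold);
    return total;
}

/* Hypothetical helper: complete tree of the given height, every value 1. */
node *build_tree(int height) {
    if (height <= 0)
        return NULL;
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = 1;
    n->left = build_tree(height - 1);
    n->right = build_tree(height - 1);
    return n;
}

void free_tree(node *n) {
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

With threshold 4 on a complete tree, only nodes at depths 0 to 3 spawn tasks, so 16 tasks end up summing serial subtrees. Raising the threshold gives the runtime more tasks to balance; lowering it reduces tasking overhead.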
Source: https://stackoverflow.com/questions/52234246/generating-more-tasks-than-there-are-threads