Question
I wrote a small C program to assess OpenMP's ability to yield to another task when a task becomes idle (e.g. while waiting for communicated data):
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h> /* sleep() */
#include <omp.h>
#define NTASKS 10
double wallClockTime(void) {
    struct timeval t;
    gettimeofday(&t, NULL);
    return (double)(t.tv_sec + t.tv_usec/1000000.);
}
void printStatus(char *status, int taskNum, int threadNum) {
    #pragma omp critical(printStatus)
    {
        int i;
        for (i = 0; i < taskNum; i++) printf(" ");
        printf(" %s%i \n", status, threadNum);
    }
}
void task(int taskNum) {
    // "r"un task
    printStatus("r", taskNum, omp_get_thread_num());
    sleep(1);
    // "s"leeping task that can yield
    printStatus("s", taskNum, omp_get_thread_num());
    double idleStartTime = wallClockTime();
    while (wallClockTime() < idleStartTime + 1) {
        #pragma omp taskyield
    }
    // "c"ontinue task
    printStatus("c", taskNum, omp_get_thread_num());
    sleep(1);
}
int main(int argc, char* argv[]) {
    #pragma omp parallel
    #pragma omp single nowait
    {
        int i;
        printf("thread %d is master\n\n", omp_get_thread_num());
        for (i = 0; i < NTASKS; i++) printf(" %02d ", i);
        printf("\n");
        for (i = 0; i < NTASKS; i++) {
            #pragma omp task untied
            task(i);
        }
    }
    return 0;
}
I used Intel C compiler 17.0.4. Here is the output from a run with 3 threads:
thread 0 is master
00 01 02 03 04 05 06 07 08 09
r1
r0
r2
s1
s0
s2
r0
c1
c2
s0
r0
r1
r2
s0
r0
s1
s2
s0
r0
c1
c2
s0
r0
s0
c0
c0
c0
c0
c0
c0
Threads 1 and 2 do not yield at all; they stick to their assigned tasks instead. I would also expect threads 1 and 2 to continue the suspended untied tasks 04 ... 09, but these are handled only by master thread 0 while the other threads sit idle.
Do the tasks have to be issued or yielded in a different way, or is Intel's OpenMP runtime not (yet) capable of handling this? By the way, GNU gcc 4.9.2 does not yield from tasks at all.
Answer 1:
I think your code is just fine and this is an implementation issue. In fact, the LLVM OpenMP implementation, which is closely related to Intel's, received a commit two weeks ago that fixes your issue. In my tests, clang's current libiomp5.so (built from trunk) was compatible with icc 17.0.4 just by setting LD_LIBRARY_PATH, and it produced the desired result:
thread 0 is master
00 01 02 03 04 05 06 07 08 09
r0
r2
r1
s0
r0
s2
r2
s1
r1
s0
r0
s2
r2
s1
r1
s0
r0
s2
s1
c2
s0
c1
c0
c2
c1
c0
c2
c1
c0
c0
I can also confirm that gcc does not yield at all, but I haven't looked into it in detail.
I have no idea if and when the change might be merged into the library shipped by Intel.
Update: You are right that the behavior is still not optimal. From briefly looking through the code, it seems that libiomp supports the notion of untied tasks, but does not requeue the task during a taskyield. Instead, it just executes another task and retains the context of the suspended task on the stack. A task suspended this way can only resume on the same thread, and only after the task running on top of it has returned. I suspect proper support would require heavier compiler support (continuations of some sort) rather than just generating library calls.
Again, you are doing everything right, but the compiler / runtime is not sophisticated enough to support what the standard allows (the observed behavior is still fully standards-compliant). Also note that for the described current behavior of libiomp, the tasks don't even need to be untied, since they are only queued so far. There doesn't seem to be an easy way to get what you want short of splitting up / chaining tasks.
Source: https://stackoverflow.com/questions/47658571/how-to-yield-resume-openmp-untied-tasks-correctly