OpenMP - Why does the number of comparisons decrease?

Submitted by 你。 on 2019-12-31 04:07:09

Question


I have the following algorithm:

int hostMatch(long *comparisons)
{
    int i = -1;
    int lastI = textLength-patternLength;
    *comparisons=0;

    #pragma omp parallel for schedule(static, 1) num_threads(1)
    for (int k = 0; k <= lastI; k++)
    {
        int j;
        for (j = 0; j < patternLength; j++)
        {
            (*comparisons)++;
            if (textData[k+j] != patternData[j])
            {
                j = patternLength+1; //break    
            }
        }
        if (j == patternLength && k > i)
            i = k;
    }

    return i;
}

When changing num_threads I get the following results for number of comparisons:

  • 01 = 9949051000
  • 02 = 4992868032
  • 04 = 2504446034
  • 08 = 1268943748
  • 16 = 776868269
  • 32 = 449834474
  • 64 = 258963324

Why is the number of comparisons not constant? It's interesting that the count roughly halves each time the number of threads doubles. Is there some sort of race condition on (*comparisons)++, where OpenMP just skips the increment if the variable is in use?

My current understanding is that the iterations of the k loop are split near-evenly amongst the threads. Each iteration has a private integer j as well as a private copy of integer k, and a non-parallel for loop which adds to the comparisons until terminated.


Answer 1:


You said it yourself: (*comparisons)++ has a race condition. The increment is a read-modify-write sequence, not an atomic operation, so concurrent updates to the shared counter have to be serialized.

Basically, two threads read the same value (say, 2), each increments it to 3, and each writes 3 back. You end up with 3 instead of 4. You have to make sure that operations on variables outside the local scope of your parallelized function or loop don't overlap.
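
The lost update described above can be replayed by hand, without any threads. This is only a sketch: lost_update is a hypothetical helper, and the local variables a and b stand in for the registers of two threads that happen to interleave in the worst possible way.

```c
/* Replays one unlucky interleaving of two threads executing shared++.
   Each "thread" performs load, add, store as three separate steps. */
long lost_update(long start)
{
    long shared = start;

    long a = shared;   /* thread A loads the current value        */
    long b = shared;   /* thread B loads the same value           */
    a += 1;            /* thread A increments its private copy    */
    b += 1;            /* thread B increments its private copy    */
    shared = a;        /* thread A stores start + 1               */
    shared = b;        /* thread B stores start + 1 again         */

    return shared;     /* two increments ran, only one is visible */
}
```

Starting from 2, the function returns 3 rather than 4. The more threads hammer the same counter, the more increments are lost this way.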




Answer 2:


The naive way around the race condition is to declare the operation as an atomic update:

#pragma omp atomic update
(*comparisons)++;

Note that a critical section here is unnecessary and much more expensive. An atomic update can be declared on a primitive binary or unary operation on any l-value expression with scalar type.

Yet this is still not optimal, because the value of *comparisons has to be moved between CPU caches all the time and an expensive locked instruction is executed for every increment. Instead, you should use a reduction. For that you need another local variable; the pointer won't work here.

int hostMatch(long *comparisons)
{
    int i = -1;
    int lastI = textLength-patternLength;
    long comparisons_tmp = 0;

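    // reduction(+:...) gives every thread a private counter and sums them at the end.
    // reduction(max:i) (OpenMP 3.1+) does the same for i, which would otherwise be shared and racy.
    #pragma omp parallel for reduction(+:comparisons_tmp) reduction(max:i)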
    for (int k = 0; k <= lastI; k++)
    {
        int j;
        for (j = 0; j < patternLength; j++)
        {
            comparisons_tmp++;
            if (textData[k+j] != patternData[j])
            {
                j = patternLength+1; //break    
            }
        }
        if (j == patternLength && k > i)
            i = k;
    }

    *comparisons = comparisons_tmp;

    return i;
}
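
To check that a reduction gives the same count for any number of threads, a self-contained variant can be run on a tiny input. This is a sketch under assumptions: host_match and its parameters are hypothetical stand-ins for the globals textData, patternData, textLength and patternLength, the inner loop uses a plain break instead of the j = patternLength+1 trick, and reduction(max:...) needs OpenMP 3.1 or later.

```c
/* Returns the last index at which pattern occurs in text, or -1.
   Stores the total number of character comparisons in *comparisons. */
int host_match(const char *text, int text_len,
               const char *pattern, int pattern_len,
               long *comparisons)
{
    int last_match = -1;
    long count = 0;
    int last_start = text_len - pattern_len;

    /* Each thread gets private copies of count and last_match;
       OpenMP sums the counts and keeps the largest match index. */
    #pragma omp parallel for reduction(+:count) reduction(max:last_match)
    for (int k = 0; k <= last_start; k++) {
        int j = 0;
        while (j < pattern_len) {
            count++;                      /* one character comparison */
            if (text[k + j] != pattern[j])
                break;                    /* mismatch: stop this position */
            j++;
        }
        if (j == pattern_len && k > last_match)
            last_match = k;               /* full match at position k */
    }

    *comparisons = count;
    return last_match;
}
```

For the text "abcabc" and the pattern "abc", positions 0 and 3 each cost three comparisons and positions 1 and 2 cost one each, so the function returns 3 with a count of 8, whatever num_threads is set to.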

P.S. schedule(static, 1) seems like a bad idea, since it deals consecutive values of k to different threads, which leads to inefficient memory access patterns on textData. Just leave it out and let the OpenMP implementation do its thing. If a measurement shows that this is not working efficiently, give it some better hints.
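
To see why schedule(static, 1) scatters the accesses, it helps to work out which thread gets which iteration. With schedule(static, chunk), OpenMP cuts the iteration space into chunks of the given size and hands them out round-robin in thread-number order. The helper below is a hypothetical illustration of that mapping for a loop starting at k = 0, not part of the OpenMP API.

```c
/* Thread number that executes iteration k of a loop starting at 0
   under schedule(static, chunk) with nthreads threads. */
int static_chunk_owner(int k, int chunk, int nthreads)
{
    /* k / chunk is the chunk index; chunks are dealt out round-robin. */
    return (k / chunk) % nthreads;
}
```

With chunk = 1 and 4 threads, iterations 0, 1, 2, 3 go to threads 0, 1, 2, 3, and iteration 4 wraps around to thread 0 again. Each thread therefore strides through textData instead of walking a contiguous block.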



Source: https://stackoverflow.com/questions/42395568/openmp-why-does-the-number-of-comparisons-decrease
