I've read some other questions on this topic, but they didn't solve my problem. I wrote the code as follows and I got pthread version an
There is nothing wrong with OpenMP in your case. What is wrong is the way you measure the elapsed time.
Using clock() to measure the performance of multithreaded applications on Linux (and most other Unix-like OSes) is a mistake since it does not return the wall-clock (real) time but instead the accumulated CPU time for all process threads (and on some Unix flavours even the accumulated CPU time for all child processes). Your parallel code shows better performance on Windows since there clock() returns the real time and not the accumulated CPU time.
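To make the difference concrete, here is a minimal sketch (not from your post; it assumes GCC with -fopenmp on a POSIX system and uses a dummy sine loop as the workload) in which clock() reports roughly the summed CPU time of all threads, while omp_get_wtime() reports the real elapsed time:

#include <cmath>
#include <cstdio>
#include <ctime>
#include <omp.h>

int main()
{
    const long iters = 200000000L;   // arbitrary amount of busy work
    double sum = 0.0;

    clock_t c0 = clock();            // accumulated CPU time of all threads
    double  w0 = omp_get_wtime();    // wall-clock (real) time

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < iters; ++i)
        sum += std::sin(i * 1e-7);

    clock_t c1 = clock();
    double  w1 = omp_get_wtime();

    // With 2 threads the first number should be roughly twice the second
    printf("clock():         %lf s\n", (double)(c1 - c0) / CLOCKS_PER_SEC);
    printf("omp_get_wtime(): %lf s\n", w1 - w0);
    printf("(checksum: %g)\n", sum);  // keeps the compiler from dropping the loop
    return 0;
}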
The best way to prevent such discrepancies is to use the portable OpenMP timer routine omp_get_wtime():
double start = omp_get_wtime();
#pragma omp parallel for
for(int n = 0; n < sizen; ++n)
    sinTable[n] = std::sin(2 * M_PI * n / sizen);
double finish = omp_get_wtime();
printf("from omp: %lf\n", finish - start);
For non-OpenMP applications, you should use clock_gettime() with the CLOCK_REALTIME clock:
struct timespec start, finish;
clock_gettime(CLOCK_REALTIME, &start);
#pragma omp parallel for
for(int n = 0; n < sizen; ++n)
    sinTable[n] = std::sin(2 * M_PI * n / sizen);
clock_gettime(CLOCK_REALTIME, &finish);
printf("from omp: %lf\n", (finish.tv_sec + 1.e-9 * finish.tv_nsec) -
(start.tv_sec + 1.e-9 * start.tv_nsec));
The Linux scheduler, in the absence of any information, will tend to schedule the threads of a process on the same core so that they are served by the same cache and memory bus. It has no way of knowing that your threads access different memory, so it cannot tell that they would be helped rather than hurt by running on different cores.
Use the sched_setaffinity function to set each thread to a different core mask.
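For illustration only (this is not part of the original answer), a minimal sketch of pinning each OpenMP thread to its own core with sched_setaffinity() could look like the following; it assumes a Linux/glibc system, at most one thread per core, and cores numbered from 0:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE              // sched_setaffinity() and the CPU_* macros are GNU extensions
#endif
#include <sched.h>
#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(omp_get_thread_num(), &mask);                 // thread i -> core i (assumed mapping)
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0)   // pid 0 means "the calling thread"
            perror("sched_setaffinity");

        // ... do the parallel work here ...
    }
    return 0;
}

Note that recent OpenMP runtimes can achieve the same thing without platform-specific code via the OMP_PROC_BIND and OMP_PLACES environment variables.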
WARNING: this answer is controversial. The trick described below is implementation dependent and can decrease performance just as easily as increase it. I strongly recommend taking a look at the comments on this answer.
This doesn't really answer the question, but if you alter the way you parallelize your code, you might get a performance boost. Now you do it like this:
#pragma omp parallel for
for(int n = 0; n < sizen; ++n)
    sinTable[n] = std::sin(2 * M_PI * n / sizen);
In this case each thread computes one item at a time. Since you have 2 cores, OpenMP will create two threads by default. To calculate each value a thread would have to:

1. Initialize, i.e. obtain its next piece of work from the OpenMP runtime.
2. Calculate std::sin(2 * M_PI * n / sizen).

The first step is rather expensive, and each of your two threads would have to do it sizen/2 times.
Try to do the following:
int workloadPerThread = sizen / NUM_THREADS;
#pragma omp parallel for
for (int thread = 0; thread < NUM_THREADS; ++thread)
{
    int start = thread * workloadPerThread;
    int stop  = start + workloadPerThread;
    if (thread == NUM_THREADS - 1)
        stop += sizen % NUM_THREADS;     // the last thread picks up the remainder
    for (int n = start; n < stop; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / sizen);
}
This way your threads will initialize only once.