问题
I am trying to test Pi calculation problem with OpenMP. I have this code:
#pragma omp parallel private(i, x, y, myid) shared(n) reduction(+:numIn) num_threads(NUM_THREADS)
{
printf("Thread ID is: %d\n", omp_get_thread_num());
myid = omp_get_thread_num();
printf("Thread myid is: %d\n", myid);
for(i = myid*(n/NUM_THREADS); i < (myid+1)*(n/NUM_THREADS); i++) {
//for(i = 0; i < n; i++) {
x = (double)rand()/RAND_MAX;
y = (double)rand()/RAND_MAX;
if (x*x + y*y <= 1) numIn++;
}
printf("Thread ID is: %d\n", omp_get_thread_num());
}
return 4. * numIn / n;
}
When I compile with gcc -fopenmp pi.c -o hello_pi
and run it time ./hello_pi
for n = 1000000000
I get
real 8m51.595s
user 4m14.004s
sys 60m59.533s
When I run it on with a single thread I get
real 0m20.943s
user 0m20.881s
sys 0m0.000s
Am I missing something? It should be faster with 8 threads. I have 8-core CPU.
回答1:
Please take a look at the http://people.sc.fsu.edu/~jburkardt/c_src/openmp/compute_pi.c This might be a good implementation for pi computing.
It is quite important to know that how your data spread to different threads and how the openmp collect them back. Usually, a bad design (which has data dependencies across threads) running on multiple thread will result in a slower execution than a single thread .
回答2:
rand()
in stdlib.h
is not thread-safe. Using it in multi-thread environment causes a race condition on its hidden state variables, thus lead to poor performance.
http://man7.org/linux/man-pages/man3/rand.3.html
In fact the following code work well as an OpenMP demo.
$ gc -fopenmp -o pi pi.c -O3; time ./pi
pi: 3.141672
real 0m4.957s
user 0m39.417s
sys 0m0.005s
code:
#include <stdio.h>
#include <omp.h>
int main()
{
const int n=50000;
const int NUM_THREADS=8;
int numIn=0;
#pragma omp parallel for reduction(+:numIn) num_threads(NUM_THREADS)
for(int i = 0; i < n; i++) {
double x = (double)i/n;
for(int j=0;j<n; j++) {
double y = (double)j/n;
if (x*x + y*y <= 1) numIn++;
}
}
printf("pi: %f\n",4.*numIn/n/n);
return 0;
}
回答3:
In general I would not compare times without optimization on. Compile with something like
gcc -O3 -Wall -pedantic -fopenmp main.c
The rand()
function is not thread safe in Linux (but it's fine with MSVC and I guess mingw32 which uses the same C run-time libraries, MSVCRT, as MSVC). You can use rand_r
with a different seed for each thread. See openmp-program-is-slower-than-sequential-one.
In general try to avoid defining the chunk sizes when you parallelize a loop. Just use #pragma omp for schedule(shared)
. You also don't need to specify that the loop variable in a parallelized loop is private (the variable i
in your code).
Try the following code
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
int i, numIn, n;
unsigned int seed;
double x, y, pi;
n = 1000000;
numIn = 0;
#pragma omp parallel private(seed, x, y) reduction(+:numIn)
{
seed = 25234 + 17 * omp_get_thread_num();
#pragma omp for
for (i = 0; i <= n; i++) {
x = (double)rand_r(&seed) / RAND_MAX;
y = (double)rand_r(&seed) / RAND_MAX;
if (x*x + y*y <= 1) numIn++;
}
}
pi = 4.*numIn / n;
printf("asdf pi %f\n", pi);
return 0;
}
You can find a working example of this code here http://coliru.stacked-crooked.com/a/9adf1e856fc2b60d
来源:https://stackoverflow.com/questions/19489806/why-does-calculation-with-openmp-take-100x-more-time-than-with-a-single-thread