Question
Without using OpenMP directives - serial execution (screenshot of the timing output omitted).
Using OpenMP directives - parallel execution (screenshot of the timing output omitted).
#include "stdafx.h"
#include <omp.h>
#include <iostream>
#include <time.h>
using namespace std;
static long num_steps = 100000;
double step;
double pi;
int main()
{
clock_t tStart = clock();
int i;
double x, sum = 0.0;
step = 1.0 / (double)num_steps;
#pragma omp parallel for shared(sum)
for (i = 0; i < num_steps; i++)
{
x = (i + 0.5)*step;
#pragma omp critical
{
sum += 4.0 / (1.0 + x * x);
}
}
pi = step * sum;
cout << pi <<"\n";
printf("Time taken: %.5fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);
getchar();
return 0;
}
I have tried this multiple times, and the serial execution is always faster. Why?
Serial execution time: 0.0200s; parallel execution time: 0.02500s.
Why is serial execution faster here? Am I calculating the execution time in the right way?
Answer 1:
OpenMP implements parallel processing with multithreading, and the benefit of multithreading only becomes measurable with a large volume of work. With a very small workload you cannot measure any gain from a multithreaded application. The reasons:
a) To create a thread, the OS must allocate memory for each thread, which takes time (even if only a tiny bit).
b) Running multiple threads requires context switching, which also takes time.
c) The memory allocated to the threads must be released, which also takes time.
d) The result also depends on the number of processors and the total memory (RAM) in your machine.
So when you run a small operation across multiple threads, its performance will be about the same as a single thread (by default the OS assigns one thread to every process, called the main thread). Your outcome is therefore expected. To measure the performance of a multithreaded program, use a large amount of data with a complex operation; only then will you see the difference.
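To make that thread-management overhead visible, here is a minimal sketch (not part of the original answer) that times an empty parallel region with omp_get_wtime(), OpenMP's wall-clock timer; since the region does no work, anything measured is pure thread start-up and synchronization cost:

#include <omp.h>
#include <cstdio>

int main()
{
    double t0 = omp_get_wtime();
#pragma omp parallel
    {
        // intentionally empty: no useful work is done here
    }
    double t1 = omp_get_wtime();
    // This typically prints a small but non-zero cost, which a loop
    // over only 100000 cheap iterations may not be able to amortize.
    printf("parallel region overhead: %.6fs\n", t1 - t0);
    return 0;
}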
Answer 2:
Because of your critical block you cannot update sum in parallel: every time one thread reaches the critical section, all other threads have to wait.
The smart approach is to give each thread a temporary copy of sum that it can accumulate into without synchronization, and afterwards to add up the per-thread results.
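To make that idea concrete, here is a minimal sketch of the manual version (not part of the original answer; it reuses num_steps and step from the question): each thread accumulates into its own local variable, and only the final merge is synchronized.

double sum = 0.0;
#pragma omp parallel
{
    double local_sum = 0.0;                // private partial sum, one per thread
#pragma omp for
    for (long i = 0; i < num_steps; i++)
    {
        double x = (i + 0.5) * step;
        local_sum += 4.0 / (1.0 + x * x);  // no synchronization needed here
    }
#pragma omp critical
    sum += local_sum;                      // one short critical section per thread
}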
OpenMP can do this automatically for you with the reduction clause, so your loop becomes (note that x is now declared inside the loop so that each thread gets its own copy):
#pragma omp parallel for reduction(+:sum)
for (i = 0; i < num_steps; i++)
{
    double x = (i + 0.5) * step;  // private to each thread, avoiding a data race
    sum += 4.0 / (1.0 + x * x);
}
On my machine this performs 10 times faster than the version using the critical block (I also increased num_steps to reduce the influence of one-time costs like thread creation).
PS: I recommend using <chrono>, <boost/timer/timer.hpp>, or Google Benchmark for timing your code.
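For example, a minimal wall-clock measurement with <chrono> might look like the sketch below (not part of the original answer). Note that on some platforms clock() reports CPU time summed over all threads, so a multithreaded program can appear slower than it really is; std::chrono::steady_clock measures elapsed wall-clock time instead.

#include <chrono>
#include <cstdio>

int main()
{
    auto start = std::chrono::steady_clock::now();  // wall-clock start

    // ... code to be timed goes here ...

    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = end - start;  // in seconds
    printf("Time taken: %.5fs\n", elapsed.count());
    return 0;
}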
Source: https://stackoverflow.com/questions/49948807/parallel-exection-using-openmp-takes-longer-than-serial-execution-c-am-i-calc