Question
I would like to know a way to get the range of values assigned to a given thread in a parallelized for loop in OpenMP with C++. For example, in the following code I would like to know the first value of i that each thread uses in the loop.
#pragma omp parallel for schedule(static)
for(int i=0; i<n; i++)
Let me give you an example of why I might want these values. Let's assume I want to fill an array with the running sums of the counting numbers. The closed-form solution for the sum of the counting numbers up to n is n*(n+1)/2. To do this with OpenMP I could do this:
#pragma omp parallel for schedule(static)
for(int i=0; i<n; i++) {
a[i] = i*(i+1)/2;
}
However, I suspect a faster way to get the sums of the counting numbers is to skip the closed-form solution in each iteration (which involves a multiplication) and instead carry a running sum from one iteration to the next, like this:
int cnt = 0;
for(int i=0; i<n; i++) {
cnt += i;
a[i] = cnt;
}
But the only way I can think of to do this with OpenMP is to explicitly define the range values like this:
#pragma omp parallel
{
const int ithread = omp_get_thread_num();
const int nthreads = omp_get_num_threads();
const int start = ithread*n/nthreads;
const int finish = (ithread+1)*n/nthreads;
int cnt = 0;
int offset = (start-1)*(start)/2;
for(int i=start; i<finish; i++) {
cnt += i;
a[i] = cnt + offset;
}
}
If I could get the start value from #pragma omp parallel for schedule(static), then I would not have to define start, finish, ithread, and nthreads.
Edit: After reading Agner Fog's Optimizing C++ manual I realized that what I am doing is called induction. He gives an example of using induction to calculate the values of a polynomial more efficiently. Here are some examples from his manual.
Without induction:
// Example 8.23a. Loop to make table of polynomial
const double A = 1.1, B = 2.2, C = 3.3; // Polynomial coefficients
double Table[100]; // Table
int x; // Loop counter
for (x = 0; x < 100; x++) {
Table[x] = A*x*x + B*x + C; // Calculate polynomial
}
With induction:
// Example 8.23b. Calculate polynomial with induction variables
const double A = 1.1, B = 2.2, C = 3.3; // Polynomial coefficients
double Table[100]; // Table
int x; // Loop counter
const double A2 = A + A; // = 2*A
double Y = C; // = A*x*x + B*x + C
double Z = A + B; // = Delta Y
for (x = 0; x < 100; x++) {
Table[x] = Y; // Store result
Y += Z; // Update induction variable Y
Z += A2; // Update induction variable Z
}
To do this with OpenMP I need to get the start value for each chunk. The only way to do this with OpenMP is to define the chunks manually.
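Here is a minimal sketch of what the manual chunking could look like for the polynomial table. It is not taken from Agner Fog's manual. It assumes the same start/finish arithmetic as the counting-number example above, and it seeds each chunk from the closed form: Y = A*start*start + B*start + C, and Z = A*(2*start+1) + B, which is the difference Y(start+1) - Y(start). The names poly_table_induction, poly_table_direct and tables_match are hypothetical helpers introduced only for illustration. The _OPENMP guard lets the code fall back to a single chunk when it is compiled without OpenMP.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Reference version: closed form in every iteration.
std::vector<double> poly_table_direct(int n, double A, double B, double C) {
    assert(n >= 0);
    std::vector<double> table(n);
    #pragma omp parallel for schedule(static)
    for (int x = 0; x < n; x++) {
        table[x] = A * x * x + B * x + C;
    }
    return table;
}

// Induction version: each thread seeds Y and Z at the start of its own chunk.
std::vector<double> poly_table_induction(int n, double A, double B, double C) {
    assert(n >= 0);
    std::vector<double> table(n);
    #pragma omp parallel
    {
#ifdef _OPENMP
        const int ithread = omp_get_thread_num();
        const int nthreads = omp_get_num_threads();
#else
        const int ithread = 0;   // serial fallback (assumption: built without OpenMP)
        const int nthreads = 1;
#endif
        // Manual chunking as in the question; 64-bit math avoids overflow in ithread*n.
        const int start = static_cast<int>(static_cast<long long>(ithread) * n / nthreads);
        const int finish = static_cast<int>(static_cast<long long>(ithread + 1) * n / nthreads);

        const double A2 = A + A;                        // second difference, 2*A
        double Y = A * start * start + B * start + C;   // polynomial at x = start
        double Z = A * (2.0 * start + 1.0) + B;         // Y(start+1) - Y(start)
        for (int x = start; x < finish; x++) {
            table[x] = Y;   // store result
            Y += Z;         // update induction variable Y
            Z += A2;        // update induction variable Z
        }
    }
    return table;
}

// True if both tables have the same size and agree element-wise within tol.
bool tables_match(const std::vector<double>& a, const std::vector<double>& b, double tol) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); i++) {
        const double d = a[i] - b[i];
        if (d > tol || d < -tol) return false;
    }
    return true;
}
```

Because Y and Z accumulate floating-point rounding error, the two tables agree only to within a tolerance, not bit for bit.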
Answer 1:
This is an extended comment rather than an answer ...
There is no OpenMP routine or pre-defined variable for getting the range of values for i (in your case) that each thread will execute. You'll have to write something along the lines that you have outlined to get those numbers yourself.
But before you do, stop and think a bit. All that extra code, and the effort to write and maintain it, just to avoid one multiplication per iteration! Even when you get your code working I doubt that any speedup you see will be worth the effort. Worse, as soon as you want to use a schedule other than static you will have to redo the index calculations; for many of the other scheduling options the iterations executed by one thread won't be a simple range anyway.
You are programming against the grain, not only of OpenMP, but probably of parallel programming in general. Programs which can be handed out to threads without consideration of the number available at run time or how the run-time system will divide up the work and which do not have dependencies between tasks are ideal for parallelisation. They generally provide good scalability to large numbers of threads without a great deal of programmer effort.
The closed-form solution you already have is all you need. Go with the flow. Programming against the grain will (inevitably, I would argue) produce more complicated code which is difficult to maintain and which will rarely produce parallel speedups large enough to compensate for its cost.
Answer 2:
There is probably no way to do that. Even if you could get the range for each thread, such as start, where would you expect to inject it in a single for loop like this?
#pragma omp parallel for schedule(static)
for(int i=0; i<n; i++) {
a[i] = ...
}
omp parallel for generally assumes there are no dependencies between the iterations. If you add a dependency such as cnt, you probably shouldn't use this directive.
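For what it's worth, here is one possible workaround, sketched under assumptions and not taken from either answer: split the directive into #pragma omp parallel plus #pragma omp for, keep the running sum in a thread-private variable, and re-seed it from the closed form whenever the current index does not directly follow the previous index that the same thread handled. The function name counting_sums and the variable prev are hypothetical.

```cpp
#include <cassert>
#include <vector>

// a[i] = 0 + 1 + ... + i, using a running sum inside a work-sharing loop.
std::vector<long long> counting_sums(int n) {
    assert(n >= 0);
    std::vector<long long> a(n);
    #pragma omp parallel
    {
        long long cnt = 0;   // thread-private running sum
        int prev = -2;       // last index this thread handled; -2 means "none yet"
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++) {
            if (i != prev + 1) {
                // First iteration of a new contiguous chunk:
                // seed with the closed form for 0 + 1 + ... + (i-1).
                cnt = static_cast<long long>(i) * (i - 1) / 2;
            }
            cnt += i;        // induction step
            a[i] = cnt;
            prev = i;
        }
    }
    return a;
}
```

The result does not depend on how the iterations are divided, because any jump in the index triggers a re-seed. The benefit does depend on it: with a schedule that hands out one iteration at a time, every iteration re-seeds and the loop degenerates to the closed-form version plus a branch. Whether this is any faster than the plain closed form would have to be measured.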
Source: https://stackoverflow.com/questions/19378312/induction-with-openmp-getting-range-values-for-a-parallized-for-loop-in-openmp