I've gotten stuck writing some parallel C code using OpenMP for a concurrency course.
Here's a snippet:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <omp.h>

#define FALSE 0
#define TRUE 1

int count_primes_0(int);
int count_primes_1(int);
int count_primes_2(int);
int is_prime(int);
void time_it(int (*)(int), int, char *);

int main(int argc, char *argv[]){
    int n;

    if (argc != 2){
        printf("Incorrect Invocation, use: \nq1 N");
        return 0;
    } else {
        n = atoi(argv[1]);
    }
    if (n < 0){
        printf("N cannot be negative");
        return 0;
    }

    printf("N = %d\n", n);

    //omp_set_num_threads(1);
    time_it(count_primes_0, n, "Method 0");
    time_it(count_primes_1, n, "Method 1");
    time_it(count_primes_2, n, "Method 2");

    return 0;
}

int is_prime(int n){
    for(int i = 2; i <= (int)(sqrt((double) n)); i++){
        if ((n % i) == 0){
            return FALSE;
        }
    }
    return n > 1;
}

void time_it(int (*f)(int), int n, char *string){
    clock_t start_clock;
    clock_t end_clock;
    double calc_time;
    int nprimes;

    start_clock = clock();
    nprimes = (*f)(n);
    end_clock = clock();

    calc_time = ((double)end_clock - (double)start_clock) / CLOCKS_PER_SEC;
    printf("%s\n\tNumber of primes: %d \t Time taken: %fs\n\n",
           string, nprimes, calc_time);
}

// METHOD 0
// Base case: no parallelization
int count_primes_0(int n){
    int nprimes = 0;

    for(int i = 1; i <= n; i++){
        if (is_prime(i)) {
            nprimes++;
        }
    }
    return nprimes;
}

// METHOD 1
// Use only for and critical constructs
int count_primes_1(int n){
    int nprimes = 0;

    #pragma omp parallel for
    for(int i = 1; i <= n; i++){
        if (is_prime(i)) {
            #pragma omp critical
            nprimes++;
        }
    }
    return nprimes;
}

// METHOD 2
// Use reduction
int count_primes_2(int n){
    int nprimes = 0;

    #pragma omp parallel for reduction(+:nprimes)
    for(int i = 1; i <= n; i++){
        if (is_prime(i)) {
            nprimes++;
        }
    }
    return nprimes;
}
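For reference, I'm building it with something like gcc -fopenmp q1.c -o q1 -lm, assuming the source file is named q1.c (the usage message calls the binary q1). The -fopenmp flag is needed, otherwise the pragmas are silently ignored, and -lm links the math library for sqrt().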
The problem I'm facing is that when I use omp_set_num_threads(), the fewer threads I use, the faster my functions run -- or the closer they get to the runtime of the unparallelized base case.
Time results (run on an 8-core machine):
8 threads: Method 0: 0.07s; Method 1: 1.63s; Method 2: 1.4s
4 threads: Method 0: 0.07s; Method 1: 0.16s; Method 2: 0.16s
2 threads: Method 0: 0.07s; Method 1: 0.10s; Method 2: 0.09s
1 thread: Method 0: 0.07s; Method 1: 0.08s; Method 2: 0.07s
I've tried disabling optimization and using a different gcc version, but it makes no difference.
Any help is appreciated.
EDIT: clock() on Linux returns the 'incorrect' time here because it measures CPU time summed across all threads of the process, not elapsed time. Wall-clock time is what I needed, so using either omp_get_wtime() or a Linux wall-clock timer (e.g. gettimeofday()) produces the proper results.
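For anyone who runs into the same thing, here's a minimal sketch of the timing wrapper rewritten around omp_get_wtime(). The name time_it_wall is just mine; it assumes the same signature and includes as the snippet above (omp.h is already included there) and only changes how elapsed time is measured:

// Same interface as time_it() above, but measures wall-clock time
// instead of per-process CPU time.
void time_it_wall(int (*f)(int), int n, char *string){
    double start = omp_get_wtime();   // wall-clock seconds
    int nprimes = (*f)(n);
    double end = omp_get_wtime();

    printf("%s\n\tNumber of primes: %d \t Time taken: %fs\n\n",
           string, nprimes, end - start);
}

omp_get_wtime() is the natural choice here since OpenMP is already linked in, whereas clock() grows with the thread count because every worker thread's CPU time gets added to the same total.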