openmp

Parallel Merge-Sort in OpenMP

♀尐吖头ヾ submitted on 2019-12-05 07:11:50
I have seen an algorithm for parallel merge-sort in this paper. This is the code:

    void mergesort_parallel_omp(int a[], int size, int temp[], int threads)
    {
        if (threads == 1) {
            mergesort_serial(a, size, temp);
        } else if (threads > 1) {
            #pragma omp parallel sections
            {
                #pragma omp section
                mergesort_parallel_omp(a, size/2, temp, threads/2);
                #pragma omp section
                mergesort_parallel_omp(a + size/2, size - size/2,
                                       temp + size/2, threads - threads/2);
            }
            merge(a, size, temp);
        } // threads > 1
    }

I run it on a multicore machine. What happens is that at the leaves of the tree, two threads run in parallel. After
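For comparison, here is a minimal task-based sketch of the same recursion; it is my own illustration (the cutoff value is arbitrary and the helper signatures are assumed from the code above, not taken from the paper). Tasks let the runtime balance the recursive halves instead of committing two sections per level:

    // Hypothetical task-based variant; assumes mergesort_serial() and merge()
    // behave exactly as in the code above. The cutoff of 1024 is arbitrary.
    void mergesort_parallel_tasks(int a[], int size, int temp[])
    {
        if (size < 1024) {                 // small arrays: sort serially
            mergesort_serial(a, size, temp);
            return;
        }
        #pragma omp task
        mergesort_parallel_tasks(a, size / 2, temp);

        #pragma omp task
        mergesort_parallel_tasks(a + size / 2, size - size / 2, temp + size / 2);

        #pragma omp taskwait               // both halves must finish before merging
        merge(a, size, temp);
    }

    // Launched once from a parallel region, e.g.:
    // #pragma omp parallel
    // #pragma omp single
    // mergesort_parallel_tasks(a, n, temp);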

OpenMP and GSL RNG - Performance Issue - 4 threads implementation 10x slower than pure sequential one (quadcore CPU)

这一生的挚爱 submitted on 2019-12-05 06:51:29
I am trying to turn a C project of mine from sequential into parallel code. Although most of the code has now been redesigned from scratch for this purpose, the generation of random numbers is still at its core. Thus, bad performance of the random number generator (RNG) affects the overall performance of the program very badly. I wrote a few lines of code (see below) to show the problem I am facing without much verbosity. The problem is the following: every time the number of threads nt increases, the performance gets significantly worse. At this workstation (linux kernel 2.6.33.4; gcc 4.4
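The usual fix for this kind of slowdown is to give every thread its own generator so no RNG state is shared between threads. Below is a minimal sketch of that pattern using the GSL API (the seeds, the loop body, and the iteration count are placeholders of mine, not the poster's code):

    /* One independent GSL generator per thread; sharing a single gsl_rng
       across threads serializes them and bounces its state between caches.
       The per-thread seeds here are purely illustrative. */
    #include <gsl/gsl_rng.h>
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        double sum = 0.0;

        #pragma omp parallel reduction(+:sum)
        {
            gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);   /* private to this thread */
            gsl_rng_set(r, 1234u + (unsigned)omp_get_thread_num());

            #pragma omp for
            for (long i = 0; i < 100000000L; ++i)
                sum += gsl_rng_uniform(r);

            gsl_rng_free(r);
        }
        printf("%f\n", sum);
        return 0;
    }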

Installing gcc with OpenMP support on Mac using homebrew has no effect

删除回忆录丶 submitted on 2019-12-05 06:31:19
One way of installing gcc with OpenMP support on OS X is using Homebrew. However, when I follow the usual instruction of

    brew reinstall gcc --without-multilib

it gives me a warning that there is no formula corresponding to the --without-multilib option and hence this will have no effect. Consequently, I do not have OpenMP support after this reinstallation process. Here is the detailed terminal output.

    poulin8:02-prange-parallel-loops poulingroup$ brew --version
    Homebrew 1.3.6
    Homebrew/homebrew-core (git revision b5afc; last commit 2017-10-27)
    poulin8:02-prange-parallel-loops poulingroup$ brew
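Once a compiler is in place, a small OpenMP program makes it easy to check whether OpenMP support is actually available. This is my own test snippet, not part of the question; the compiler name (e.g. g++-7) depends on which gcc version Homebrew installed and is only an assumption:

    // test.cpp -- prints one line per thread when compiled with -fopenmp,
    // e.g.:  g++-7 -fopenmp test.cpp && ./a.out
    #include <cstdio>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main()
    {
    #ifdef _OPENMP
        #pragma omp parallel
        std::printf("hello from thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    #else
        std::printf("compiled without OpenMP support\n");
    #endif
        return 0;
    }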

Can mutex implementations be interchanged (independently of the thread implementation)?

旧时模样 submitted on 2019-12-05 06:18:43
Do all mutex implementations ultimately call the same basic system/hardware calls, meaning that they can be interchanged? Specifically, if I'm using __gnu_parallel algorithms (which use OpenMP) and I want to make the classes they call thread-safe, may I use boost::mutex for the locking? Or must I write my own mutex such as the one described here:

    // An OpenMP mutex. Can this be replaced with boost::mutex?
    class Mutex {
    public:
        Mutex() { omp_init_lock(&_mutex); }
        ~Mutex() { omp_destroy_lock(&_mutex); }
        void lock() { omp_set_lock(&_mutex); }
        void unlock() { omp_unset_lock(&_mutex); }
    private:
        omp
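For reference, a self-contained sketch of such a wrapper, completed by me around the OpenMP lock API (it is not the poster's exact class) and paired with a standard scoped guard:

    #include <omp.h>
    #include <mutex>   // for std::lock_guard

    // Mutex built on an OpenMP lock; it meets the BasicLockable requirements,
    // so generic code that only needs lock()/unlock() can use it.
    class OmpMutex {
    public:
        OmpMutex()  { omp_init_lock(&lock_); }
        ~OmpMutex() { omp_destroy_lock(&lock_); }
        OmpMutex(const OmpMutex&) = delete;
        OmpMutex& operator=(const OmpMutex&) = delete;
        void lock()   { omp_set_lock(&lock_); }
        void unlock() { omp_unset_lock(&lock_); }
    private:
        omp_lock_t lock_;
    };

    // Usage:
    // OmpMutex m;
    // { std::lock_guard<OmpMutex> guard(m);  /* critical section */ }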

Shared vectors in OpenMP

孤街醉人 submitted on 2019-12-05 05:57:56
I am trying to parallelize a program I am using and have the following question. Will I get a loss of performance if multiple threads need to read/write the same vector, but different elements of the vector? I have the feeling that's the reason my program hardly gets any faster upon parallelizing it. Take the following code:

    #include <vector>
    int main(){
        vector<double> numbers;
        vector<double> results(10);
        double x;
        //write 10 values in vector numbers
        for (int i = 0; i < 10; i++){
            numbers.push_back(cos(i));
        }
        #pragma omp parallel for \
            private(x) \
            shared(numbers, results)
        for(int j = 0; j < 10; j++)
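The excerpt cuts off before the loop body, so here is a self-contained sketch of what such a loop typically looks like (the body is my guess at the intent, not the poster's code). Writing distinct elements of a shared std::vector from different threads is safe in OpenMP; the remaining cost is false sharing when neighbouring elements sit on the same cache line:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main()
    {
        std::vector<double> numbers;
        std::vector<double> results(10);

        for (int i = 0; i < 10; ++i)
            numbers.push_back(std::cos(i));

        // Each iteration touches a distinct element of results: no data race,
        // but adjacent doubles can share a cache line (false sharing).
        #pragma omp parallel for
        for (int j = 0; j < 10; ++j) {
            double x = numbers[j] * numbers[j];   // placeholder per-element work
            results[j] = x;
        }

        for (double r : results) std::printf("%f\n", r);
        return 0;
    }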

The differences in the accuracy of the calculations in single / multi-threaded (OpenMP) modes

╄→尐↘猪︶ㄣ submitted on 2019-12-05 04:34:52
Can anybody explain the difference in the calculation results between single-threaded and multi-threaded mode? Here is an example of an approximate calculation of pi:

    #include <iomanip>
    #include <cmath>
    #include <ppl.h>

    const int itera(1000000000);

    int main() {
        printf("PI calculation \nconst int itera = 1000000000\n\n");
        clock_t start, stop;

        //Single thread
        start = clock();
        double summ_single(0);
        for (int n = 1; n < itera; n++) {
            summ_single += 6.0 / (static_cast<double>(n) * static_cast<double>(n));
        };
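The short answer hinges on floating-point addition not being associative: a parallel reduction adds the same terms in a different order, so the last bits of the result differ. As an illustration (my own sketch; the original uses Microsoft PPL, here replaced by an OpenMP reduction):

    #include <cmath>
    #include <cstdio>

    int main()
    {
        const long long itera = 1000000000LL;

        double summ_single = 0.0;
        for (long long n = 1; n < itera; ++n)
            summ_single += 6.0 / (double(n) * double(n));

        double summ_parallel = 0.0;
        #pragma omp parallel for reduction(+:summ_parallel)
        for (long long n = 1; n < itera; ++n)
            summ_parallel += 6.0 / (double(n) * double(n));

        // Both sums approximate pi^2, but the partial sums are combined in a
        // different order, so the low-order bits generally differ.
        std::printf("single:   %.15f\n", std::sqrt(summ_single));
        std::printf("parallel: %.15f\n", std::sqrt(summ_parallel));
        return 0;
    }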

OpenMP: how to flush pointer target?

倖福魔咒の submitted on 2019-12-05 02:57:17
I’ve just noticed that the following code doesn’t compile in OpenMP (under GCC 4.5.1):

    struct job { unsigned busy_children; };

    job* j = allocateJob(…);
    // …
    #pragma omp flush(j->busy_children)

The compiler complains about the -> in the argument list to flush, and according to the OpenMP specification it’s right: flush expects as arguments a list of “id-expression”s, which basically means only (qualified) IDs are allowed, no expressions. Furthermore, the spec says this about flush and pointers
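A common workaround, sketched here under the assumption that the goal is simply to make the pointee's latest value visible, is to use a flush with no list; it flushes the thread's entire visible state (including *j), at the cost of flushing more than strictly necessary:

    // Sketch only; 'job' is the struct from the question and allocateJob()
    // is assumed to exist as in the question.
    struct job { unsigned busy_children; };
    job* allocateJob();

    void wait_point(job* j)
    {
        // No list: flushes all thread-visible data, so j->busy_children is
        // re-read from memory rather than from a stale register/cache value.
        #pragma omp flush
        unsigned snapshot = j->busy_children;
        (void)snapshot;
    }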

Can std::atomic be safely used with OpenMP

独自空忆成欢 submitted on 2019-12-05 02:48:43
I'm currently trying to learn how to use OpenMP and I have a question. Is it safe to do something like this:

    std::atomic<double> result;
    #pragma omp parallel for
    for(...)
    {
        result += //some stuff;
    }

Or shall I use:

    double result;
    #pragma omp parallel for
    for(...)
    {
        double tmp = 0;
        //some stuff;
        #pragma omp atomic
        result += tmp;
    }

Thanks! Edit: I know the simplest way to handle this is using an array, but I'm asking because I'm curious. Officially, no. In practice, probably. Section 1.7, page 32, of the OpenMP 5.0 Specification says: While future versions of the OpenMP specification are
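For completeness, the idiomatic OpenMP way to accumulate into a shared scalar avoids both patterns: a reduction clause gives each thread a private copy and combines the copies once at the end. A minimal sketch (the loop body is a placeholder of mine):

    #include <cstdio>

    int main()
    {
        const int n = 1000000;
        double result = 0.0;

        // Each thread sums into its own private 'result'; OpenMP adds the
        // per-thread copies together after the loop.
        #pragma omp parallel for reduction(+:result)
        for (int i = 0; i < n; ++i) {
            double tmp = 1.0 / (i + 1.0);   // stands in for "some stuff"
            result += tmp;
        }

        std::printf("%f\n", result);
        return 0;
    }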

Why does the OpenMP SIMD directive reduce performance?

末鹿安然 submitted on 2019-12-05 02:41:02
I am learning how to use SIMD directives with OpenMP/Fortran. I wrote the simple code:

    program loop
    implicit none
    integer :: i,j
    real*8 :: x
    x = 0.0
    do i=1,10000
       do j=1,10000000
          x = x + 1.0/(1.0*i)
       enddo
    enddo
    print*, x
    end program loop

When I compile this code and run it I get:

    ifort -O3 -vec-report3 -xhost loop_simd.f90
    loop_simd.f90(10): (col. 12) remark: LOOP WAS VECTORIZED
    loop_simd.f90(9): (col. 7) remark: loop was not vectorized: not inner loop
    time ./a.out
    97876060.8355515
    real 0m8
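One detail worth noting before adding SIMD directives: the accumulation into x is a loop-carried dependence, so the inner loop only vectorizes well with a reduction. As an illustration, here is my own C++ rendering of the same loop (not the poster's Fortran) with the simd reduction clause:

    #include <cstdio>

    int main()
    {
        double x = 0.0;

        for (int i = 1; i <= 10000; ++i) {
            const double term = 1.0 / (1.0 * i);

            // The reduction clause lets the compiler keep vector-width partial
            // sums for x instead of serializing on the single accumulator.
            #pragma omp simd reduction(+:x)
            for (int j = 1; j <= 10000000; ++j)
                x += term;
        }

        std::printf("%.7f\n", x);
        return 0;
    }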

pragma omp parallel for vs. pragma omp parallel

放肆的年华 submitted on 2019-12-05 02:07:42
In C++ with OpenMP, is there any difference between

    #pragma omp parallel for
    for(int i=0; i<N; i++)
    {
        ...
    }

and

    #pragma omp parallel
    for(int i=0; i<N; i++)
    {
        ...
    }

? Thanks!

    #pragma omp parallel
    for(int i=0; i<N; i++)
    {
        ...
    }

This code creates a parallel region, and each individual thread executes what is in your loop. In other words, every thread executes the complete loop, instead of the threads splitting up the loop iterations so that together they are completed just once. You can do:

    #pragma omp parallel
    {
        #pragma omp for
        for( int i=0; i < N; ++i ) { }

        #pragma omp for
        for( int i=0; i < N; ++i ) { }
    }

This will
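A minimal sketch that makes the difference visible at run time (my own example, not from the question): with plain parallel, every thread executes all N iterations, so the counter ends up at N times the number of threads; with parallel for, the iterations are divided among the threads and the counter ends at N:

    #include <cstdio>

    int main()
    {
        const int N = 1000;
        long count_parallel = 0, count_parallel_for = 0;

        #pragma omp parallel              // every thread runs the whole loop
        for (int i = 0; i < N; ++i) {
            #pragma omp atomic
            ++count_parallel;
        }

        #pragma omp parallel for          // iterations are shared among threads
        for (int i = 0; i < N; ++i) {
            #pragma omp atomic
            ++count_parallel_for;
        }

        std::printf("parallel:     %ld (N * number of threads)\n", count_parallel);
        std::printf("parallel for: %ld (N)\n", count_parallel_for);
        return 0;
    }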