openmp

How can I set the number of OpenMP threads from within the program?

断了今生、忘了曾经 submitted on 2019-12-07 06:28:53
Question: Running the program as $ OMP_NUM_THREADS=4 ./a.out limits the number of active OpenMP threads to 4, as evidenced by htop. However, if instead of binding the OMP_NUM_THREADS environment variable in Bash, I call setenv("OMP_NUM_THREADS", "4", 1); from main before calling any OpenMP-enabled functions, this seems to have no effect. Why is this happening? How can I set the number of OpenMP threads from within the program, if it's possible at all?

Answer 1: There are two ways one can use to set the…
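The answer above is cut off, but the two ways it introduces are almost certainly the standard ones: the omp_set_num_threads() runtime call and the num_threads() clause. Both work regardless of the environment, because most runtimes read OMP_NUM_THREADS only once at startup, which is why the setenv call from main comes too late. A minimal sketch:

    #include <omp.h>
    #include <cstdio>

    int main() {
        omp_set_num_threads(4);              // way 1: runtime call, affects subsequent regions

        #pragma omp parallel
        {
            #pragma omp single
            std::printf("team size: %d\n", omp_get_num_threads());
        }

        #pragma omp parallel num_threads(2)  // way 2: per-region clause, overrides the call
        {
            #pragma omp single
            std::printf("team size: %d\n", omp_get_num_threads());
        }
        return 0;
    }

Compiled with g++ -fopenmp, this prints 4 for the first region and 2 for the second.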

incomprehensible performance improvement with openmp even when num_threads(1)

不想你离开。 submitted on 2019-12-07 06:22:41
Question: The following lines of code

    int nrows = 4096;
    int ncols = 4096;
    size_t numel = nrows * ncols;
    unsigned char *buff = (unsigned char *) malloc(numel);
    unsigned char *pbuff = buff;

    #pragma omp parallel for schedule(static), firstprivate(pbuff, nrows, ncols), num_threads(1)
    for (int i = 0; i < nrows; i++) {
        for (int j = 0; j < ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }

take 11130 usecs to run on my i5-3230M when compiled with

    g++ -o main main.cpp -std=c++0x -O3

That is, when the OpenMP pragmas are ignored…
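No answer survives the truncation, but the first step in chasing an effect like this is a harness that wall-clock-times the identical loop in both builds. A sketch under that assumption (the increment_all wrapper is my own factoring of the question's loop):

    #include <omp.h>
    #include <cstdio>
    #include <cstdlib>

    // The question's loop, factored out so both builds time the same code.
    static void increment_all(unsigned char *buff, int nrows, int ncols) {
        unsigned char *pbuff = buff;
        #pragma omp parallel for schedule(static) firstprivate(pbuff, nrows, ncols) num_threads(1)
        for (int i = 0; i < nrows; i++)
            for (int j = 0; j < ncols; j++)
                *pbuff++ += 1;
    }

    int main() {
        const int nrows = 4096, ncols = 4096;
        unsigned char *buff = (unsigned char *) std::calloc((size_t) nrows * ncols, 1);

        double t0 = omp_get_wtime();   // wall clock, immune to the clock() trap
        increment_all(buff, nrows, ncols);
        double t1 = omp_get_wtime();
        std::printf("%.0f usecs (checksum %d)\n", (t1 - t0) * 1e6, buff[0]);

        std::free(buff);
        return 0;
    }

Comparing a build with -fopenmp against one without (substituting any wall-clock source for omp_get_wtime in the latter) shows whether the pragma itself changes the generated code, for example by altering the compiler's aliasing assumptions when the loop body is outlined.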

OpenMP: don't use hyperthreading cores (half `num_threads()` w/ hyperthreading)

[亡魂溺海] submitted on 2019-12-07 06:15:40
Question: In "Is OpenMP (parallel for) in g++ 4.7 not very efficient? 2.5x at 5x CPU", I determined that the performance of my programme varies between 11s and 13s (mostly always above 12s, and sometimes as slow as 13.4s) at around 500% CPU when using the default #pragma omp parallel for, and that the OpenMP speed-up is only 2.5x at 5x CPU with g++-4.7 -O3 -fopenmp, on a 4-core, 8-thread Xeon. I tried using schedule(static) num_threads(4) and noticed that my programme always completes in 11.5s to 11.7s…
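For reference, the usual way to keep the team off the hyperthread siblings, beyond hard-coding num_threads(4), is OpenMP 4.0 affinity. A sketch, assuming a 4-core/8-thread machine and a runtime that supports it:

    // Run as:  OMP_PLACES=cores OMP_PROC_BIND=spread OMP_NUM_THREADS=4 ./a.out
    // or request the binding in source, as below.
    #include <omp.h>
    #include <cstdio>

    int main() {
        omp_set_num_threads(4);             // one thread per physical core
        double sum = 0.0;
        #pragma omp parallel for schedule(static) proc_bind(spread) reduction(+:sum)
        for (int i = 0; i < 100000000; i++)
            sum += i * 1e-9;
        std::printf("%f\n", sum);
        return 0;
    }

With OMP_PLACES=cores, each place is one physical core, so the four bound threads can never land on two hyperthreads of the same core.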

OpenMP and GSL RNG - Performance Issue - 4 threads implementation 10x slower than pure sequential one (quadcore CPU)

*爱你&永不变心* submitted on 2019-12-07 05:04:20
Question: I am trying to turn a C project of mine from sequential into parallel programming. Although most of the code has now been redesigned from scratch for this purpose, the generation of random numbers is still at its core. Thus, bad performance of the random number generator (RNG) affects the overall performance of the program very badly. I wrote some lines of code (see below) to show the problem I am facing without much verbosity. The problem is the following: every time the number of threads nt…
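The code is cut off, but the pattern that usually cures this particular slowdown is one generator per thread, so no gsl_rng state is ever shared or contended. A sketch (the seeding scheme and iteration counts are mine):

    #include <gsl/gsl_rng.h>
    #include <omp.h>
    #include <cstdio>

    int main() {
        int nt = omp_get_max_threads();
        gsl_rng **rngs = new gsl_rng *[nt];
        for (int t = 0; t < nt; t++) {
            rngs[t] = gsl_rng_alloc(gsl_rng_mt19937);
            gsl_rng_set(rngs[t], 1234 + t);          // distinct seed per thread
        }

        double sum = 0.0;
        #pragma omp parallel reduction(+:sum)
        {
            gsl_rng *r = rngs[omp_get_thread_num()]; // this thread's private generator
            #pragma omp for
            for (int i = 0; i < 10000000; i++)
                sum += gsl_rng_uniform(r);
        }
        std::printf("%f\n", sum);

        for (int t = 0; t < nt; t++) gsl_rng_free(rngs[t]);
        delete[] rngs;
        return 0;
    }

Each generator's state is separately heap-allocated, which also makes false sharing between the per-thread states unlikely. Build with g++ -fopenmp ... -lgsl -lgslcblas.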

Can I assign multiple threads to a code section in OpenMP?

你离开我真会死。 submitted on 2019-12-07 04:24:54
Question: I'm looking for a way to execute sections of code in parallel, using multiple threads for each section. For example, if I have 16 threads and two tasks, I want 8 threads each to simultaneously execute those two tasks. OpenMP has several constructs (section, task) that execute general code in parallel, but they are single-threaded: in my scenario, using section or task would result in one thread executing each of the two tasks while 14 threads sit idly by. Is something like that even…
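Whether or not this made it into an answer here, nested parallel regions are one common way to get the 2×8 split described above. A sketch, assuming the runtime permits nesting:

    #include <omp.h>
    #include <cstdio>

    int main() {
        omp_set_max_active_levels(2);        // allow an inner team per outer thread
        #pragma omp parallel num_threads(2)  // one outer thread per task
        {
            int task = omp_get_thread_num(); // 0 -> first task, 1 -> second task
            #pragma omp parallel num_threads(8)
            {
                // eight threads cooperate on this task's work here
                std::printf("task %d, worker %d of %d\n",
                            task, omp_get_thread_num(), omp_get_num_threads());
            }
        }
        return 0;
    }

This spawns 16 threads in total; the work inside each inner region can use ordinary worksharing (omp for, sections) scoped to that task's 8-thread team.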

Shared vectors in OpenMP

99封情书 submitted on 2019-12-07 02:51:55
Question: I am trying to parallelize a program I am using, and I ran into the following question: will I get a loss of performance if multiple threads need to read/write on the same vector, but different elements of the vector? I have the feeling that's the reason my program hardly gets any faster upon parallelizing it. Take the following code:

    #include <vector>

    int main() {
        vector<double> numbers;
        vector<double> results(10);
        double x;
        // write 10 values in vector numbers
        for (int i = 0; i < 10; i++) {
            numbers.push_back…
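Reading different elements from several threads is harmless; the performance hazard hinted at here is usually false sharing, when threads write neighbouring elements that live on the same cache line. A sketch of the distinction (the squaring body is a placeholder, since the original computation is cut off):

    #include <vector>
    #include <cstdio>

    int main() {
        const int n = 10;
        std::vector<double> numbers, results(n);
        for (int i = 0; i < n; i++) numbers.push_back(i);  // as in the question

        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) {
            double local = numbers[i] * numbers[i]; // reads of shared data: always fine
            results[i] = local;                     // one write per element; adjacent
                                                    // writes can still share a cache line
        }
        for (int i = 0; i < n; i++) std::printf("%f\n", results[i]);
        return 0;
    }

With only 10 elements the parallel overhead dwarfs the work in any case, which is a second plausible reason the program "hardly gets any faster".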

How can Microsoft's OpenMP spinlock time be controlled?

ぐ巨炮叔叔 submitted on 2019-12-07 02:28:50
Question: The OpenMP runtime used by the Intel compiler supports an environment variable KMP_BLOCKTIME (docs), which I believe controls the busy-waiting (spinlock) time the threads will spend waiting for new work (the linked document claims this defaults to 200 ms). The OpenMP runtime used by the GNU compiler supports an environment variable GOMP_SPINCOUNT (docs), which I believe also controls that library's equivalent implementation detail (although apparently expressed as an iteration count rather than a time). My…
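The question is cut off before it reaches the Microsoft runtime, and I know of no documented MSVC counterpart to these knobs, so the following only collects the two vendor variables named above plus the portable hint (a sketch; MSVC's OpenMP 2.0 runtime may ignore all of them):

    :: Intel runtime: spin time in ms before sleeping (0 = yield immediately)
    set KMP_BLOCKTIME=0
    :: GNU runtime: the same idea, expressed as a spin-iteration count
    set GOMP_SPINCOUNT=0
    :: Portable hint from OpenMP 3.0 onwards; not part of MSVC's OpenMP 2.0
    set OMP_WAIT_POLICY=PASSIVE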

Why is my computer not showing a speedup when I use parallel code?

陌路散爱 submitted on 2019-12-07 02:15:22
Question: So I realize this question sounds stupid (and yes, I am using a dual core), but I have tried two different libraries (Grand Central Dispatch and OpenMP), and when using clock() to time the code with and without the lines that make it parallel, the speed is the same. (For the record, they were both using their own form of parallel for.) They report being run on different threads, but perhaps they are running on the same core? Is there any way to check? (Both libraries are for C; I'm…
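Though the question breaks off, one explanation fits these symptoms exactly and is worth ruling out first: on POSIX systems clock() returns CPU time summed across all threads, so perfectly parallel code reports the same clock() figure as the serial version. A sketch contrasting it with wall-clock time:

    #include <omp.h>
    #include <ctime>
    #include <cstdio>

    int main() {
        clock_t c0 = std::clock();     // CPU time, summed over every thread
        double  w0 = omp_get_wtime();  // wall-clock time

        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < 200000000L; i++)
            sum += i * 1e-9;

        std::printf("cpu: %.2fs  wall: %.2fs  (sum=%f)\n",
                    (double)(std::clock() - c0) / CLOCKS_PER_SEC,
                    omp_get_wtime() - w0, sum);
        return 0;
    }

On a dual core that actually parallelises, the wall time comes out at roughly half the CPU time; if the two match, the loop really is running serially.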

Parallel Merge-Sort in OpenMP

落花浮王杯 submitted on 2019-12-07 02:10:36
Question: I have seen an algorithm for parallel merge sort in this paper. This is the code:

    void mergesort_parallel_omp(int a[], int size, int temp[], int threads)
    {
        if (threads == 1) {
            mergesort_serial(a, size, temp);
        } else if (threads > 1) {
            #pragma omp parallel sections
            {
                #pragma omp section
                mergesort_parallel_omp(a, size/2, temp, threads/2);
                #pragma omp section
                mergesort_parallel_omp(a + size/2, size - size/2,
                                       temp + size/2, threads - threads/2);
            }
            merge(a, size, temp);
        } // threads > 1
    }

I…
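For context, a driver for the paper's routine typically looks like the sketch below (my scaffolding; it assumes the routine above, together with the paper's mergesort_serial and merge, is linked in). The recursive sections only fan out if the runtime allows parallel regions to nest:

    #include <omp.h>
    #include <cstdlib>

    void mergesort_parallel_omp(int a[], int size, int temp[], int threads); // from the question

    int main() {
        const int n = 1 << 20;
        int *a    = (int *) std::malloc(n * sizeof(int));
        int *temp = (int *) std::malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) a[i] = std::rand();

        omp_set_max_active_levels(8);  // let the recursive teams nest
        mergesort_parallel_omp(a, n, temp, omp_get_max_threads());

        std::free(a);
        std::free(temp);
        return 0;
    }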

How to parallelize correctly a nested for loops

喜欢而已 submitted on 2019-12-07 00:02:32
Question: I'm working with OpenMP to parallelize a scalar nested for loop:

    double P[N][N];
    double x = 0.0, y = 0.0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            P[i][j] = someLongFunction(x, y);
            y += 1;
        }
        x += 1;
    }

The important thing in this loop is that the matrix P must be the same in both the scalar and the parallel versions. All my possible attempts didn't succeed…

Answer 1: The problem here is that you have added iteration-to-iteration dependencies with x += 1; and y += 1;. Therefore, as the code stands right now, it is not…
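The answer breaks off, but the reasoning it starts completes naturally: x and y are pure functions of the loop indices (x == i, and y == i*N + j, because y is never reset between rows), so replacing the running counters with those expressions makes every iteration independent. A sketch with a stand-in body for someLongFunction:

    #include <cstdio>

    const int N = 64;
    double P[N][N];

    double someLongFunction(double x, double y) { return x + y; } // placeholder body

    int main() {
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                P[i][j] = someLongFunction((double) i, (double) i * N + j);

        std::printf("%f\n", P[N-1][N-1]);
        return 0;
    }

Because each P[i][j] now depends only on i and j, the parallel result matches the scalar one exactly, and collapse(2) lets OpenMP distribute both loop levels.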