openmp

May compiler optimizations be inhibited by multi-threading?

佐手、 submitted on 2019-12-17 18:57:10
Question: It has happened to me a few times that I parallelize a portion of a program with OpenMP only to notice that, in the end, despite good scalability, most of the foreseen speed-up is lost due to the poor performance of the single-threaded case (compared to the serial version). The usual explanation that appears on the web for this behavior is that the code generated by compilers may be worse in the multi-threaded case. Anyhow, I am not able to find anywhere a reference that explains why the assembly
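
For illustration, a minimal sketch of the kind of measurement the question describes (not part of the original question; the loop body and size are placeholders): build the same loop once without -fopenmp and once with it, and run the OpenMP binary with OMP_NUM_THREADS=1 to see how much is lost before any thread is even added.

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const long n = 10000000;
        double *a = malloc(n * sizeof *a);

        double t0 = omp_get_wtime();
        /* Same loop body as the serial build; the only difference is the pragma. */
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            a[i] = 1.0 / (i + 1);

        printf("checksum %f, elapsed %f s\n", a[n - 1], omp_get_wtime() - t0);
        free(a);
        return 0;
    }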

How to get the type of a variable in C code?

只愿长相守 submitted on 2019-12-17 18:55:46
Question: Is there any way that I can discover the type of a variable automatically in C, either through some mechanism within the program itself, or--more likely--through a pre-compilation script that uses the compiler's passes up to the point where it has parsed the variables and assigned them their types? I'm looking for general suggestions about this. Below is more background about what I need and why. I would like to change the semantics of the OpenMP reduction clause. At this point, it seems
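
For what it is worth, C11's _Generic lets a macro dispatch on the static type of an expression inside the program itself, which is sometimes enough for this kind of type-directed code. A small sketch (not taken from the question):

    #include <stdio.h>

    /* Map an expression to a type-name string at compile time (C11). */
    #define TYPE_NAME(x) _Generic((x), \
        int:     "int",                \
        float:   "float",              \
        double:  "double",             \
        default: "other")

    int main(void) {
        double sum = 0.0;
        int count = 0;
        printf("%s %s\n", TYPE_NAME(sum), TYPE_NAME(count)); /* prints: double int */
        return 0;
    }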

gfortran openmp segmentation fault occurs on basic do loop

丶灬走出姿态 submitted on 2019-12-17 16:53:46
Question: I have a program which distributes particles into a cloud-in-cell mesh. It simply loops over the total number of particles (Ntot) and populates a 256^3 mesh (i.e. each particle gets distributed over 8 cells). % gfortran -fopenmp cic.f90 -o ./cic compiles fine, but when I run it (./cic) I get a segmentation fault. I think my looping is a classic omp do problem; the program works when I don't compile it with OpenMP. !$omp parallel do do i = 1,Ntot if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then

Openmp and reduction on std::vector?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-17 16:38:09
Question: I want to make this code parallel: std::vector<float> res(n,0); std::vector<float> vals(m); std::vector<float> indexes(m); // fill indexes with values in range [0,n) // fill vals and indexes for(size_t i=0; i<m; i++){ res[indexes[i]] += //something using vals[i]; } In this article it's suggested to use: #pragma omp parallel for reduction(+:myArray[:6]) In this question the same approach is proposed in the comments section. I have two questions: I don't know m at compile time, and from these
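
A sketch of the array-section reduction with plain C arrays, assuming an OpenMP 4.5 compiler (the std::vector case is analogous if you reduce over the pointer returned by .data()); the data below is made up and the length n does not need to be a compile-time constant:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t n = 6, m = 1000;
        float  *res     = calloc(n, sizeof *res);
        float  *vals    = malloc(m * sizeof *vals);
        size_t *indexes = malloc(m * sizeof *indexes);

        for (size_t i = 0; i < m; ++i) {   /* toy data */
            vals[i] = 1.0f;
            indexes[i] = i % n;
        }

        /* OpenMP 4.5 array-section reduction: each thread gets a private
           copy of res[0..n-1], which are combined with + at the end. */
        #pragma omp parallel for reduction(+:res[0:n])
        for (size_t i = 0; i < m; ++i)
            res[indexes[i]] += vals[i];

        for (size_t i = 0; i < n; ++i)
            printf("%f\n", res[i]);

        free(res); free(vals); free(indexes);
        return 0;
    }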

OpenMP, for loop inside section

你离开我真会死。 submitted on 2019-12-17 16:26:37
Question: I would like to run the following code (below). I want to spawn two independent threads, each of which would run a parallel for loop. Unfortunately, I get an error; apparently, a parallel for cannot be spawned inside a section. How do I solve that? #include <omp.h> #include "stdio.h" int main() { omp_set_num_threads(10); #pragma omp parallel #pragma omp sections { #pragma omp section #pragma omp for for(int i=0; i<5; i++) { printf("x %d\n", i); } #pragma omp section #pragma omp for for(int i=0; i<5; i+
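
One commonly suggested workaround (a sketch, not an answer quoted from this thread) is to give each section its own nested parallel region instead of a worksharing for bound to the outer region:

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        omp_set_nested(1);                      /* allow nested parallel regions */

        #pragma omp parallel sections num_threads(2)
        {
            #pragma omp section
            {
                #pragma omp parallel for num_threads(5)
                for (int i = 0; i < 5; i++)
                    printf("x %d\n", i);
            }
            #pragma omp section
            {
                #pragma omp parallel for num_threads(5)
                for (int i = 0; i < 5; i++)
                    printf("y %d\n", i);
            }
        }
        return 0;
    }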

C++: Timing in Linux (using clock()) is out of sync (due to OpenMP?)

a 夏天 submitted on 2019-12-17 14:56:40
Question: At the top and end of my program I use clock() to figure out how long my program takes to finish. Unfortunately, it appears to take half as long as is being reported. I double-checked this with the "time" command. My program reports: Completed in 45.86s The time command reports: real 0m22.837s user 0m45.735s sys 0m0.152s Using my cellphone to time it, it completed in 23s (aka: the "real" time). "User" time is the sum of all threads, which would make sense since I'm using OpenMP. (You can read about
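
This matches how clock() behaves on Linux: it measures CPU time summed over all threads, so it roughly doubles with two busy threads. For wall-clock time, omp_get_wtime() (or clock_gettime with CLOCK_MONOTONIC) is the usual replacement. A minimal sketch showing both measurements side by side (not from the question):

    #include <omp.h>
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        clock_t c0 = clock();              /* CPU time, summed over threads */
        double  w0 = omp_get_wtime();      /* wall-clock time */

        double s = 0.0;
        #pragma omp parallel for reduction(+:s)
        for (long i = 0; i < 200000000L; ++i)
            s += 1e-9 * i;

        printf("result    %g\n", s);
        printf("CPU time  %.2f s\n", (double)(clock() - c0) / CLOCKS_PER_SEC);
        printf("wall time %.2f s\n", omp_get_wtime() - w0);
        return 0;
    }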

OpenMP multiple threads update same array

心已入冬 submitted on 2019-12-17 14:54:35
Question: I have the following code in my program and I want to accelerate it using OpenMP. ... for(i=curr_index; i < curr_index + rx_size; i+=2){ int64_t tgt = rcvq[i]; int64_t src = rcvq[i+1]; if (!TEST(tgt)) { pred[tgt] = src; newq[newq_count++] = tgt; } } Currently, I have a version as follows: ... chunk = rx_sz / omp_nthreads; #pragma omp parallel for num_threads(omp_nthreads) for (ii = 0; ii < omp_nthreads; ii++) { int start = curr_index + ii * chunk; for (index = start; index < start + chunk;
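
The racy part is newq[newq_count++]. One common pattern (a sketch, not the asker's final solution; the names rcvq, pred, newq, TEST, curr_index and rx_size are reused from the excerpt and the data here is made up) is to let each thread append to a private buffer and merge the buffers into newq inside a critical section:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TEST(x) ((x) % 3 == 0)            /* stand-in for the asker's TEST() */

    int main(void) {
        int64_t rx_size = 1000, curr_index = 0, newq_count = 0;
        int64_t *rcvq = malloc(rx_size * sizeof *rcvq);
        int64_t *pred = malloc(rx_size * sizeof *pred);
        int64_t *newq = malloc(rx_size * sizeof *newq);
        for (int64_t i = 0; i < rx_size; ++i) rcvq[i] = i;   /* toy data */

        #pragma omp parallel
        {
            int64_t *local = malloc(rx_size * sizeof *local); /* private buffer */
            int64_t local_count = 0;

            #pragma omp for
            for (int64_t i = curr_index; i < curr_index + rx_size; i += 2) {
                int64_t tgt = rcvq[i];
                int64_t src = rcvq[i + 1];
                if (!TEST(tgt)) {
                    pred[tgt] = src;
                    local[local_count++] = tgt;   /* no race: buffer is private */
                }
            }

            #pragma omp critical                  /* serialize only the merge */
            {
                for (int64_t j = 0; j < local_count; ++j)
                    newq[newq_count++] = local[j];
            }
            free(local);
        }

        printf("newq_count = %lld\n", (long long)newq_count);
        free(rcvq); free(pred); free(newq);
        return 0;
    }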

Difference between num_threads vs. omp_set_num_threads vs OMP_NUM_THREADS

自闭症网瘾萝莉.ら submitted on 2019-12-17 12:17:09
Question: I am quite confused about the ways to specify the number of threads in the parallel part of a code. I know I can use: the environment variable OMP_NUM_THREADS; the function omp_set_num_threads(int); num_threads(int) in #pragma omp parallel for num_threads(NB_OF_THREADS). What I have gathered so far is that the first two are equivalent. But what about the third one? Can someone provide a more detailed exposition of the difference? I could not find any information on the internet regarding the difference between
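
The three settings differ in scope and precedence: OMP_NUM_THREADS sets the default for the whole run, omp_set_num_threads() overrides that default for all subsequent parallel regions, and the num_threads clause overrides both for one specific region only. A small sketch illustrating the precedence (not from the question):

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        /* Default comes from the environment, e.g. OMP_NUM_THREADS=8. */
        #pragma omp parallel
        #pragma omp single
        printf("region 1: %d threads\n", omp_get_num_threads());

        omp_set_num_threads(4);              /* overrides the env var from now on */
        #pragma omp parallel
        #pragma omp single
        printf("region 2: %d threads\n", omp_get_num_threads());

        #pragma omp parallel num_threads(2)  /* overrides both, this region only */
        #pragma omp single
        printf("region 3: %d threads\n", omp_get_num_threads());

        return 0;
    }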

omp parallel vs. omp parallel for

允我心安 submitted on 2019-12-17 10:12:20
Question: What is the difference between these two? [A] #pragma omp parallel { #pragma omp for for(int i = 1; i < 100; ++i) { ... } } [B] #pragma omp parallel for for(int i = 1; i < 100; ++i) { ... } Answer 1: I don't think there is any difference; one is a shortcut for the other, although your exact implementation might deal with them differently. The combined parallel worksharing constructs are a shortcut for specifying a parallel construct containing one worksharing construct and no other statements.
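
The separate form mainly matters when one parallel region should contain more than one worksharing loop (or other code), so the team of threads is created only once. A sketch illustrating this (not taken from the answer):

    #include <stdio.h>

    #define N 100

    int main(void) {
        double a[N], b[N];

        #pragma omp parallel        /* one team of threads reused for both loops */
        {
            #pragma omp for
            for (int i = 0; i < N; ++i)
                a[i] = i;

            #pragma omp for         /* implicit barrier after the first loop */
            for (int i = 0; i < N; ++i)
                b[i] = 2.0 * a[i];
        }

        printf("%f %f\n", a[N - 1], b[N - 1]);
        return 0;
    }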