openmp

May compiler optimizations be inhibited by multi-threading?

佐手、 submitted on 2019-12-17 18:57:10
Question: It has happened to me a few times that I parallelize a portion of a program with OpenMP only to notice that, in the end, despite good scalability, most of the foreseen speed-up is lost due to the poor performance of the single-threaded case (compared to the serial version). The usual explanation that appears on the web for this behavior is that the code generated by compilers may be worse in the multi-threaded case. Anyhow, I am not able to find anywhere a reference that explains why the assembly
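
For illustration, a minimal sketch of the kind of measurement the question describes (not part of the original question; the loop body and size are placeholders): build the same loop once without -fopenmp and once with it, and run the OpenMP binary with OMP_NUM_THREADS=1 to see how much is lost before any thread is even added.

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const long n = 10000000;
        double *a = malloc(n * sizeof *a);

        double t0 = omp_get_wtime();
        /* Same loop body as the serial build; the only difference is the pragma. */
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            a[i] = 1.0 / (i + 1);

        printf("checksum %f, elapsed %f s\n", a[n - 1], omp_get_wtime() - t0);
        free(a);
        return 0;
    }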

How to get the type of a variable in C code?

只愿长相守 submitted on 2019-12-17 18:55:46
Question: Is there any way that I can discover the type of a variable automatically in C, either through some mechanism within the program itself, or--more likely--through a pre-compilation script that uses the compiler's passes up to the point where it has parsed the variables and assigned them their types? I'm looking for general suggestions about this. Below is more background about what I need and why. I would like to change the semantics of the OpenMP reduction clause. At this point, it seems
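
For what it is worth, C11's _Generic lets a macro dispatch on the static type of an expression inside the program itself, which is sometimes enough for this kind of type-directed code. A small sketch (not taken from the question):

    #include <stdio.h>

    /* Map an expression to a type-name string at compile time (C11). */
    #define TYPE_NAME(x) _Generic((x), \
        int:     "int",                \
        float:   "float",              \
        double:  "double",             \
        default: "other")

    int main(void) {
        double sum = 0.0;
        int count = 0;
        printf("%s %s\n", TYPE_NAME(sum), TYPE_NAME(count)); /* prints: double int */
        return 0;
    }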

gfortran openmp segmentation fault occurs on basic do loop

丶灬走出姿态 submitted on 2019-12-17 16:53:46
Question: I have a program which distributes particles into a cloud-in-cell mesh. It simply loops over the total number of particles (Ntot) and populates a 256^3 mesh (i.e. each particle gets distributed over 8 cells). % gfortran -fopenmp cic.f90 -o ./cic compiles fine, but when I run it (./cic) I get a segmentation fault. I think my looping is a classic omp do problem; the program works when I don't compile it with OpenMP. !$omp parallel do do i = 1,Ntot if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then

Openmp and reduction on std::vector?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-17 16:38:09
Question: I want to make this code parallel: std::vector<float> res(n,0); std::vector<float> vals(m); std::vector<float> indexes(m); // fill indexes with values in range [0,n) // fill vals and indexes for(size_t i=0; i<m; i++){ res[indexes[i]] += //something using vals[i]; } In this article it's suggested to use: #pragma omp parallel for reduction(+:myArray[:6]) In this question the same approach is proposed in the comments section. I have two questions: I don't know m at compile time, and from these
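
A sketch of the array-section reduction with plain C arrays, assuming an OpenMP 4.5 compiler (the std::vector case is analogous if you reduce over the pointer returned by .data()); the data below is made up and the length n does not need to be a compile-time constant:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t n = 6, m = 1000;
        float  *res     = calloc(n, sizeof *res);
        float  *vals    = malloc(m * sizeof *vals);
        size_t *indexes = malloc(m * sizeof *indexes);

        for (size_t i = 0; i < m; ++i) {   /* toy data */
            vals[i] = 1.0f;
            indexes[i] = i % n;
        }

        /* OpenMP 4.5 array-section reduction: each thread gets a private
           copy of res[0..n-1], which are combined with + at the end. */
        #pragma omp parallel for reduction(+:res[0:n])
        for (size_t i = 0; i < m; ++i)
            res[indexes[i]] += vals[i];

        for (size_t i = 0; i < n; ++i)
            printf("%f\n", res[i]);

        free(res); free(vals); free(indexes);
        return 0;
    }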

OpenMP, for loop inside section

你离开我真会死。 submitted on 2019-12-17 16:26:37
Question: I would like to run the following code (below). I want to spawn two independent threads, each of which would run a parallel for loop. Unfortunately, I get an error; apparently, a parallel for cannot be spawned inside a section. How do I solve that? #include <omp.h> #include "stdio.h" int main() { omp_set_num_threads(10); #pragma omp parallel #pragma omp sections { #pragma omp section #pragma omp for for(int i=0; i<5; i++) { printf("x %d\n", i); } #pragma omp section #pragma omp for for(int i=0; i<5; i+
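
One commonly suggested workaround (a sketch, not an answer quoted from this thread) is to give each section its own nested parallel region instead of a worksharing for bound to the outer region:

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        omp_set_nested(1);                      /* allow nested parallel regions */

        #pragma omp parallel sections num_threads(2)
        {
            #pragma omp section
            {
                #pragma omp parallel for num_threads(5)
                for (int i = 0; i < 5; i++)
                    printf("x %d\n", i);
            }
            #pragma omp section
            {
                #pragma omp parallel for num_threads(5)
                for (int i = 0; i < 5; i++)
                    printf("y %d\n", i);
            }
        }
        return 0;
    }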

C++: Timing in Linux (using clock()) is out of sync (due to OpenMP?)

a 夏天 submitted on 2019-12-17 14:56:40
Question: At the top and end of my program I use clock() to figure out how long my program takes to finish. Unfortunately, it appears to take half as long as is being reported. I double-checked this with the "time" command. My program reports: Completed in 45.86s The time command reports: real 0m22.837s user 0m45.735s sys 0m0.152s Using my cellphone to time it, it completed in 23s (aka: the "real" time). "User" time is the sum of all threads, which would make sense since I'm using OpenMP. (You can read about
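
This matches how clock() behaves on Linux: it measures CPU time summed over all threads, so it roughly doubles with two busy threads. For wall-clock time, omp_get_wtime() (or clock_gettime with CLOCK_MONOTONIC) is the usual replacement. A minimal sketch showing both measurements side by side (not from the question):

    #include <omp.h>
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        clock_t c0 = clock();              /* CPU time, summed over threads */
        double  w0 = omp_get_wtime();      /* wall-clock time */

        double s = 0.0;
        #pragma omp parallel for reduction(+:s)
        for (long i = 0; i < 200000000L; ++i)
            s += 1e-9 * i;

        printf("result    %g\n", s);
        printf("CPU time  %.2f s\n", (double)(clock() - c0) / CLOCKS_PER_SEC);
        printf("wall time %.2f s\n", omp_get_wtime() - w0);
        return 0;
    }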

OpenMP multiple threads update same array

心已入冬 submitted on 2019-12-17 14:54:35
Question: I have the following code in my program and I want to accelerate it using OpenMP. ... for(i=curr_index; i < curr_index + rx_size; i+=2){ int64_t tgt = rcvq[i]; int64_t src = rcvq[i+1]; if (!TEST(tgt)) { pred[tgt] = src; newq[newq_count++] = tgt; } } Currently, I have a version as follows: ... chunk = rx_sz / omp_nthreads; #pragma omp parallel for num_threads(omp_nthreads) for (ii = 0; ii < omp_nthreads; ii++) { int start = curr_index + ii * chunk; for (index = start; index < start + chunk;
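
The racy part is newq[newq_count++]. One common pattern (a sketch, not the asker's final solution; the names rcvq, pred, newq, TEST, curr_index and rx_size are reused from the excerpt and the data here is made up) is to let each thread append to a private buffer and merge the buffers into newq inside a critical section:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TEST(x) ((x) % 3 == 0)            /* stand-in for the asker's TEST() */

    int main(void) {
        int64_t rx_size = 1000, curr_index = 0, newq_count = 0;
        int64_t *rcvq = malloc(rx_size * sizeof *rcvq);
        int64_t *pred = malloc(rx_size * sizeof *pred);
        int64_t *newq = malloc(rx_size * sizeof *newq);
        for (int64_t i = 0; i < rx_size; ++i) rcvq[i] = i;   /* toy data */

        #pragma omp parallel
        {
            int64_t *local = malloc(rx_size * sizeof *local); /* private buffer */
            int64_t local_count = 0;

            #pragma omp for
            for (int64_t i = curr_index; i < curr_index + rx_size; i += 2) {
                int64_t tgt = rcvq[i];
                int64_t src = rcvq[i + 1];
                if (!TEST(tgt)) {
                    pred[tgt] = src;
                    local[local_count++] = tgt;   /* no race: buffer is private */
                }
            }

            #pragma omp critical                  /* serialize only the merge */
            {
                for (int64_t j = 0; j < local_count; ++j)
                    newq[newq_count++] = local[j];
            }
            free(local);
        }

        printf("newq_count = %lld\n", (long long)newq_count);
        free(rcvq); free(pred); free(newq);
        return 0;
    }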

Difference between num_threads vs. omp_set_num_threads vs OMP_NUM_THREADS

自闭症网瘾萝莉.ら submitted on 2019-12-17 12:17:09
Question: I am quite confused about the ways to specify the number of threads in the parallel part of a code. I know I can use: the environment variable OMP_NUM_THREADS; the function omp_set_num_threads(int); num_threads(int) in #pragma omp parallel for num_threads(NB_OF_THREADS). What I have gathered so far is that the first two are equivalent. But what about the third one? Can someone provide a more detailed exposition of the difference? I could not find any information on the internet regarding the difference between
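
The three settings differ in scope and precedence: OMP_NUM_THREADS sets the default for the whole run, omp_set_num_threads() overrides that default for all subsequent parallel regions, and the num_threads clause overrides both for one specific region only. A small sketch illustrating the precedence (not from the question):

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        /* Default comes from the environment, e.g. OMP_NUM_THREADS=8. */
        #pragma omp parallel
        #pragma omp single
        printf("region 1: %d threads\n", omp_get_num_threads());

        omp_set_num_threads(4);              /* overrides the env var from now on */
        #pragma omp parallel
        #pragma omp single
        printf("region 2: %d threads\n", omp_get_num_threads());

        #pragma omp parallel num_threads(2)  /* overrides both, this region only */
        #pragma omp single
        printf("region 3: %d threads\n", omp_get_num_threads());

        return 0;
    }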

omp parallel vs. omp parallel for

允我心安 submitted on 2019-12-17 10:12:20
Question: What is the difference between these two? [A] #pragma omp parallel { #pragma omp for for(int i = 1; i < 100; ++i) { ... } } [B] #pragma omp parallel for for(int i = 1; i < 100; ++i) { ... } Answer 1: I don't think there is any difference; one is a shortcut for the other, although your exact implementation might deal with them differently. The combined parallel worksharing constructs are a shortcut for specifying a parallel construct containing one worksharing construct and no other statements.
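
The separate form mainly matters when one parallel region should contain more than one worksharing loop (or other code), so the team of threads is created only once. A sketch illustrating this (not taken from the answer):

    #include <stdio.h>

    #define N 100

    int main(void) {
        double a[N], b[N];

        #pragma omp parallel        /* one team of threads reused for both loops */
        {
            #pragma omp for
            for (int i = 0; i < N; ++i)
                a[i] = i;

            #pragma omp for         /* implicit barrier after the first loop */
            for (int i = 0; i < N; ++i)
                b[i] = 2.0 * a[i];
        }

        printf("%f %f\n", a[N - 1], b[N - 1]);
        return 0;
    }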