openmp

OpenMP Producer-Consumer unexpected result

≯℡__Kan透↙ submitted on 2019-12-23 20:12:36
Question: I am working on a simple producer-consumer problem, using OpenMP in C. My program creates four threads, two of which are consumers and two producers. Each producer places a character in a buffer, and the consumers just print the character. My aim is to synchronize the producers/consumers so that each producer will produce the next in-order character of the alphabet and each consumer will print the next in-order character that is placed in the buffer. This is my code: #include <stdio.h> #include
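The full code above is truncated, so here is a minimal sketch of the underlying pattern, not the asker's exact program: one producer and one consumer sharing an index-based buffer, with every buffer access guarded by a named critical section (which also implies the flushes needed for the consumer's polling loop). Extending to two producers/consumers means the "next character to produce/consume" counters must also live inside the critical sections. The function name and buffer layout are illustrative assumptions.

```cpp
#include <string>

// Sketch: single producer / single consumer over a shared buffer.
// All accesses to head/tail/buffer go through one named critical
// section, so the consumer always sees a consistent FIFO state.
std::string run_producer_consumer(int n) {
    std::string buffer(n, '\0');
    int head = 0, tail = 0;            // head: next read, tail: next write
    std::string consumed;

    #pragma omp parallel sections
    {
        #pragma omp section            // producer: emits 'A', 'B', ... in order
        for (int i = 0; i < n; ++i) {
            #pragma omp critical(buf)
            buffer[tail++] = (char)('A' + i);
        }

        #pragma omp section            // consumer: drains in FIFO order
        {
            int got = 0;
            while (got < n) {
                char c = 0;
                #pragma omp critical(buf)
                if (head < tail) c = buffer[head++];
                if (c) { consumed += c; ++got; }
            }
        }
    }
    return consumed;                   // characters come out in alphabet order
}
```

Compiled without -fopenmp the pragmas are ignored and the two blocks simply run in sequence, which still yields the same ordered result.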

No speed-up with useless printf's using OpenMP

孤街浪徒 submitted on 2019-12-23 18:50:29
Question: I just wrote my first OpenMP program, which parallelizes a simple for loop. I ran the code on my dual-core machine and saw some speed-up when going from 1 thread to 2 threads. However, I ran the same code on a school Linux server and saw no speed-up. After trying different things, I finally realized that removing some useless printf statements caused the code to have a significant speed-up. Below is the main part of the code that I parallelized: #pragma omp parallel for private(i) for(i = 2; i <=
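The likely culprit is that printf takes an internal stdio lock, so calls from inside the parallel loop serialize the threads. A sketch of the I/O-free shape such a loop usually wants, using a prime-counting body as a hypothetical stand-in for the asker's truncated loop:

```cpp
// Count primes in [2, n] -- an illustrative stand-in for the asker's loop.
// The key point: no printf inside the parallel region; print the summary
// once after the loop instead.
int count_primes(int n) {
    int count = 0;
    #pragma omp parallel for reduction(+:count) schedule(dynamic)
    for (int i = 2; i <= n; ++i) {
        bool prime = true;
        for (int d = 2; d * d <= i; ++d)
            if (i % d == 0) { prime = false; break; }
        if (prime) ++count;
    }
    return count;
}
```

If per-iteration output is genuinely needed, buffering it per thread and printing after the region avoids the lock contention.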

Why does the following OpenMP program fail to reduce my variable?

旧巷老猫 submitted on 2019-12-23 17:24:53
Question: Consider the following minimal C code example. When compiling and executing with export OMP_NUM_THREADS=4 && gcc -fopenmp minimal.c && ./a.out (GCC 4.9.2 on Debian 8), this produces five lines with rho=100 (sometimes also 200 or 400) on my machine. The expected output is of course rho=400 for all five printed lines. The program is more likely to produce the correct result if I insert more code at // MARKER or place a barrier just there. But even with another barrier, it sometimes fails, and so
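Since the example itself is truncated: seeing rho=100 with 4 threads typically means each thread accumulated into its own copy (or raced on a shared one) and the copies were never combined. A reduction clause makes the combine explicit; a minimal sketch of the correct shape:

```cpp
// Each thread accumulates into a private copy of rho; OpenMP sums the
// copies into the shared rho when the loop ends. With 4 threads and
// iters = 400 this yields 400, never a single thread's partial 100.
int reduce_rho(int iters) {
    int rho = 0;
    #pragma omp parallel for reduction(+:rho)
    for (int i = 0; i < iters; ++i)
        rho += 1;
    return rho;   // equals iters regardless of the thread count
}
```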

Assign a different number of OpenMP threads to each MPI process

自闭症网瘾萝莉.ら submitted on 2019-12-23 17:02:07
Question: Assume that I have a code that runs on 384 MPI processes (24 compute nodes with 16 cores per compute node) and that I use the following simple script to submit my job to a job queue: #!/bin/bash #PBS -S /bin/bash #PBS -l nodes=24:ppn=16 #PBS -l walltime=01:00:00 cd $PBS_O_WORKDIR module load openmpi mpirun mycode > output_file Is the following scenario possible: I need to assign one more node with 16 cores to do some specific calculations using OpenMP and update the rest of the 384 processes at
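One way this is commonly done with Open MPI is its MPMD colon syntax, launching two groups of ranks with different OMP_NUM_THREADS values exported via -x. The sketch below is an assumption-laden adaptation of the asker's own PBS script (node counts, binary name, and placement behavior depend on the site's scheduler and Open MPI configuration), not a verified recipe:

```shell
#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=25:ppn=16        # one extra node for the OpenMP-heavy rank
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
module load openmpi

# Open MPI MPMD syntax: 384 single-threaded ranks, plus one rank that
# owns a whole node and runs 16 OpenMP threads.
mpirun -np 384 -x OMP_NUM_THREADS=1  ./mycode : \
       -np 1   -x OMP_NUM_THREADS=16 ./mycode > output_file
```

Rank placement (keeping the 16-thread rank alone on its node) would still need to be pinned down with a hostfile or --map-by options.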

OpenMP: Can't parallelize nested for loops

这一生的挚爱 submitted on 2019-12-23 16:15:26
Question: I want to parallelize a loop with an inner loop within it. My code looks like this: #pragma omp parallel for private(jb,ib) shared(n, Nb, lb, lastBlock, jj, W, WT) schedule(dynamic) //private(ib, jb) shared(n, Nb, lb, lastBlock, jj, W, WT) //parallel for loop with omp for(jb=0; jb<Nb; jb++) { int lbh = (jb==Nb-1) ? lastBlock : lb; int ip = omp_get_thread_num(); packWT(a, n, lb, s, jb, colNr, WT[ip], nr); //pack WWT[jb] for(ib=jb; ib<Nb; ib++) { int lbv = (ib==Nb-1) ? lastBlock : lb; multBlock
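A frequent cause of trouble with nested loops like this is the inner index (ib here) being shared between threads when it is declared outside the loop. Declaring loop variables at their loops makes them automatically private. A runnable sketch with the same triangular jb/ib structure, using a dummy sum in place of the asker's packWT/multBlock calls:

```cpp
// Triangular nested loop like the asker's (inner ib runs jb..Nb-1).
// Declaring ib inside the loop makes it private per thread, which is
// the usual fix when an outer-scope inner index was accidentally shared.
long triangular_sum(int Nb) {
    long total = 0;
    #pragma omp parallel for reduction(+:total) schedule(dynamic)
    for (int jb = 0; jb < Nb; ++jb)
        for (int ib = jb; ib < Nb; ++ib)   // ib is thread-local
            total += (long)jb * Nb + ib;   // stand-in for the block work
    return total;
}
```

Only the outer loop is parallelized; schedule(dynamic) helps because later jb values carry less inner-loop work.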

OpenMP code not running in parallel

我的未来我决定 submitted on 2019-12-23 15:53:24
Question: omp_set_num_threads( 8 ); #pragma omp parallel for for( int tx = 0; tx < numThread; tx++ ) { cout<<"\nThread :"<<omp_get_num_threads()<<"\n"; } My understanding is that the above code is supposed to print 8. But the output I am getting is Thread :1 Thread :1 Thread :1 Thread :1 Thread :1 Thread :1 Thread :1 Thread :1 Please let me know what is going wrong here. I am a beginner with OpenMP, so I am quite sure I must have made some stupid mistake. Thanks in advance. Answer 1: I'm not sure what's
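When omp_get_num_threads() reports 1 inside a parallel region, the most common cause is that OpenMP was never enabled at compile time (e.g. the -fopenmp flag is missing), so the pragmas are silently ignored. A small probe that makes this visible, with a serial fallback guarded by the _OPENMP macro:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Reports the team size inside a parallel region. If the program was
// built without OpenMP support (the usual cause of "Thread :1"),
// _OPENMP is undefined and this honestly reports 1.
int team_size() {
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        n = omp_get_num_threads();   // size of the current team
    }
#endif
    return n;
}
```

Note also that omp_get_num_threads() returns 1 when called outside any parallel region even in a correctly built program.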

OMP threadprivate objects not being destructed

点点圈 submitted on 2019-12-23 12:37:47
Question: Bottom line: How can I make sure that the threadprivate instances are properly destructed? Background: When answering this question, I came across an oddity when using the Intel C++ 15.0 compiler in VS2013. When declaring a global variable threadprivate, the slave threads' copies are not destructed. I started looking for ways to force their destruction. At this site, they say that adding an OMP barrier should help. It doesn't (see MCVE). I tried setting the OMP blocktime to 0 so that the threads
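One workaround, rather than relying on the runtime to destroy threadprivate globals at thread exit, is to give each thread a block-scoped instance inside the parallel region, so destructors run deterministically when the region ends. A self-checking sketch (the Tracker type is illustrative, not from the original question):

```cpp
#include <atomic>

struct Tracker {
    static std::atomic<int> live;   // count of currently alive instances
    Tracker()  { ++live; }
    ~Tracker() { --live; }
};
std::atomic<int> Tracker::live{0};

// Each thread constructs its own Tracker on the stack inside the region;
// all copies are destructed by the time the region closes, unlike
// threadprivate globals whose destruction depends on the runtime.
int run_and_count_leaks() {
    #pragma omp parallel
    {
        Tracker t;      // one per thread, destroyed at end of the region
        (void)t;
    }
    return Tracker::live.load();    // 0 => every copy was destructed
}
```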

Why is my OpenMP implementation slower than a single threaded implementation?

一世执手 submitted于 2019-12-23 12:25:31
Question: I am learning about OpenMP concurrency and tried my hand at some existing code I have. In this code, I tried to make all the for loops parallel. However, this seems to make the program MUCH slower, at least 10x slower, or even more, than the single-threaded version. Here is the code: http://pastebin.com/zyLzuWU2 I also used pthreads, which turns out to be faster than the single-threaded version. Now the question is: what am I doing wrong in my OpenMP implementation that is causing this
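Making "all the for loops" parallel is itself a common cause of 10x slowdowns: entering a parallel region costs far more than a few arithmetic operations, so many small inner regions drown the work in overhead. The usual remedy is one region around the large outer loop. A sketch of that shape (the row-sum workload is illustrative, not the pastebin code):

```cpp
#include <vector>

// One parallel region over the big outer loop; the small inner loop
// stays serial inside each thread. Parallelizing the inner loop instead
// would pay region-entry overhead once per row.
double sum_rows(const std::vector<std::vector<double>>& m) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)m.size(); ++i) {
        double row = 0.0;
        for (double x : m[i]) row += x;   // cheap inner loop, kept serial
        total += row;
    }
    return total;
}
```

False sharing (adjacent threads writing neighboring array elements) is the other frequent suspect when an OpenMP port loses to pthreads.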

Does OpenMP copy private objects?

别等时光非礼了梦想. submitted on 2019-12-23 12:18:49
Question: I'm writing a program that reads a huge file (3x280 GB) and does a fitting procedure on the data in the file. It's pretty convenient to parallelise such a program, and this is easily done with OpenMP. The thing I don't understand is how private variables are handled in OpenMP. As we all know, fstream objects are non-copyable, and intuitively, that prevented me from using one as a private object. So the reader of the file was shared. I got some problems later, and I thought of trying to have
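For the record on the copying question: private default-constructs a fresh object per thread, while firstprivate copy-constructs from the original, which is exactly what non-copyable fstreams forbid. A common pattern that sidesteps both is opening a separate stream per thread inside the region, each seeking to its own chunk. A sketch under assumed names (count_chars, the chunking scheme):

```cpp
#include <fstream>
#include <string>

// Each thread opens its own ifstream (no copying of stream objects)
// and reads only its [begin, end) byte range of the file.
long count_chars(const std::string& path, long filesize, int chunks) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int c = 0; c < chunks; ++c) {
        std::ifstream in(path, std::ios::binary);  // thread-local stream
        long begin = filesize * c / chunks;
        long end   = filesize * (c + 1) / chunks;
        in.seekg(begin);
        char ch;
        for (long p = begin; p < end && in.get(ch); ++p)
            ++total;                                // stand-in for fitting work
    }
    return total;
}
```

For a 3x280 GB input one would read buffered blocks rather than byte-at-a-time, but the ownership pattern is the same.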

Using OpenMP with odeint and adaptive step sizes

人走茶凉 submitted on 2019-12-23 11:31:10
Question: I am trying to use OpenMP to parallelize my code. Everything works just fine when I use constant step sizes; however, when I run the same code using an adaptive stepper, I get errors that I don't understand. Here are the essential parts of the code: using namespace std; using namespace boost::numeric::odeint; const int jmax = 10; typedef double value_type; typedef boost::array<value_type ,2*(jmax+1) > state_type; //The step function void rhs( const state_type A , state_type &dAdt , const
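The core constraint with adaptive steppers is that all threads must agree on a single step size, so there must be one stepper, with parallelism pushed inside the system function (odeint's own OpenMP support follows this idea with its OpenMP-aware algebras). A boost-free sketch of that structure, substituting a hand-rolled fixed-step Euler integrator for the adaptive stepper so the example stays self-contained:

```cpp
#include <vector>
#include <cmath>

using state_type = std::vector<double>;

// System function dA/dt = -A: this is where the OpenMP pragma belongs.
// The (single) stepper calls it once per step, so every thread works
// under the same step size.
void rhs(const state_type& A, state_type& dAdt) {
    #pragma omp parallel for
    for (long i = 0; i < (long)A.size(); ++i)
        dAdt[i] = -A[i];
}

// One serial stepper driving the parallel rhs (Euler here for brevity;
// an adaptive stepper would adjust dt between the rhs calls).
state_type integrate(state_type A, double dt, int steps) {
    state_type d(A.size());
    for (int s = 0; s < steps; ++s) {
        rhs(A, d);
        for (std::size_t i = 0; i < A.size(); ++i)
            A[i] += dt * d[i];
    }
    return A;
}
```

Giving each thread its own adaptive stepper over the shared state, by contrast, lets the steppers disagree on dt and corrupt the state, which matches the kind of error described.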