openmp

Is it much faster to re-initialize a vector using OpenMP threads?

Submitted by 跟風遠走 on 2019-12-02 05:43:20
I'm using the OpenMP library for parallel computing. I use C++ vectors, whose size is usually on the order of 1*10^5. During the iteration process, I need to re-initialize a bunch of these large vectors (not thread-private but global scope) to an initial value. Which is the faster way to do this: using #pragma omp for or #pragma omp single? The general answer would have to be "it depends, you have to measure", since initialization in C++ can be, depending on the type, trivial or very expensive. You did not provide an awful lot of detail, so one has to guess a bit. If a class has a

OpenMP and Thread Local Storage identifier with icc

Submitted by 霸气de小男生 on 2019-12-02 05:36:03
Question: This is a simple test code: #include <stdlib.h> __thread int a = 0; int main() { #pragma omp parallel default(none) { a = 1; } return 0; } gcc compiles this without any problems with -fopenmp , but icc (ICC) 12.0.2 20110112 with -openmp complains with test.c(7): error: "a" must be specified in a variable list at enclosing OpenMP parallel pragma #pragma omp parallel default(none) I have no clue which paradigm (i.e. shared , private , threadprivate ) applies to this type of variable. Which one

Not sure how to explain some of the performance results of my parallelized matrix multiplication code

Submitted by ≡放荡痞女 on 2019-12-02 04:26:37
I'm running this code in OpenMP for matrix multiplication and I measured its results: #pragma omp for schedule(static) for (int j = 0; j < COLUMNS; j++) for (int k = 0; k < COLUMNS; k++) for (int i = 0; i < ROWS; i++) matrix_r[i][j] += matrix_a[i][k] * matrix_b[k][j]; There are different versions of the code based on where I put the #pragma omp directive - before the j loop, the k loop, or the i loop. Also, for every one of those variants I ran different versions for default static scheduling, static scheduling with chunks 1 and 10, and dynamic scheduling with the same chunks. I also measured the

Reduction with OpenMP: linear merging or log(number of threads) merging

Submitted by 别来无恙 on 2019-12-02 04:04:59
I have a general question about reductions with OpenMP that has bothered me for a while. My question is in regards to merging the partial sums in a reduction. It can either be done linearly or in the log of the number of threads. Let's assume I want to do a reduction of some function double foo(int i) . With OpenMP I could do it like this. double sum = 0.0; #pragma omp parallel for reduction (+:sum) for(int i=0; i<n; i++) { sum += foo(i); } However, I claim that the following code will be just as efficient. double sum = 0.0; #pragma omp parallel { double sum_private = 0.0; #pragma omp for nowait

omp reduction on vector of cv::Mat or cv::Mat in general

Submitted by 佐手、 on 2019-12-02 04:03:27
Question: //In other words, this is equivalent to cv::Mat1f mat(5,n) //i.e. a matrix 5xn std::vector<cv::Mat1f> mat(5,cv::Mat1f::zeros(1,n)); std::vector<float> indexes(m); // fill indexes // m >> nThreads (from hundreds to thousands) for(size_t i=0; i<m; i++){ mat[indexes[i]] += 1; } The expected result is to increase each element of each row by one. This is a toy example; the actual sum is far more complicated. I tried to parallelize it with: #pragma omp declare reduction(vec_float_plus : std::vector<cv

OpenMP with 1 thread slower than sequential version

Submitted by 左心房为你撑大大i on 2019-12-02 03:38:44
I have implemented knapsack using OpenMP (gcc version 4.6.3) #define MAX(x,y) ((x)>(y) ? (x) : (y)) #define table(i,j) table[(i)*(C+1)+(j)] for(i=1; i<=N; ++i) { #pragma omp parallel for for(j=1; j<=C; ++j) { if(weights[i]>j) { table(i,j) = table(i-1,j); }else { table(i,j) = MAX(profits[i]+table(i-1,j-weights[i]), table(i-1,j)); } } } execution time for the sequential program = 1s, execution time for the OpenMP version with 1 thread = 1.7s (overhead = 40%). I used the same compiler optimization flags (-O3) in both cases. Can someone explain the reason behind this behavior? Thanks. Enabling OpenMP

Parallelization for Monte Carlo pi approximation

Submitted by 大兔子大兔子 on 2019-12-02 03:01:09
I am writing a C program to parallelize pi approximation with OpenMP. I think my code works fine, with convincing output. I am running it with 4 threads now. What I am not sure of is whether this code is vulnerable to a race condition, and if it is, how do I coordinate the thread actions in this code? The code looks as follows: #include <stdlib.h> #include <stdio.h> #include <time.h> #include <math.h> #include <omp.h> double sample_interval(double a, double b) { double x = ((double) rand())/((double) RAND_MAX); return (b-a)*x + a; } int main (int argc, char **argv) { int N = atoi( argv[1] ); //

Multithreaded program segfaults with OpenSSL and OpenMP

Submitted by 这一生的挚爱 on 2019-12-02 02:54:31
I am using OpenSSL in a multithreaded C program and having issues. So I wrote a small program to try to narrow down what the problem is. The functions besides the main function were copy-pasted from https://github.com/plenluno/openssl/blob/master/openssl/crypto/threads/mttest.c My program is as follows. #include<stdio.h> #include<stdlib.h> #include<stdarg.h> #include <strings.h> #include <string.h> #include <math.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include<omp.h> #include <openssl/bn.h> #include <openssl/dh.h> #include <openssl/crypto.h> #include <pthread.h>

OpenMP doesn't utilize all CPUs (dual socket, Windows and Microsoft Visual Studio)

Submitted by 本小妞迷上赌 on 2019-12-02 02:52:27
I have a dual-socket system with 22 real cores per CPU, or 44 hyperthreads per CPU. I can get OpenMP to completely utilize the first CPU (22 cores / 44 hyperthreads), but I cannot get it to utilize the second CPU. I am using CPUID HWMonitor to check my core usage. The second CPU is always at or near 0% on all cores. Using: int nProcessors = omp_get_max_threads(); gets me nProcessors = 44, but I think it's just using the 44 hyperthreads of one CPU instead of 44 real cores (which should be 88 hyperthreads). After looking around a lot, I'm not sure how to utilize the other CPU. My CPU is running fine as I can run

Use OpenMP with Windows SDK

Submitted by 99封情书 on 2019-12-02 02:44:39
I am aware that VC2010 Express Edition does not include OpenMP support and therefore reports omp.h missing. Therefore, I installed the Windows SDK v7.1 64-bit version on Windows. However, even after I ran: set DISTUTIL_USE_SDK=1 setenv /x64 /release and then tried to compile the code, it would still report that it cannot find omp.h. Could anyone give me a hint on how to solve this? I did some checking, and it appears that OpenMP is not part of the Windows SDK, and is only shipped with the Visual C++ 2010 Professional or Ultimate editions. Source: https://stackoverflow.com/questions/23935748/use-openmp-with