openmp

Poor performance due to hyper-threading with OpenMP: how to bind threads to cores

Submitted by 。_饼干妹妹 on 2019-12-20 17:59:32
Question: I am developing large dense matrix multiplication code. When I profile the code it sometimes gets about 75% of the peak FLOPS of my four-core system, and other times gets about 36%. The efficiency does not change between executions of the code: it either starts at 75% and continues at that efficiency, or starts at 36% and continues at that efficiency. I have traced the problem down to hyper-threading and the fact that I set the number of threads to four instead of the default eight. When I …
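
The standard remedy for this class of problem is to pin each OpenMP thread to a distinct physical core, so the scheduler can never place two of the four threads on sibling hyper-threads of the same core. A minimal C sketch (not from the original thread) using Linux's sched_setaffinity; it assumes logical CPUs 0-3 map to four distinct physical cores, which is hardware-specific (check /proc/cpuinfo or hwloc's lstopo):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        omp_set_num_threads(4);        /* one thread per physical core */
        #pragma omp parallel
        {
            /* Pin thread t to logical CPU t. Assumes CPUs 0..3 are four
               distinct physical cores; adjust to your machine's topology. */
            int t = omp_get_thread_num();
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(t, &set);
            sched_setaffinity(0, sizeof(set), &set);
            printf("thread %d pinned to CPU %d\n", t, t);
        }
        return 0;
    }

With GCC's libgomp the same effect is available without code changes by exporting GOMP_CPU_AFFINITY="0 1 2 3" (or, with OpenMP 4.0, OMP_PROC_BIND and OMP_PLACES) before running the program.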

Is armadillo solve() thread safe?

Submitted by 非 Y 不嫁゛ on 2019-12-20 15:37:38
Question: In my code I have a loop in which I construct an overdetermined linear system and try to solve it:

    #pragma omp parallel for
    for (int i = 0; i < n[0]+1; i++) {
        for (int j = 0; j < n[1]+1; j++) {
            for (int k = 0; k < n[2]+1; k++) {
                arma::mat A(max_points, 2);
                arma::mat y(max_points, 1);
                // initialize A and y
                arma::vec solution = solve(A, y);
            }
        }
    }

Sometimes, quite randomly, the program hangs or the results in the solution vector are NaN. And if I do this:

    arma::vec solution;
    #pragma omp …
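
A pattern that avoids both failure modes in the snippet above is to keep every Armadillo object a thread writes strictly local to the loop body, and to give each iteration its own output slot. The sketch below (hypothetical sizes and names, not code from the original post) also assumes the BLAS/LAPACK library that Armadillo is linked against is itself thread-safe, which not every build is:

    #include <armadillo>
    #include <vector>

    void solve_all(const int n[3], int max_points,
                   std::vector<arma::vec>& results)
    {
        results.resize((n[0]+1) * (n[1]+1) * (n[2]+1));
        #pragma omp parallel for collapse(3)
        for (int i = 0; i <= n[0]; i++)
            for (int j = 0; j <= n[1]; j++)
                for (int k = 0; k <= n[2]; k++) {
                    arma::mat A(max_points, 2); // thread-private: declared
                    arma::mat y(max_points, 1); // inside the loop body
                    // ... fill A and y for this (i,j,k) ...
                    // distinct slot per iteration, so no write race:
                    results[(i*(n[1]+1) + j)*(n[2]+1) + k] = arma::solve(A, y);
                }
    }

By contrast, hoisting arma::vec solution; out of the parallel region, as in the second variant, makes it shared: all threads then race on its internal size field and memory pointer, which matches the observed hangs and NaNs.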

difference between omp critical and omp single

Submitted by 二次信任 on 2019-12-20 08:41:03
Question: I am trying to understand the exact difference between #pragma omp critical and #pragma omp single in OpenMP. Microsoft's definitions of these are:

    Single: Lets you specify that a section of code should be executed on a single thread, not necessarily the master thread.

    Critical: Specifies that code is only executed on one thread at a time.

So does this mean that in both cases the exact section of code that follows is executed by just one thread, and other threads will not enter that section? E.g. if …
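
A minimal sketch that makes the distinction observable: a single block is executed once in total, by whichever thread reaches it first (the rest skip it and wait at an implicit barrier), while a critical block is executed by every thread, just never by two threads at the same time:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        #pragma omp parallel num_threads(4)
        {
            #pragma omp single
            printf("single: printed once, by thread %d\n",
                   omp_get_thread_num());   /* other threads skip this */

            #pragma omp critical
            printf("critical: printed by every thread, here %d\n",
                   omp_get_thread_num());   /* serialized, not skipped */
        }
        return 0;
    }

Running this prints the "single" line once and the "critical" line four times: "executed by one thread" means one thread ever for single, but one thread at a time for critical.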

Rcpp causes segfault RcppArmadillo does not

Submitted by 假装没事ソ on 2019-12-20 07:40:01
Question: I'm currently trying to parallelize an existing hierarchical MCMC sampling scheme. The majority of my (so far sequential) source code is written in RcppArmadillo, so I'd like to stick with this framework for the parallelization too. Before starting to parallelize my code I read a couple of blog posts on Rcpp/OpenMP. In the majority of these blog posts (e.g. Drew Schmidt, wrathematics) the authors warn about the issue of thread safety with R/Rcpp data structures and OpenMP. The bottom line …
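
The advice those posts converge on is that R's C API, and therefore the Rcpp objects that wrap it, must never be touched from worker threads. A commonly recommended pattern, sketched below with a hypothetical function, is to work only on Armadillo/plain C++ data inside the parallel region and let all R conversions happen on the main thread:

    // [[Rcpp::depends(RcppArmadillo)]]
    // [[Rcpp::plugins(openmp)]]
    #include <RcppArmadillo.h>
    #include <omp.h>

    // [[Rcpp::export]]
    arma::vec parallel_logistic(const arma::vec& x) {
        // arma::vec owns its own memory, so reading x and writing
        // distinct elements of out from many threads is safe.
        arma::vec out(x.n_elem);
        #pragma omp parallel for
        for (arma::uword i = 0; i < x.n_elem; i++) {
            // pure C++ only here: no Rcpp::*, no R API, no R RNG
            out(i) = 1.0 / (1.0 + std::exp(-x(i)));
        }
        return out;  // conversion back to an R vector on the main thread
    }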

OpenMP implementation of reduction

Submitted by 泄露秘密 on 2019-12-20 07:11:17
Question: I need to implement a reduction operation in which each thread stores its value in a different array entry. However, it runs slower with more threads. Any suggestions?

    double local_sum[16];
    // Initializations....
    #pragma omp parallel for shared(h,n,a) private(x, thread_id)
    for (i = 1; i < n; i++) {
        thread_id = omp_get_thread_num();
        x = a + i * h;
        local_sum[thread_id] += f(x);
    }

Answer 1: You are experiencing the effects of false sharing. On x86 a single cache line is 64 bytes long and therefore …
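
To complete the answer's point: all sixteen doubles of local_sum span just two cache lines, so every thread's += invalidates the line in the other threads' caches and the updates ping-pong between cores. Two standard fixes, sketched here assuming 64-byte cache lines; the reduction clause is almost always the better one:

    #include <omp.h>

    /* Fix 1: pad each per-thread slot to a full 64-byte cache line so
       no two threads ever write to the same line. */
    struct padded_sum { double val; char pad[64 - sizeof(double)]; };

    /* Fix 2: let OpenMP keep a truly private accumulator per thread and
       combine them once at the end -- no shared writes in the hot loop. */
    double integrate(double a, double h, int n, double (*f)(double))
    {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i < n; i++)
            sum += f(a + i * h);
        return sum;
    }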

Order of execution in Reduction Operation in OpenMP

Submitted by 孤街浪徒 on 2019-12-20 06:17:40
Question: Is there a way to know the order of execution of a reduction operation in OpenMP? In other words, I would like to know how the threads execute the reduction: is it left to right? What happens when the number of values is not a power of 2?

Answer 1: I think you'll find that OpenMP will only reduce over associative operations, such as + and * (or addition and multiplication, if you prefer), which means that it can proceed oblivious to the order of evaluation of the component parts of the …
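
A small illustration of why the standard leaves the order unspecified: each thread reduces its own chunk, the partial results are combined in whatever order the runtime chooses, and because floating-point addition is not truly associative, the low-order bits of the result can vary with the thread count. A sketch (not from the original thread):

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        // Mixing magnitudes makes the sum sensitive to grouping.
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 10000000; i++)
            sum += (i % 2 ? 1.0e-8 : 1.0);
        // Each thread sums its own chunk; the chunk totals are then
        // combined in an unspecified order, so the last digits may
        // change with OMP_NUM_THREADS.
        printf("%.17g\n", sum);
        return 0;
    }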

With OpenMP parallelized nested loops run slow

Submitted by 北慕城南 on 2019-12-20 04:54:17
Question: I've got a part of a Fortran program consisting of some nested loops which I want to parallelize with OpenMP.

    integer :: nstates, N, i, dima, dimb, dimc, a_row, b_row, b_col, c_row, row, col
    double complex, dimension(4,4) :: mat
    double complex, dimension(:), allocatable :: vecin, vecout

    nstates = 2
    N = 24
    allocate(vecin(nstates**N), vecout(nstates**N))
    vecin = ...some data
    vecout = 0
    mat = reshape([...some data...],[4,4])
    dimb = nstates**2
    !$OMP PARALLEL DO PRIVATE(dima,dimc,row,col,a_row,b_row …
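
Independent of the Fortran specifics, two things usually decide whether nested loops like these speed up: every index written inside the loops must be private (the poster's PRIVATE clause), and the parallelized loop must carry enough iterations to amortize the threading cost. A sketch of the second point (hypothetical dimensions standing in for dima/dimb/dimc; kept in C rather than the poster's Fortran to match the other examples here), where collapse merges the outer loops into one larger iteration space:

    #include <complex.h>

    /* Applies a dimb-by-dimb matrix to the middle index of a
       dima*dimb*dimc vector; assumes dimb == 4 to match mat[4][4]. */
    void apply(const double complex mat[4][4],
               const double complex *vecin, double complex *vecout,
               int dima, int dimb, int dimc)
    {
        /* collapse(2) hands the scheduler dima*dimc chunks instead of
           only dima; loop variables declared in the for statements are
           automatically private to each thread. */
        #pragma omp parallel for collapse(2)
        for (int a = 0; a < dima; a++)
            for (int c = 0; c < dimc; c++)
                for (int br = 0; br < dimb; br++) {
                    double complex acc = 0;
                    for (int bc = 0; bc < dimb; bc++)
                        acc += mat[br][bc] * vecin[(a*dimb + bc)*dimc + c];
                    vecout[(a*dimb + br)*dimc + c] = acc;
                }
    }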

OpenMP with 1 thread slower than sequential version

Submitted by て烟熏妆下的殇ゞ on 2019-12-20 04:38:17
Question: I have implemented knapsack using OpenMP (gcc version 4.6.3):

    #define MAX(x,y) ((x) > (y) ? (x) : (y))
    #define table(i,j) table[(i)*(C+1)+(j)]

    for (i = 1; i <= N; ++i) {
        #pragma omp parallel for
        for (j = 1; j <= C; ++j) {
            if (weights[i] > j) {
                table(i,j) = table(i-1,j);
            } else {
                table(i,j) = MAX(profits[i] + table(i-1,j-weights[i]),
                                 table(i-1,j));
            }
        }
    }

Execution time for the sequential program = 1 s; execution time for the OpenMP version with 1 thread = 1.7 s (overhead = 40%). I used the same compiler optimization flags ( …
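
A classic source of exactly this kind of overhead is that the parallel region sits inside the outer loop, so the thread team is forked and joined (or woken and re-parked) N times, even when the team contains a single thread. A sketch of the usual restructuring, with one parallel region outside and a worksharing omp for inside; the implicit barrier at the end of the omp for still guarantees row i-1 is finished before row i starts:

    #define MAX(x,y) ((x) > (y) ? (x) : (y))
    #define table(i,j) table[(i)*(C+1) + (j)]

    void knapsack(int N, int C, const int *weights,
                  const int *profits, int *table)
    {
        #pragma omp parallel        /* team created once, not N times */
        for (int i = 1; i <= N; ++i) {
            /* omp for splits row i across the team; the implicit
               barrier at its end keeps the rows in order. */
            #pragma omp for
            for (int j = 1; j <= C; ++j) {
                if (weights[i] > j)
                    table(i,j) = table(i-1,j);
                else
                    table(i,j) = MAX(profits[i] + table(i-1,j-weights[i]),
                                     table(i-1,j));
            }
        }
    }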

Multithreaded program segfaults with OpenSSL and OpenMP

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-20 04:31:16
Question: I am using OpenSSL in a multithreaded C program and having issues, so I wrote a small program to try to narrow down the problem. The functions besides the main function were copy-pasted from https://github.com/plenluno/openssl/blob/master/openssl/crypto/threads/mttest.c My program is as follows:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdarg.h>
    #include <strings.h>
    #include <string.h>
    #include <math.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <omp …
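
For context on what the copied mttest.c functions are for: OpenSSL before 1.1.0 is only thread-safe if the application registers locking and thread-id callbacks, and forgetting to do so (or doing it after threads have already started using libcrypto) is the classic cause of these segfaults. A minimal sketch of that pre-1.1.0 setup using OpenMP locks; OpenSSL 1.1.0 and later handles locking internally and simply ignores these callbacks:

    #include <omp.h>
    #include <openssl/crypto.h>

    static omp_lock_t *ssl_locks;

    /* libcrypto calls this whenever it wants lock n taken or released. */
    static void locking_cb(int mode, int n, const char *file, int line)
    {
        (void)file; (void)line;
        if (mode & CRYPTO_LOCK)
            omp_set_lock(&ssl_locks[n]);
        else
            omp_unset_lock(&ssl_locks[n]);
    }

    /* Crude thread id: distinct within one OpenMP team, which is all
       libcrypto needs to tell callers apart. */
    static unsigned long id_cb(void)
    {
        return (unsigned long)omp_get_thread_num();
    }

    void ssl_thread_setup(void)   /* call once, before any threads run */
    {
        int n = CRYPTO_num_locks();
        ssl_locks = OPENSSL_malloc(n * sizeof(*ssl_locks));
        for (int i = 0; i < n; i++)
            omp_init_lock(&ssl_locks[i]);
        CRYPTO_set_locking_callback(locking_cb);
        CRYPTO_set_id_callback(id_cb);
    }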