openmp | 易学教程

Can't get over 50% max. theoretical performance on matrix multiply

Question: Problem: I am learning about HPC and code optimization. I am attempting to replicate the results in Goto's seminal matrix multiplication paper (http://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf). Despite my best efforts, I cannot get over ~50% of the maximum theoretical CPU performance. Background: See the related issue here (Optimized 2x2 matrix multiplication: Slow assembly versus fast SIMD), which includes info about my hardware. What I have attempted: This related paper (http://www.cs

Specify OpenMP to GCC

Question: For OpenMP, when my code uses functions from its API (for example, omp_get_thread_num()) without using its directives (such as #pragma omp ...), why doesn't it work to specify libgomp.a directly to gcc instead of using -fopenmp? For example: gcc hello.c /usr/lib/gcc/i686-linux-gnu/4.4/libgomp.a -o hello Update: I just found that linking to libgomp.a does not work, but linking to libgomp.so works. Does this mean OpenMP cannot be statically linked? Why -fopenmp only works without specifying the

Parallelization for Monte Carlo pi approximation

Question: I am writing a C program to parallelize a pi approximation with OpenMP. I think my code works fine, with convincing output. I am running it with 4 threads now. What I am not sure about is whether this code is vulnerable to a race condition, and if it is, how do I coordinate the threads in this code? The code looks as follows: #include <stdlib.h> #include <stdio.h> #include <time.h> #include <math.h> #include <omp.h> double sample_interval(double a, double b) { double x = ((double) rand())/(

Is it much faster to re-initialize a vector using OpenMP threads?

Question: I'm using OpenMP for parallel computing. I use C++ vectors whose size is usually on the order of 1*10^5. During the iteration process, I need to re-initialize a bunch of these large vectors (not thread-private, but global scope) to an initial value. Which is the faster way to do this: using #pragma omp for or #pragma omp single? Answer 1: The general answer would need to be "it depends, you have to measure" since initialization in C++ can be, depending on the type, trivial or
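
For reference, the work-sharing variant the question mentions can be sketched in C roughly as follows. This is only an illustration: a plain `double` array stands in for the C++ vector, the name `reset_values` is hypothetical, and the combined `parallel for` form is used rather than a bare `#pragma omp for` inside an existing parallel region. Whether it beats a single-threaded fill at around 10^5 elements is exactly what would have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reset every element of `data` to `value`.
   Built with -fopenmp, the iterations are divided among the threads.
   Built without it, the pragma is ignored and the loop runs serially
   with the same result. */
void reset_values(double *data, size_t n, double value)
{
    assert(data != NULL || n == 0);

    /* A signed loop index keeps the loop acceptable to older OpenMP versions. */
    long long count = (long long)n;

    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < count; i++) {
        data[i] = value;
    }
}
```

Each element is written by exactly one thread, so no synchronization is needed beyond the implicit barrier at the end of the loop.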

find_first of a vector in parallel in C++

Question: I have a quite big vector. Some of the vector's members match a certain condition. I would like to find the first element matching the condition. My problem is very similar to this question (tbb: parallel find first element), but I do not have tbb. Checking the condition is very expensive (so I cannot do it for all of them sequentially). That's why I would like to run it in parallel. I have to mention that I would like to find the first element (so the index position of the
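
One way such a search could be expressed without tbb is an OpenMP `min` reduction over the index (available since OpenMP 3.1). The sketch below is in C with a plain `int` array; `find_first` and the predicate `matches` are hypothetical stand-ins, not code from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate standing in for the expensive condition. */
static int matches(int value)
{
    return value < 0;
}

/* Returns the lowest index whose element satisfies matches(),
   or n if no element does. */
long find_first(const int *data, long n)
{
    assert(n >= 0);
    assert(data != NULL || n == 0);

    long first = n;

    #pragma omp parallel for schedule(static) reduction(min:first)
    for (long i = 0; i < n; i++) {
        /* Inside the loop `first` is this thread's private copy. Once the
           thread has a hit, its later indices cannot improve on it, so the
           expensive predicate is skipped for them. */
        if (i < first && matches(data[i])) {
            first = i;
        }
    }
    return first;
}
```

With `schedule(static)` each thread walks one contiguous chunk in increasing order, so it evaluates the predicate only up to its own first hit. Threads do not stop one another early in this version; that would need a shared flag or OpenMP cancellation.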

OpenMP - Why does the number of comparisons decrease?

Question: I have the following algorithm: int hostMatch(long *comparisons) { int i = -1; int lastI = textLength-patternLength; *comparisons=0; #pragma omp parallel for schedule(static, 1) num_threads(1) for (int k = 0; k <= lastI; k++) { int j; for (j = 0; j < patternLength; j++) { (*comparisons)++; if (textData[k+j] != patternData[j]) { j = patternLength+1; //break } } if (j == patternLength && k > i) i = k; } return i; } When changing num_threads I get the following results for the number of comparisons:
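
In the loop as posted, `(*comparisons)++` and the update of `i` are unsynchronized writes to shared variables, so with more than one thread increments can be lost. A hedged sketch of one possible fix uses reductions for both values. The parameter list replaces the question's globals and the name `host_match` is hypothetical; `reduction(max:...)` assumes OpenMP 3.1 or later.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the last index at which `pattern` occurs in `text`, or -1.
   Stores the total number of character comparisons in *comparisons. */
int host_match(const char *text, int text_len,
               const char *pattern, int pattern_len,
               long *comparisons)
{
    assert(text != NULL && pattern != NULL && comparisons != NULL);
    assert(pattern_len > 0 && text_len >= pattern_len);

    int last = -1;      /* highest matching start index seen so far */
    long count = 0;     /* comparison counter, summed across threads */
    int last_start = text_len - pattern_len;

    /* Each thread works on private copies of count and last; OpenMP
       combines them (sum and maximum) when the loop finishes. */
    #pragma omp parallel for schedule(static) reduction(+:count) reduction(max:last)
    for (int k = 0; k <= last_start; k++) {
        int j = 0;
        while (j < pattern_len) {
            count++;
            if (text[k + j] != pattern[j]) {
                break;  /* leaves only the inner loop */
            }
            j++;
        }
        if (j == pattern_len && k > last) {
            last = k;
        }
    }

    *comparisons = count;
    return last;
}
```

Because every thread counts privately, the total should no longer depend on the number of threads.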

Multiplatform multiprocessing?

Question: I was wondering why the new C++11 added threads and not processes. Couldn't they have written a wrapper around platform-specific functions? Any suggestions about the most portable way to do multiprocessing? fork()? OpenMP? Answer 1: If you can use Qt, the QProcess class could be an elegant platform-independent solution. Answer 2: If you want to do this portably I'd suggest you avoid calling fork() directly and instead write your own library function that can be mapped on to a combination of fork()

OpenMP parallelisation of pi calculation is either slow or wrong

Question: I'm having trouble parallelising my Monte Carlo method to calculate pi. Here is the parallelised for-loop: #pragma omp parallel for private(i,x,y) schedule(static) reduction(+:count) for (i = 0; i < points; i++) { x = rand()/(RAND_MAX+1.0)*2 - 1.0; y = rand()/(RAND_MAX+1.0)*2 - 1.0; // Check if point lies in circle if(x*x + y*y < 1.0) { count++; } } The problem is, it underestimates pi if I use schedule(static), and it's slower than the serial implementation if I use schedule(dynamic). What
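
A likely suspect in a loop like this is `rand()`, which keeps hidden shared state and is not required to be safe to call from several threads at once. The C sketch below gives each chunk of work its own generator state instead. Everything in it is an assumption made for illustration: the names `estimate_pi`, `next_u32` and `next_coord`, the `streams` parameter, and the small xorshift generator with its simple seeding, which is not a statistically vetted choice.

```c
#include <assert.h>
#include <stdint.h>

/* Small xorshift generator used as a stand-in for rand().
   The state must be nonzero. */
static uint32_t next_u32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Uniform double in [-1, 1). */
static double next_coord(uint32_t *state)
{
    return next_u32(state) / 4294967296.0 * 2.0 - 1.0;
}

/* Estimates pi from `points` samples split into `streams` independent chunks. */
double estimate_pi(long points, int streams)
{
    assert(points > 0 && streams > 0);

    long count = 0;

    /* One generator state per stream, so no thread touches another's state. */
    #pragma omp parallel for schedule(static) reduction(+:count)
    for (int s = 0; s < streams; s++) {
        uint32_t state = 2463534242u + 977u * (uint32_t)s;  /* illustrative seed */
        if (state == 0) {
            state = 1;  /* xorshift must not start from zero */
        }
        long begin = points * s / streams;
        long end = points * (s + 1) / streams;
        for (long i = begin; i < end; i++) {
            double x = next_coord(&state);
            double y = next_coord(&state);
            /* Check whether the point lies inside the unit circle. */
            if (x * x + y * y < 1.0) {
                count++;
            }
        }
    }
    return 4.0 * (double)count / (double)points;
}
```

Tying the seeds to the stream index rather than to the thread number keeps the result the same for any thread count, which makes serial and parallel runs easy to compare.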
