openmp

Atomic access to non-atomic memory location in C++11 and OpenMP?

此生再无相见时 submitted on 2019-12-08 21:45:54
Question: OpenMP, in contrast to C++11, approaches atomicity from the perspective of memory operations, not variables. That allows, e.g., using atomic reads/writes for integers stored in a vector whose size is unknown at compile time:

    std::vector<int> v;

    // non-atomic access (e.g., in a sequential region):
    v.resize(n);
    ...
    v.push_back(i);
    ...

    // atomic access in a multi-threaded region:
    #pragma omp atomic write // seq_cst
    v[k] = ...;
    #pragma omp atomic read // seq_cst
    ... = v[k];

In C++11, this is not
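A minimal compilable sketch of the OpenMP half of this contrast (sizes and indices are illustrative, not from the question):

    #include <cstdio>
    #include <vector>

    int main() {
        int n = 100;                      // size known only at run time
        std::vector<int> v(n, 0);         // plain, non-atomic ints

        #pragma omp parallel for
        for (int k = 0; k < n; ++k) {
            #pragma omp atomic write      // atomic store to a non-atomic location
            v[k] = k * k;
        }

        int x;
        #pragma omp atomic read           // atomic load from the same location
        x = v[n / 2];
        std::printf("%d\n", x);
    }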

Dependency on VCOMP90.DLL in VS2008 Pro OpenMP project

被刻印的时光 ゝ submitted on 2019-12-08 21:08:24
Question: I have a DLL project in VS 2008 Pro which uses OpenMP. I use /MT as the 'code generation' option because I want all my dependencies statically linked into my DLL, since I do not want to distribute many libraries to my clients - everything should be included in this one DLL file. The problem is that my resulting DLL still depends on VCOMP90.DLL. How can I get rid of this dependency? Some information:

- /openmp is set in compiler options
- I statically link against vcomp.lib
- include is set using
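For context, a minimal command-line reproduction of the setup described (file names are illustrative). Note that the vcomp.lib shipped with VS2008 is an import library for VCOMP90.DLL rather than a static runtime, which is why the dependency survives static CRT linking:

    REM build the DLL with a static CRT and OpenMP enabled
    cl /MT /openmp /LD mydll.cpp
    REM inspect runtime dependencies; VCOMP90.DLL is still listed
    dumpbin /dependents mydll.dll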

How to parallelize reading lines from an input file when lines are processed independently?

China☆狼群 submitted on 2019-12-08 20:53:53
Question: I just started off with OpenMP using C++. My serial code in C++ looks something like this:

    #include <iostream>
    #include <string>
    #include <sstream>
    #include <vector>
    #include <fstream>
    #include <stdlib.h>

    int main(int argc, char* argv[]) {
        std::string line;
        std::ifstream inputfile(argv[1]);
        if (inputfile.is_open()) {
            while (getline(inputfile, line)) {
                // Line gets processed and written into an output file
            }
        }
    }

Because each line is processed pretty much independently, I was attempting to use OpenMP
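One common approach (a sketch assuming the lines fit in memory; the per-line work is stubbed as a length sum): read the file sequentially, since an ifstream is not thread-safe, then process the collected lines in a parallel loop:

    #include <cstdio>
    #include <fstream>
    #include <string>
    #include <vector>

    int main(int argc, char* argv[]) {
        if (argc < 2) return 1;
        std::ifstream in(argv[1]);
        std::vector<std::string> lines;
        std::string line;
        while (std::getline(in, line))   // sequential read: the stream is shared state
            lines.push_back(line);

        long total = 0;
        #pragma omp parallel for reduction(+: total)
        for (long i = 0; i < (long)lines.size(); ++i)
            total += (long)lines[i].size();   // stand-in for real per-line work

        std::printf("%ld\n", total);
    }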

OpenMP Several “shared”-directives?

邮差的信 submitted on 2019-12-08 20:21:10
Question: Hey there, I have a very long list of shared variables in OpenMP, so I have to split lines in Fortran and use the "&" syntax to make sure the lines stick together. Something like this:

    !$OMP PARALLEL DEFAULT(private) SHARED(vars...., &
        more_vars..., &
        more_vars... &
        )

That gives me errors when compiling without OpenMP, since only the first line is recognized as a comment! The problem now is that I can't add a "!" in front of those lines with a "&" in front to support compiling without OpenMP:
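For reference, OpenMP's own continuation syntax handles this case (a sketch with placeholder variable names): in free-form Fortran each continued directive line starts with the !$OMP sentinel again, so every line reads as a plain comment when OpenMP is disabled:

    program share_many
        implicit none
        integer :: vars, more_vars, even_more_vars
        vars = 0; more_vars = 0; even_more_vars = 0
    !$OMP PARALLEL DEFAULT(private) SHARED(vars, &
    !$OMP&   more_vars, &
    !$OMP&   even_more_vars)
        continue
    !$OMP END PARALLEL
    end program share_many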

MKL Performance on Intel Phi

本小妞迷上赌 submitted on 2019-12-08 19:47:46
Question: I have a routine that performs a few MKL calls on small matrices (50-100 x 1000 elements) to fit a model, which I then call for different models. In pseudo-code:

    double doModelFit(int model, ...) {
        ...
        while( !done ) {
            cblas_dgemm(...);
            cblas_dgemm(...);
            ...
            dgesv(...);
            ...
        }
        return result;
    }

    int main(int argc, char **argv) {
        ...
        c_start = 1;
        c_stop = nmodel;
        for(int c = c_start; c < c_stop; c++) {
            ...
            result = doModelFit(c, ...);
            ...
        }
    }

Call the above version 1. Since the models are independent
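Since the models are independent, the natural follow-up is to parallelize the model loop; a sketch of that pattern (the fit itself is stubbed out, and mkl_set_num_threads keeps each MKL call sequential so the two levels of parallelism do not oversubscribe the cores):

    #include <mkl.h>

    // Stub standing in for the MKL-based fit of one independent model.
    double doModelFit(int model) { return (double)model; }

    int main() {
        const int nmodel = 100;
        double results[100] = {0};

        mkl_set_num_threads(1);          // sequential MKL inside each OpenMP thread
        #pragma omp parallel for schedule(dynamic)
        for (int c = 1; c < nmodel; c++)
            results[c] = doModelFit(c);  // independent fits run concurrently
        return 0;
    }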

How to implement argmax with OpenMP?

余生长醉 submitted on 2019-12-08 19:17:31
I am trying to implement an argmax with OpenMP. In short, I have a function that computes a floating point value:

    double toOptimize(int val);

I can get the integer maximizing the value with:

    double best = 0;
    #pragma omp parallel for reduction(max: best)
    for(int i = 2 ; i < MAX ; ++i) {
        double v = toOptimize(i);
        if(v > best) best = v;
    }

Now, how can I get the value i corresponding to the maximum? Edit: I am trying this, but would like to make sure it is valid:

    double best_value = 0;
    int best_arg = 0;
    #pragma omp parallel
    {
        double local_best = 0;
        int ba = 0;
        #pragma omp for reduction(max: best
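A pattern widely used for this (a sketch; toOptimize is stubbed with a simple peak so the block compiles on its own): keep a thread-local best/argument pair and merge the pairs under a critical section:

    #include <cstdio>

    const int MAX = 1000;
    double toOptimize(int val) { return -(val - 500.0) * (val - 500.0); }  // stub

    int main() {
        double best_value = toOptimize(2);
        int best_arg = 2;

        #pragma omp parallel
        {
            double local_best = toOptimize(2);
            int local_arg = 2;

            #pragma omp for nowait
            for (int i = 2; i < MAX; ++i) {
                double v = toOptimize(i);
                if (v > local_best) { local_best = v; local_arg = i; }
            }

            // merge per-thread results; critical keeps the pair update consistent
            #pragma omp critical
            if (local_best > best_value) { best_value = local_best; best_arg = local_arg; }
        }

        std::printf("argmax = %d, max = %f\n", best_arg, best_value);
    }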

C++ OpenMP Fibonacci: 1 thread performs much faster than 4 threads

时光总嘲笑我的痴心妄想 submitted on 2019-12-08 13:37:02
Question: I'm trying to understand why the following runs much faster on 1 thread than on 4 threads with OpenMP. The code is based on a similar question, OpenMP recursive tasks, but when I try to implement one of the suggested answers, I don't get the intended speedup, which suggests I've done something wrong (and I'm not sure what it is). Do people get better speed when running the code below on 4 threads than on 1 thread? I'm getting a 10 times slowdown when running on 4 cores (I should be
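The usual culprit in task-based Fibonacci is that task-creation overhead dwarfs the actual work near the leaves of the recursion; a common remedy (a sketch, not the poster's code) adds a sequential cutoff so only large subproblems become tasks:

    #include <cstdio>

    long fib_seq(int n) { return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2); }

    long fib(int n) {
        if (n < 25) return fib_seq(n);   // cutoff: below this, tasking costs more than it saves
        long a, b;
        #pragma omp task shared(a)
        a = fib(n - 1);
        #pragma omp task shared(b)
        b = fib(n - 2);
        #pragma omp taskwait
        return a + b;
    }

    int main() {
        long r;
        #pragma omp parallel
        #pragma omp single
        r = fib(40);
        std::printf("%ld\n", r);
    }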

Use OpenMP to find minimum for sets in parallel, C++

醉酒当歌 submitted on 2019-12-08 12:00:38
Question: I'm implementing Boruvka's algorithm in C++ to find the minimum spanning tree of a graph. This algorithm finds a minimum-weight edge for each supervertex (a supervertex is a connected component; in the first iteration it is simply a vertex) and adds these edges to the MST. Once an edge is added, we update the connected components and repeat the find-min-edge and merge-supervertices steps until all the vertices in the graph are in one connected component. Since find-min-edge for each supervertex
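Since the per-supervertex searches are independent, one natural sketch parallelizes the find-min-edge step over supervertices (the edge and adjacency layout below is an assumption, not from the question):

    #include <limits>
    #include <vector>

    struct Edge { int u, v; double w; };

    // For each supervertex s, scan its candidate edges and record the lightest.
    // edgesOf[s] holds indices into `edges` for supervertex s (assumed layout).
    std::vector<int> findMinEdges(const std::vector<Edge>& edges,
                                  const std::vector<std::vector<int>>& edgesOf) {
        std::vector<int> minEdge(edgesOf.size(), -1);
        #pragma omp parallel for
        for (long s = 0; s < (long)edgesOf.size(); ++s) {
            double best = std::numeric_limits<double>::infinity();
            for (int e : edgesOf[s])
                if (edges[e].w < best) { best = edges[e].w; minEdge[s] = e; }
        }
        return minEdge;
    }

    int main() {
        std::vector<Edge> edges = {{0, 1, 2.0}, {1, 2, 1.0}, {0, 2, 3.0}};
        std::vector<std::vector<int>> edgesOf = {{0, 2}, {0, 1}, {1, 2}};
        return findMinEdges(edges, edgesOf)[0];  // illustrative use
    }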

OpenMP causes heisenbug segfault

柔情痞子 submitted on 2019-12-08 11:17:01
Question: I'm trying to parallelize a pretty massive for-loop with OpenMP. About 20% of the time it runs through fine, but the rest of the time it crashes with various segfaults, such as:

    *** glibc detected *** ./execute: double free or corruption (!prev): <address> ***
    *** glibc detected *** ./execute: free(): invalid next size (fast): <address> ***
    [2] <PID> segmentation fault ./execute

My general code structure is as follows:

    <declare and initialize shared variables here>
    #pragma omp parallel private
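Intermittent glibc heap aborts like these classically come from threads mutating shared scratch storage that should be per-thread; a minimal sketch of the safe pattern (illustrative, not the poster's code):

    #include <vector>

    int main() {
        // "double free or corruption" usually means several threads resize or
        // free one buffer concurrently; giving each thread its own copy fixes it.
        #pragma omp parallel
        {
            std::vector<double> scratch(64);       // private: each thread owns its buffer
            #pragma omp for
            for (int i = 0; i < 1000; ++i)
                scratch.assign(64, (double)i);     // safe: no cross-thread reallocation
        }
        return 0;
    }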

C++ OpenMP working really slow on matrix-vector product

孤者浪人 submitted on 2019-12-08 11:15:16
Question: So, I'm computing a matrix-vector product using OpenMP, but I've noticed it's working really slowly. After some time trying to figure out what's wrong, I just deleted all the code in the parallel section, and it's still SLOW. What can the problem be here? (n = 1000.) Here are the time results for 1, 2, and 4 cores:

    seq_method time = 0.001047194215062
    parrallel_method (1) time = 0.001050273191140
    seq - par = -0.000003078976079
    seq/par = 0.997068404578433
    parrallel_method (2) time = 0.001961992426004
    seq - par = -0
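At n = 1000 a single product is roughly a millisecond of work, so thread start-up and scheduling overhead can swamp any gain; a sketch of a fairer measurement (names are illustrative) repeats the product many times and times it with omp_get_wtime:

    #include <cstdio>
    #include <vector>
    #include <omp.h>

    int main() {
        const int n = 1000, reps = 1000;
        std::vector<double> A(n * n, 1.0), x(n, 1.0), y(n, 0.0);

        double t0 = omp_get_wtime();
        for (int r = 0; r < reps; ++r) {
            #pragma omp parallel for
            for (int i = 0; i < n; ++i) {
                double s = 0.0;
                for (int j = 0; j < n; ++j)
                    s += A[i * n + j] * x[j];   // row i dot x
                y[i] = s;
            }
        }
        std::printf("avg time per product: %g s\n", (omp_get_wtime() - t0) / reps);
    }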