OpenMP

Red-Black Gauss-Seidel and OpenMP

Submitted on 2019-12-10 21:05:36
Question: I was trying to prove a point about OpenMP compared to MPICH, and I cooked up the following example to demonstrate how easy it was to get high performance with OpenMP. The Gauss-Seidel iteration is split into two separate sweeps, such that within each sweep every operation can be performed in any order, and there should be no dependencies between tasks. So in theory each processor should never have to wait for another processor or perform any kind of synchronization. The problem I am encountering, …

omp max reduction with storage of index

Submitted on 2019-12-10 19:45:36
Question: Using C++ and OpenMP 3.1, I implemented a max reduction which stores the maximum value of an integer variable (score) over a vector of objects (s). But I also want to store the vector index, to access the (s) object with the maximum score. My current unsuccessful implementation looks like this:

// s is a vector of sol objects which contain, apart from other variables, an integer score variable s[].score
int bestscore = 0;
int bestant = 0;
#pragma omp parallel shared(bestant)
{ // start parallel session
…

TBB concurrent_vector with openmp

Submitted on 2019-12-10 19:37:23
Question: Can we use TBB concurrent_vector with OpenMP? Will concurrent updates be allowed?

Answer 1: Yes. TBB's concurrent data structures are meant to be thread-safe, which means any threading paradigm, such as OpenMP, TBB, Cilk, PPL, etc., is fine for using TBB's concurrent data structures. This is because concurrent_vector is simply a data structure class rather than threading-related control code. Furthermore, TBB's mutexes can also be used within OpenMP, Cilk, and PPL.

Answer 2: Per Section 1.11 of …

OpenGL with OpenMP always segfaults

Submitted on 2019-12-10 19:30:12
Question: My program has a loop that fills a 3D cube with pixels (GL_POINTS), so to speed things up a little I thought I could use OpenMP and split this for loop across my multi-core processor. The problem is that any time I use OpenMP in the loop, the program segfaults. Here is the code of the loop:

glBegin(GL_POINTS);
#pragma omp parallel for
for (int a = 0; a < m_width * m_height; a++) {
    uint8_t r, g, b;
    r = m_data[a * m_channels];
    g = m_data[a * m_channels + 1];
    b = m_data[a * m_channels + …

OpenMP custom reduction variable

Submitted on 2019-12-10 18:59:44
Question: I've been assigned to implement the idea of a reduction variable without using the reduction clause. I set up this basic code to test it:

int n = 100000000;
double sum = 0.0;
double val = 0.0;
for (int i = 0; i < n; ++i) {
    val += 1;
}
sum += val;

so at the end sum == n. Each thread should keep val as a private variable, and the addition to sum should be a critical section where the threads converge, e.g.:

int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma …

Using OpenMP critical and ordered

Submitted on 2019-12-10 18:38:11
Question: I'm quite new to Fortran and OpenMP, but I'm trying to get my bearings. I have a piece of code for calculating variograms which I'm attempting to parallelize. However, I seem to be getting race conditions, as some of the results are off by a thousandth or so. The problem seems to be the reductions. Using OpenMP reductions works and gives the correct results, but they are not desirable, because the reductions actually happen in another subroutine (I copied the relevant lines into the OpenMP …

Why do I get undefined behavior when using OpenMP's firstprivate with std::vector on Intel compiler?

Submitted on 2019-12-10 18:29:59
Question: I have a problem when using OpenMP in combination with firstprivate and std::vector on the Intel C++ compiler. Take the following three functions:

#include <omp.h>

void pass_vector_by_value(std::vector<double> p) {
    #pragma omp parallel
    {
        // do sth
    }
}

void pass_vector_by_value_and_use_firstprivate(std::vector<double> p) {
    #pragma omp parallel firstprivate(p)
    {
        // do sth
    }
}

void create_vector_locally_and_use_firstprivate() {
    std::vector<double> p(3, 7);
    #pragma omp parallel firstprivate(p)
    {
        / …

Suppress OpenMP debug messages when running Tensorflow on CPU

Submitted on 2019-12-10 18:23:54
Question: When running a Python program on Linux that includes import tensorflow (installed without GPU support), a bunch of OpenMP debug messages are written to stdout, even when no functions from the tensorflow module are ever called. Here's an excerpt:

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: …
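These "OMP: Info" lines come from the Intel OpenMP runtime when the KMP_AFFINITY setting includes its verbose modifier, which some TensorFlow builds enable internally. One hedged fix is to override the variable before the import, resetting it to the quiet default (noverbose is a documented KMP_AFFINITY modifier):

```python
import os

# Must run before "import tensorflow": the runtime reads KMP_AFFINITY
# once, at library load time.
os.environ["KMP_AFFINITY"] = "noverbose"

# import tensorflow as tf  # now imports without the OMP: Info chatter
```

Setting the variable in the shell (export KMP_AFFINITY=noverbose) before launching Python has the same effect.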

linking OpenMP statically with GCC

Submitted on 2019-12-10 17:47:06
Question: Given the following file print.cpp:

#include <stdio.h>
int main() {
    printf("asdf\n");
}

I can link this statically like this:

g++ -static print.cpp

or like this:

g++ -static-libgcc -Wl,-Bstatic -lc print.cpp -o print

But now let's add a little OpenMP and call the file print_omp.cpp:

#include <omp.h>
#include <stdio.h>
int main() {
    printf("%d\n", omp_get_num_threads());
}

I can link this statically like this (I checked it with ldd):

g++ -fopenmp -static print_omp.cpp

However, this does not work:

g+ …

Choose OpenMP pragma according to condition

Submitted on 2019-12-10 17:38:14
Question: I have a code that I want to optimise that should run with a varying number of threads. After running some tests using different scheduling techniques in a for loop that I have, I came to the conclusion that what suits best is to perform dynamic scheduling when I have only one thread and guided scheduling otherwise. Is that even possible in OpenMP? To be more precise, I want to be able to do something like the following:

if (omp_get_max_threads() > 1)
    #pragma omp parallel for ... schedule(guided)
else
…