openmp

OpenMP: conditional use of #pragma

Posted by ╄→гoц情女王★ on 2019-12-18 19:15:26

Question: I'm using OpenMP to improve my program's efficiency on loops, but recently I discovered that on small loops using the library decreased performance and the plain serial loop was faster. In fact, I'd like to use OpenMP only if a condition is satisfied. My code is: #pragma omp parallel for for (unsigned i = 0; i < size; ++i) do_some_stuff(); But what I want is to disable the #pragma if size is small enough, i.e.: if (size > OMP_MIN_VALUE) #pragma omp parallel for for (unsigned i =
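The excerpt cuts off, but OpenMP has a built-in answer to exactly this: the `if` clause makes the parallel region conditional at runtime, so the loop does not have to be duplicated. A minimal sketch, with `OMP_MIN_VALUE` taken from the question and a simple squaring loop standing in for `do_some_stuff()`:

```c
#include <stddef.h>

/* Threshold name from the question; the right value is workload-dependent
   and should be tuned by measurement. */
#define OMP_MIN_VALUE 1000

/* The `if` clause makes parallelization conditional: when size is at or
   below the threshold, the loop runs serially on the calling thread and
   no thread team start-up cost is paid. */
void square_all(double *a, size_t size)
{
    #pragma omp parallel for if(size > OMP_MIN_VALUE)
    for (long i = 0; i < (long)size; ++i)
        a[i] = a[i] * a[i];
}
```

The condition is evaluated each time the construct is reached, so the same function adapts per call: small arrays stay serial, large ones fan out.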

Hybrid MPI/OpenMP in LSF

Posted by 回眸只為那壹抹淺笑 on 2019-12-18 17:29:58

Question: I am moving a program parallelized with OpenMP to a cluster. The cluster uses Lava 1.0 as its scheduler and has 8 cores in each node. I used an MPI wrapper in the job script to do multi-host parallelism. Here is the job script: #BSUB -q queue_name #BSUB -x #BSUB -R "span[ptile=1]" #BSUB -n 1 #BSUB -J n1p1o8 ##BSUB -o outfile.email #BSUB -e err export OMP_NUM_THREADS=8 date /home/apps/bin/lava.openmpi.wrapper -bynode -x OMP_NUM_THREADS \ ~/my_program ~/input.dat ~/output.out date I did some
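The script above requests only one slot (`-n 1`), so MPI can only ever launch one rank. A hedged sketch of the usual hybrid layout (keeping the queue name and wrapper path from the question, which will differ on another cluster): ask for one slot per node across several nodes, so each rank gets a whole node for its 8 OpenMP threads.

```shell
#BSUB -q queue_name          # queue from the original script
#BSUB -x                     # exclusive use of each node
#BSUB -R "span[ptile=1]"     # at most one slot (MPI rank) per node
#BSUB -n 2                   # two slots total -> two nodes, two ranks
#BSUB -J n2p1o8
#BSUB -e err

export OMP_NUM_THREADS=8     # 8 OpenMP threads per rank = 8 cores/node
/home/apps/bin/lava.openmpi.wrapper -bynode -x OMP_NUM_THREADS \
    ~/my_program ~/input.dat ~/output.out
```

With `ptile=1` the scheduler spreads the `-n` slots one per node, and `-x OMP_NUM_THREADS` forwards the thread count to the remote hosts.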

What limits scaling in this simple OpenMP program?

Posted by 元气小坏坏 on 2019-12-18 17:01:11

Question: I'm trying to understand the limits to parallelization on a 48-core system (4× AMD Opteron 6348, 2.8 GHz, 12 cores per CPU). I wrote this tiny OpenMP code to test the speedup in what I thought would be the best possible situation (the task is embarrassingly parallel): // Compile with: gcc scaling.c -std=c99 -fopenmp -O3 #include <stdio.h> #include <stdint.h> int main(){ const uint64_t umin=1; const uint64_t umax=10000000000LL; double sum=0.; #pragma omp parallel for reduction(+:sum) for(uint64_t u
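The snippet is truncated, but from the declarations (`umin`, `umax`, `sum`, the `reduction(+:sum)` clause) it appears to be a reduction over `1.0/u`. A minimal reconstruction of that kernel under that assumption, with the bounds made parameters so it can be exercised quickly instead of with the original `umax = 10000000000LL`:

```c
#include <stdint.h>

/* Reconstructed benchmark kernel (an assumption about the cut-off body):
   sum 1/u over [umin, umax]. Each thread accumulates a private partial
   sum; reduction(+:sum) combines them at the end of the loop. */
double harmonic_sum(int64_t umin, int64_t umax)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int64_t u = umin; u <= umax; ++u)
        sum += 1.0 / (double)u;
    return sum;
}
```

Even for a loop like this, scaling on a 4-socket machine is limited by factors outside the code: NUMA placement of threads, shared floating-point units between paired cores on this CPU family, and turbo-frequency differences between 1-thread and 48-thread runs.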

Splitting up a program into 4 threads is slower than a single thread

Posted by 不问归期 on 2019-12-18 16:52:32

Question: I've been writing a raytracer for the past week, and it has reached a point where multi-threading would make sense. I have tried using OpenMP to parallelize it, but running it with more threads is actually slower than running it with one. Reading over other similar questions, especially about OpenMP, one suggestion was that gcc optimizes serial code better. However, running the compiled code below with export OMP_NUM_THREADS=1 is twice as fast as with export OMP_NUM_THREADS=4
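The excerpt ends before the code, so the actual bug can't be diagnosed here, but one classic cause of this symptom is false sharing: threads writing to adjacent elements of a shared array (for example, per-thread counters in `hits[thread_id]`) that sit on the same cache line, so every write invalidates the other threads' caches. A hypothetical sketch of the cure, using a `reduction` so each thread accumulates into a private variable instead of a shared slot:

```c
#include <stddef.h>

/* Count nonzero pixels. With reduction(+:total), each thread gets its
   own private copy of `total` in a register or its own cache line, so
   there are no shared writes inside the loop; the copies are combined
   once when the loop ends. Writing to adjacent slots of a shared
   counters[] array instead would ping-pong one cache line between
   cores and can easily make 4 threads slower than 1. */
long count_hits(const int *pixels, size_t n)
{
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i)
        if (pixels[i] != 0)
            total += 1;
    return total;
}
```

Other usual suspects for this symptom are a shared random-number generator guarded by a lock (or worse, unguarded) and parallelizing an inner loop whose iterations are too short to amortize the fork/join cost.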

Task Dependency in OpenMP 4

Posted by ∥☆過路亽.° on 2019-12-18 13:37:52

Question: The following code works based on this part of the OpenMP 4.0 specification: "The out and inout dependence-types. The generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in an in, out, or inout dependence-type list." This means that task3 becomes a dependent task of task2, right? But it does not make sense! Why should an input-output dependency task be a dependent of an input dependency task? What do I need to do in order to make them
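The quoted rule can be checked with a small chain of tasks. In this sketch (hypothetical variable names, not the question's actual code) the `inout` task really does wait for the earlier `in` task: that is an anti-dependence, and it is required so the `inout` task's write cannot race with the `in` task's read.

```c
/* task1 produces x (out), task2 reads it (in), task3 updates it (inout).
   Per the spec, task2 waits for task1 (flow dependence), and task3
   waits for BOTH task1 and task2 -- the wait on task2 is the
   anti-dependence the question finds surprising: if task3 could run
   before task2, task2 might read the already-updated value. */
int run_chain(void)
{
    int x = 0, seen = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: x)    /* task1: produce x        */
        x = 2;

        #pragma omp task depend(in: x)     /* task2: read x           */
        seen = x;

        #pragma omp task depend(inout: x)  /* task3: after task1 AND task2 */
        x += seen;

        #pragma omp taskwait
    }
    return x;   /* deterministically 4, whatever the thread count */
}
```

To let two tasks that only read a value run concurrently, give both the `in` dependence-type; `in` tasks do not depend on each other, only on the preceding `out`/`inout` writer.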

C OpenMP parallel quickSort

Posted by 时光总嘲笑我的痴心妄想 on 2019-12-18 13:22:18

Question: Once again I'm stuck using OpenMP in C++. This time I'm trying to implement a parallel quicksort. Code: #include <iostream> #include <vector> #include <stack> #include <utility> #include <omp.h> #include <stdio.h> #define SWITCH_LIMIT 1000 using namespace std; template <typename T> void insertionSort(std::vector<T> &v, int q, int r) { int key, i; for(int j = q + 1; j <= r; ++j) { key = v[j]; i = j - 1; while( i >= q && v[i] > key ) { v[i+1] = v[i]; --i; } v[i+1] = key; } } stack<pair<int
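The question's stack-based code is cut off, but the same algorithm can be sketched with OpenMP tasks, which fit recursive divide-and-conquer more naturally than a shared work stack. This is a hedged sketch rather than the asker's code; it reuses the question's `SWITCH_LIMIT` idea to stop spawning tasks once subranges get small:

```c
#define SWITCH_LIMIT 32   /* below this, task overhead outweighs the win */

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition around v[hi]; returns the pivot's final index. */
static int partition(int *v, int lo, int hi)
{
    int pivot = v[hi], i = lo;
    for (int j = lo; j < hi; ++j)
        if (v[j] < pivot)
            swap_int(&v[i++], &v[j]);
    swap_int(&v[i], &v[hi]);
    return i;
}

static void qsort_task(int *v, int lo, int hi)
{
    if (lo >= hi) return;
    int p = partition(v, lo, hi);
    /* The if clause makes small-range tasks run immediately (undeferred)
       instead of being queued, capping the scheduling overhead. */
    #pragma omp task if(p - lo > SWITCH_LIMIT)
    qsort_task(v, lo, p - 1);
    #pragma omp task if(hi - p > SWITCH_LIMIT)
    qsort_task(v, p + 1, hi);
    #pragma omp taskwait   /* both halves done before this call returns */
}

void parallel_quicksort(int *v, int n)
{
    #pragma omp parallel
    #pragma omp single     /* one thread seeds the recursion; the team
                              picks up the spawned tasks */
    qsort_task(v, 0, n - 1);
}
```

Like the original, a production version would switch to insertion sort below the limit rather than merely stopping task creation; the structure above keeps the sketch short.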

What does gcc without multilib mean?

Posted by 本小妞迷上赌 on 2019-12-18 12:49:38

Question: I was trying to use the omp.h header file and realized it was missing. I tried reinstalling gcc on my Mac using brew. This is the message I got at the end of the installation: GCC has been built with multilib support. Notably, OpenMP may not work: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60670 If you need OpenMP support you may want to brew reinstall gcc --without-multilib ==> Summary 🍺 /usr/local/Cellar/gcc/4.9.2_1: 1156 files, 203M It suggests that if I need OpenMP support I need

Difference between OpenMP threadprivate and private

Posted by 一个人想着一个人 on 2019-12-18 12:15:39

Question: I am trying to parallelize a C program using OpenMP. I would like to know more about: the differences between the threadprivate directive and the private clause, and in which cases we must use each of them. As far as I know, the difference is the global scope with threadprivate and the value being preserved across parallel regions. I found in several examples that when a piece of code contains some global/static variables that must be privatized, these variables are included in a threadprivate list
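The difference is easiest to see in code. In this sketch (hypothetical names), the `threadprivate` copy of a static variable persists from one parallel region to the next, assuming the thread count stays the same and dynamic thread adjustment is off; a `private` variable, by contrast, is a fresh, uninitialized copy that exists only inside one region:

```c
/* threadprivate: each thread owns a persistent copy of this static
   variable. private applies only to a single construct and starts
   uninitialized; it cannot carry a value from one region to the next. */
static int counter = 0;
#pragma omp threadprivate(counter)

int bump_twice(void)
{
    #pragma omp parallel
    { counter += 1; }    /* region 1: every thread bumps its own copy  */

    #pragma omp parallel
    { counter += 1; }    /* region 2: the SAME per-thread copies, now 2 */

    return counter;      /* the initial thread's copy: 2               */
}
```

Rule of thumb: use `private` (or `firstprivate`) for per-construct scratch variables; reach for `threadprivate` only when a global or static must keep a per-thread value alive across parallel regions, with `copyin` available to seed the copies from the initial thread.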

OpenMP and MPI hybrid program

Posted by 末鹿安然 on 2019-12-18 11:57:42

Question: I have a machine with 8 processors. I want to alternate between OpenMP and MPI in my code like this: OpenMP phase: ranks 1-7 wait on an MPI_Barrier; rank 0 uses all 8 processors with OpenMP. MPI phase: rank 0 reaches the barrier and all ranks use one processor each. So far, I've done: set I_MPI_WAIT_MODE 1 so that ranks 1-7 don't use the CPU while at the barrier; called omp_set_num_threads(8) on rank 0 so that it launches 8 OpenMP threads. It all worked. Rank 0 did launch 8 threads, but all are confined
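The truncated sentence is heading toward the usual culprit: the MPI launcher pins each rank to a single core, and rank 0's OpenMP threads inherit that one-core affinity mask, so all 8 threads time-slice on one processor. Since the question's `I_MPI_WAIT_MODE` indicates Intel MPI, a hedged sketch of the environment-level fix (variable names are Intel-MPI-specific; other MPIs have their own binding flags, e.g. Open MPI's `--bind-to none`):

```shell
# Widen each rank's pin domain from one core to a whole OpenMP team,
# so rank 0's 8 threads can spread across the 8 processors.
export I_MPI_PIN_DOMAIN=omp   # pin domain sized by OMP_NUM_THREADS
export OMP_NUM_THREADS=8
export I_MPI_WAIT_MODE=1      # from the question: sleep at barriers
mpirun -n 8 ./my_program
```

Disabling pinning entirely (`I_MPI_PIN=off`) also works, at the cost of losing locality for the MPI phase.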
