OpenMP

How can a recent version of GCC (4.6) be used together with Qt under Mac OS?

核能气质少年 submitted on 2019-12-05 02:03:20
My problem is related to the one discussed here: Is there a way that OpenMP can operate on Qt spawned threads? When I tried to run my Qt-based program, which has an OpenMP clause in a secondary thread, under Mac OS, it crashed. After browsing the web, I now understand that this is caused by a bug in the rather old version (4.2) of GCC supplied by Apple. I then downloaded the latest 4.6 version of GCC from http://hpc.sourceforge.net and tried to compile the project, but got the following errors from the g++ compiler:

    unrecognized option ‘-arch’
    unrecognized option ‘-Xarch_x86_64’

I learned that
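One hedged workaround, sketched for a qmake .pro file: point qmake at the FSF GCC binaries and drop the multi-architecture CONFIG values, so the Apple-only -arch / -Xarch_* flags that FSF GCC rejects are never generated by the macx mkspec. The paths below are hypothetical examples, not tested settings.

    # Use the FSF GCC 4.6 toolchain instead of Apple's GCC 4.2
    QMAKE_CC   = /usr/local/bin/gcc-4.6
    QMAKE_CXX  = /usr/local/bin/g++-4.6
    QMAKE_LINK = $$QMAKE_CXX
    # Prevent the macx mkspec from adding -arch / -Xarch_* flags
    CONFIG -= x86 x86_64 ppc ppc64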

OpenMP conditional pragma "if else"

故事扮演 submitted on 2019-12-05 01:21:54
I have a for loop that can be executed using schedule(static) or schedule(dynamic, 10) depending on a condition. Currently, my code is not DRY (Don't Repeat Yourself) enough; to accommodate both behaviours it contains the following repetition:

    bool isDynamic; // can be true or false
    if (isDynamic) {
        #pragma omp parallel for num_threads(thread_count) default(shared) private(...) schedule(dynamic, 10)
        for (...) {
            // for code inside
        }
    } else {
        #pragma omp parallel for num_threads(thread_count) default(shared) private(...) schedule(static)
        for (...) {
            // SAME for code inside, in fact, this is the
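A minimal sketch of one way to remove the duplication, assuming the loop body is otherwise identical: pick the schedule at run time with omp_set_schedule() and use schedule(runtime) in a single pragma. thread_count is taken from the question; the loop body is a placeholder.

    #include <omp.h>

    void run_loop(bool isDynamic, int thread_count, int n) {
        if (isDynamic)
            omp_set_schedule(omp_sched_dynamic, 10);  // chunk size 10
        else
            omp_set_schedule(omp_sched_static, 0);    // 0 selects the default chunk size

        // schedule(runtime) defers to whatever omp_set_schedule chose above
        #pragma omp parallel for num_threads(thread_count) schedule(runtime)
        for (int i = 0; i < n; ++i) {
            // single copy of the loop body
        }
    }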

Multi-dimensional nested OpenMP loop

六眼飞鱼酱① submitted on 2019-12-05 01:20:21
What is the proper way to parallelize a multi-dimensional, embarrassingly parallel loop in OpenMP? The number of dimensions is known at compile time, but which dimensions will be large is not. Any of them may be one, two, or a million. Surely I don't want N omp parallels for an N-dimensional loop... Thoughts: The problem is conceptually simple. Only the outermost 'large' loop needs to be parallelized, but the loop dimensions are unknown at compile time and may change. Will dynamically setting omp_set_num_threads(1) and #pragma omp for schedule(static, huge_number) make certain loop
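A minimal sketch of the standard approach to this shape of problem, assuming the loops are perfectly nested: the collapse clause flattens all dimensions into one iteration space, so the work balances even when only one of the trip counts is large. n1, n2, n3 and the loop body are hypothetical placeholders.

    #include <omp.h>

    void process(int n1, int n2, int n3) {
        // collapse(3) merges the three loops into a single n1*n2*n3 space
        #pragma omp parallel for collapse(3) schedule(static)
        for (int i = 0; i < n1; ++i)
            for (int j = 0; j < n2; ++j)
                for (int k = 0; k < n3; ++k) {
                    // work(i, j, k) would go here
                }
    }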

OpenMP: Get total number of running threads

China☆狼群 submitted on 2019-12-05 00:22:54
I need to know the total number of threads that my application has spawned via OpenMP. Unfortunately, the omp_get_num_threads() function does not work here since it only yields the number of threads in the current team. However, my code runs recursively (divide and conquer, basically) and I want to spawn new threads as long as there are still idle processors, but no more. Is there a way to get around the limitations of omp_get_num_threads and get the total number of running threads? If more detail is required, consider the following pseudo-code that models my workflow quite closely: function
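A hedged sketch of one workaround, assuming an approximate count is acceptable: maintain a global atomic counter that every team member updates on entry and exit, and gate new parallel regions on it. All names here are hypothetical; OpenMP itself offers no portable "total running threads" query.

    #include <atomic>
    #include <omp.h>

    std::atomic<int> g_active_threads{0};

    void divide_and_conquer(int depth) {
        // Only fork a new team while idle processors appear to remain
        bool spawn = g_active_threads.load() < omp_get_num_procs();

        #pragma omp parallel if(spawn) num_threads(2)
        {
            g_active_threads.fetch_add(1);
            // ... recursive work, possibly calling divide_and_conquer(depth + 1) ...
            g_active_threads.fetch_sub(1);
        }
    }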

Terrible performance - a simple issue of overhead, or is there a program flaw?

ぐ巨炮叔叔 submitted on 2019-12-04 23:41:48
Question: I have here what I understand to be a relatively simple OpenMP construct. The issue is that the program runs about 100-300x faster with 1 thread than with 2 threads. 87% of the program is spent in gomp_send_wait() and another 9.5% in gomp_send_post. The program gives correct results, but I wonder if there is a flaw in the code that is causing some resource conflict, or if it is simply that the overhead of thread creation is drastically not worth it for a loop of chunk size 4.
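A minimal sketch of the usual first remedy when each iteration does very little work, assuming overhead is indeed the culprit: OpenMP's if() clause keeps the loop serial unless the trip count makes the fork/join cost worthwhile. The threshold below is an arbitrary assumption to be tuned by measurement.

    #include <omp.h>

    void scale(double *a, int n) {
        // Parallelise only when n is large enough to amortise thread startup
        #pragma omp parallel for if(n > 10000) schedule(static)
        for (int i = 0; i < n; ++i)
            a[i] *= 2.0;
    }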

How to turn on OpenMP when using Qt creator

本小妞迷上赌 submitted on 2019-12-04 23:07:35
Question: If I am building the project from Qt Creator using the VS 2010 compiler, how do I enable OpenMP? (When building from Visual Studio you just enable the feature.) Thanks

Answer 1: Try the following in your .pro file. In the case of msvc2010:

    QMAKE_CXXFLAGS += -openmp
    QMAKE_LFLAGS   += -openmp

or, in the case of gcc:

    QMAKE_CXXFLAGS += -fopenmp
    QMAKE_LFLAGS   += -fopenmp

Source: https://stackoverflow.com/questions/9815293/how-to-turn-on-openmp-when-using-qt-creator
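Both cases can be combined with qmake scopes so the same .pro file works for either kit; a sketch, assuming a standard win32-msvc mkspec name (untested, adjust to your toolchain):

    win32-msvc* {
        QMAKE_CXXFLAGS += -openmp
        QMAKE_LFLAGS   += -openmp
    } else {
        QMAKE_CXXFLAGS += -fopenmp
        QMAKE_LFLAGS   += -fopenmp
    }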

Thread safety of std::random_device

∥☆過路亽.° submitted on 2019-12-04 22:52:17
I have some code which looks a bit like this:

    std::random_device rd;
    #pragma omp parallel
    {
        std::mt19937 gen(rd());
        #pragma omp for
        for (int i = 0; i < N; i++) {
            /* Do stuff with random numbers from gen() */
        }
    }

I have a few questions: Is std::random_device thread safe? I.e., is it going to do something unhelpful when several threads call it at once? Is this generally a good idea? Should I be worried about overlapping random number streams? Is there a better way to achieve what I want (independent random number streams in each thread; I'm not too worried about reproducibility at the moment)? In
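A minimal sketch of one common per-thread seeding approach, assuming reproducibility is not required: seed each thread's engine once through a seed_seq that mixes random_device output with the thread number, so the streams stay distinct even if random_device returns similar values under contention.

    #include <random>
    #include <omp.h>

    void worker(int N) {
        #pragma omp parallel
        {
            std::random_device rd;
            std::seed_seq seq{rd(), rd(),
                              static_cast<unsigned>(omp_get_thread_num())};
            std::mt19937 gen(seq);  // one independent engine per thread
            std::uniform_real_distribution<double> dist(0.0, 1.0);

            #pragma omp for
            for (int i = 0; i < N; i++) {
                double x = dist(gen);  // per-thread random stream
                (void)x;               // placeholder for real work
            }
        }
    }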

Fetch-and-add using OpenMP atomic operations

喜你入骨 submitted on 2019-12-04 20:31:45
Question: I’m using OpenMP and need to use the fetch-and-add operation. However, OpenMP doesn’t provide an appropriate directive/call. I’d like to preserve maximum portability, hence I don’t want to rely on compiler intrinsics. Rather, I’m searching for a way to harness OpenMP’s atomic operations to implement this, but I’ve hit a dead end. Can this even be done? N.B., the following code almost does what I want:

    #pragma omp atomic
    x += a

Almost – but not quite, since I really need the old value of x.
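Since OpenMP 3.1 this is expressible directly with the atomic capture form, which stores the old value as part of the atomic update; a minimal sketch (requires a compiler with OpenMP 3.1 support):

    #include <omp.h>

    int fetch_and_add(int &x, int a) {
        int old;
        // Atomically read x into old and add a to x
        #pragma omp atomic capture
        { old = x; x += a; }
        return old;
    }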

OpenMP with MSVC 2010 Debug build: strange bug when objects are copied

流过昼夜 submitted on 2019-12-04 19:11:45
Question: I have a fairly complex program that runs into strange behavior when built with OpenMP in MSVC 2010 Debug mode. I have tried my best to construct the following minimal working example (though it is not really minimal), which mimics the structure of the real program.

    #include <vector>
    #include <cassert>

    // A class that takes pointers to the whole collection and a position.
    // Only allows access to the elements at that position. It provides
    // read-only access to query some information about the whole

Efficient parallelisation of a linear algebraic function in C++ OpenMP

与世无争的帅哥 submitted on 2019-12-04 18:53:19
I have little experience with parallel programming and was wondering if anyone could have a quick glance at a bit of code I've written and see if there are any obvious ways I can improve the efficiency of the computation. The difficulty arises from the fact that I have multiple matrix operations of unequal dimensionality to compute, so I'm not sure of the most condensed way to code the computation. Below is my code. Note that this code DOES work. The matrices I am working with are of dimension approx. 700x700 [see int s below] or 700x30 [int n]. Also, I am using the Armadillo library
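A hedged sketch of the usual pattern for this shape of problem, assuming Armadillo matrices of the stated sizes: parallelise over the 30 independent columns so each thread performs a full 700-element product. A and B stand in for the question's variables; this is illustrative, not the asker's actual computation.

    #include <armadillo>
    #include <omp.h>

    // C = A * B computed column by column: A is s x s, B is s x n (n << s)
    arma::mat column_products(const arma::mat &A, const arma::mat &B) {
        arma::mat C(A.n_rows, B.n_cols);
        #pragma omp parallel for schedule(static)
        for (int j = 0; j < static_cast<int>(B.n_cols); ++j)
            C.col(j) = A * B.col(j);  // each column is independent work
        return C;
    }

One caveat: if Armadillo is linked against a multithreaded BLAS, nesting OpenMP around BLAS calls can oversubscribe cores, so the BLAS thread count may need to be pinned to one.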