openmp

Why OpenMP program runs only in one thread

Submitted by ≯℡__Kan透↙ on 2019-12-10 16:44:58
Question: I just tried OpenMP with a simple C program: test() { for(int i=0;i<100000000;i++); } main() { printf("Num of CPU: %d\n", omp_get_num_procs()); #pragma omp parallel for num_threads(4) for(int i=0;i<100;i++) test(); } Compiled with g++ -fopenmp . It correctly prints that I have 4 CPUs, but all test functions run on thread 0. I tried modifying OMP_NUM_THREADS , but that had no effect either. I did everything the same as the online examples, so why won't it work? Answer 1:

Matrix multiplication, KIJ order, Parallel version slower than non-parallel

Submitted by 送分小仙女□ on 2019-12-10 16:13:21
Question: I have a school assignment on parallel programming and I'm having a lot of problems with it. My task is to create a parallel version of the given matrix multiplication code and test its performance (and yes, it has to be in KIJ order): void multiply_matrices_KIJ() { for (int k = 0; k < SIZE; k++) for (int i = 0; i < SIZE; i++) for (int j = 0; j < SIZE; j++) matrix_r[i][j] += matrix_a[i][k] * matrix_b[k][j]; } This is what I came up with so far: void multiply_matrices_KIJ() { for (int k = 0; k < SIZE;

Does `std::mutex` and `std::lock` guarantee memory synchronisation in inter-processor code?

Submitted by 牧云@^-^@ on 2019-12-10 14:48:33
Question: I'm currently using OpenMP to write code running on multi-core nodes. OpenMP has a specific memory model which guarantees that memory is synchronised between threads running on different cores when a lock is acquired. I am considering using the C++11 constructs ( std::thread with std::mutex and std::lock ) instead of OpenMP (because of their greater flexibility) and wonder whether, and how, memory synchronisation between processors is guaranteed there. And if not, how can I enforce it? Answer 1: The standard makes the

OpenMp set number of threads for parallel loop depending on variable

Submitted by a 夏天 on 2019-12-10 14:43:48
Question: Is there a way to set the number of threads in an OpenMP parallel for region based on the value of a variable? Initially, the number of threads for the whole application equals nofCores; on my AMD FX 8350, nofCores = 8. For this region, if the variable is 3 then I only need 3 threads; if variable > cores, the number of threads should remain equal to nofCores. I do not want to set the number of threads globally for the whole application, just for this specific parallel loop. Sorry if this is a naive question, but I am

C OpenMP parallel bubble sort

Submitted by ﹥>﹥吖頭↗ on 2019-12-10 14:36:46
Question: I have an implementation of the parallel bubble sort algorithm (odd-even transposition sort) in C, using OpenMP. However, after testing it, it is slower than the serial version (by about 10%) although I have a 4-core processor (2 real × 2 because of Intel Hyper-Threading). I have checked that the cores are actually used, and I can see each of them at 100% when running the program. Therefore I think I made a mistake in the implementation of the algorithm. I am using Linux with kernel 2.6.38-8-generic.

Why may thread_local not be applied to non-static data members and how to implement thread-local non-static data members?

Submitted by 有些话、适合烂在心里 on 2019-12-10 14:36:34
Question: Why may thread_local not be applied to non-static data members? The accepted answer to this question says: "There is no point in making non-static structure or class members thread-local." Honestly, I see many good reasons to make non-static data members thread-local. Assume we have some kind of ComputeEngine with a member function computeSomething that is called many times in succession. Some of the work inside the member function can be done in parallel. To do so, each thread needs some

C OMP omp_get_wtime() returning time 0.00

Submitted by 那年仲夏 on 2019-12-10 13:58:20
Question: I have used omp_get_wtime() , but when I want to print the time I always get 0.00. Where is the problem? #define SIZE 500 #define nthreads 10 (...) void sumTab(int mX[][SIZE], int mY[][SIZE], int mZ[][SIZE]) { int i,k; double start = omp_get_wtime(); #pragma omp parallel for schedule(dynamic,3) private(i) num_threads(nthreads) for(i=0 ; i<SIZE ; i++) { for(k=0 ; k<SIZE ; k++) { mZ[i][k]=mX[i][k]+mY[i][k]; printf("Thread no %d \t [%d] [%d] result: %d\n", omp_get_thread_num(),i,k, mZ[i][k]); }

Is it possible to make thread join to 'parallel for' region after its job?

Submitted by 旧街凉风 on 2019-12-10 13:34:10
Question: I have two jobs that need to run simultaneously at first: 1) a for loop that can be parallelized, and 2) a function that must be done by a single thread. Now, let me describe what I want to do. If there are 8 available threads, job (1) and job (2) have to run simultaneously at first, with 7 threads and 1 thread respectively. After job (2) finishes, the thread that job (2) was using should be allocated to job (1), the parallel for loop. I'm using omp_get_thread_num to count how many threads are active

Is there an implicit Barrier after omp critical section

Submitted by 瘦欲@ on 2019-12-10 12:45:54
Question: Is there an implicit omp barrier after an omp critical section? For example, can I modify the following code from version 1 into version 2? VERSION-1 int min = 100; #pragma omp parallel { int localmin = min; #pragma omp for schedule(static) for(int i = 0; i < 1000; i++) localmin = std::min(localmin, arr[i]); #pragma omp critical { min = std::min(localmin, min); } } VERSION-2 int min = 100; #pragma omp parallel { int localmin = min; #pragma omp for schedule(static) nowait for(int i = 0; i < 1000; i++)

How to set openmp thread stack to unlimited?

Submitted by 寵の児 on 2019-12-10 12:19:38
Question: Can someone tell me how to set the OpenMP stack size to unlimited? Like this link: Why Segmentation fault is happening in this openmp code? I also have a project written in Fortran (a customer's complex code); if I set OMP_STACKSIZE , the project runs normally. If I unset it, the project fails. But different input data need different OMP_STACKSIZE values, so I must try values for each input (because I must save memory). Can I set the OpenMP stack like pthreads ( ulimit -s unlimited )? Or have some
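A sketch of the usual setup (sizes and the program name are illustrative): OMP_STACKSIZE only sizes the additional worker threads, while the initial thread takes its stack from the shell limit, and the specification defines no "unlimited" value for OMP_STACKSIZE, so a generously large size is the customary workaround.

```shell
ulimit -s unlimited          # initial (master) thread stack
export OMP_STACKSIZE=512M    # stack for each additional OpenMP thread
./my_fortran_app             # hypothetical program name
```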