openmp

Why OpenMP program runs only in one thread

Submitted by ≯℡__Kan透↙ on 2019-12-10 16:44:58
Question: I just tried OpenMP with a simple C program: test() { for(int i=0;i<100000000;i++); } main() { printf("Num of CPU: %d\n", omp_get_num_procs()); #pragma omp parallel for num_threads(4) for(int i=0;i<100;i++) test(); } Compiled with g++ -fopenmp . It correctly prints that I have 4 CPUs, but all test functions run on thread 0. I tried modifying OMP_NUM_THREADS , but that had no effect either. I did everything the same as the online examples, so why won't it work? Answer 1:

Matrix multiplication, KIJ order, Parallel version slower than non-parallel

Submitted by 送分小仙女□ on 2019-12-10 16:13:21
Question: I have a school assignment on parallel programming and I'm having a lot of problems with it. My task is to create a parallel version of the given matrix multiplication code and test its performance (and yes, it has to be in KIJ order): void multiply_matrices_KIJ() { for (int k = 0; k < SIZE; k++) for (int i = 0; i < SIZE; i++) for (int j = 0; j < SIZE; j++) matrix_r[i][j] += matrix_a[i][k] * matrix_b[k][j]; } This is what I came up with so far: void multiply_matrices_KIJ() { for (int k = 0; k < SIZE;

Does `std::mutex` and `std::lock` guarantee memory synchronisation in inter-processor code?

Submitted by 牧云@^-^@ on 2019-12-10 14:48:33
Question: I'm currently using OpenMP to write code running on multi-core nodes. OpenMP has a specific memory model which guarantees that memory is synchronised between threads running on different cores when a lock is acquired. I am considering using the C++11 constructs ( std::thread with std::mutex and std::lock ) instead of OpenMP (because of their greater flexibility) and wonder whether, and how, memory synchronisation between processors is guaranteed there. And if not, how can I enforce it? Answer 1: The standard makes the

OpenMp set number of threads for parallel loop depending on variable

Submitted by a 夏天 on 2019-12-10 14:43:48
Question: Is there a way to set the number of threads in an OpenMP parallel for region based on the value of a variable? Initially, the number of threads for the whole application equals nofCores; on my AMD FX 8350, nofCores = 8. For this region, if the variable is 3 then I only need 3 threads; if variable > cores, the number of threads should remain equal to nofCores. I do not want to set the number of threads globally for the whole application, just for this specific parallel loop. Sorry if this is a naive question, but I am

C OpenMP parallel bubble sort

Submitted by ﹥>﹥吖頭↗ on 2019-12-10 14:36:46
Question: I have an implementation of the parallel bubble sort algorithm (odd-even transposition sort) in C, using OpenMP. However, after testing it, it is slower than the serial version (by about 10%) although I have a 4-core processor (2 real × 2 because of Intel Hyper-Threading). I have checked that the cores are actually used, and I can see each of them at 100% when running the program. Therefore I think I made a mistake in the implementation of the algorithm. I am using Linux with kernel 2.6.38-8-generic.

Why may thread_local not be applied to non-static data members and how to implement thread-local non-static data members?

Submitted by 有些话、适合烂在心里 on 2019-12-10 14:36:34
Question: Why may thread_local not be applied to non-static data members? The accepted answer to this question says: "There is no point in making non-static structure or class members thread-local." Honestly, I see many good reasons to make non-static data members thread-local. Assume we have some kind of ComputeEngine with a member function computeSomething that is called many times in succession. Some of the work inside the member function can be done in parallel. To do so, each thread needs some

C OMP omp_get_wtime() returning time 0.00

Submitted by 那年仲夏 on 2019-12-10 13:58:20
Question: I have used omp_get_wtime() , but when I want to print the time I always get 0.00. Where is the problem? #define SIZE 500 #define nthreads 10 (...) void sumTab(int mX[][SIZE], int mY[][SIZE], int mZ[][SIZE]) { int i,k; double start = omp_get_wtime(); #pragma omp parallel for schedule(dynamic,3) private(i) num_threads(nthreads) for(i=0 ; i<SIZE ; i++) { for(k=0 ; k<SIZE ; k++) { mZ[i][k]=mX[i][k]+mY[i][k]; printf("Thread no %d \t [%d] [%d] result: %d\n", omp_get_thread_num(),i,k, mZ[i][k]); }

Is it possible to make thread join to 'parallel for' region after its job?

Submitted by 旧街凉风 on 2019-12-10 13:34:10
Question: I have two jobs that need to run simultaneously at first: 1) a for loop that can be parallelized, and 2) a function that must be done by a single thread. Now, let me describe what I want to do. If there are 8 available threads, job (1) and job (2) have to run simultaneously at first, with 7 threads and 1 thread respectively. After job (2) finishes, the thread that job (2) was using should be allocated to job (1), the parallel for loop. I'm using omp_get_thread_num to count how many threads are active

Is there an implicit Barrier after omp critical section

Submitted by 瘦欲@ on 2019-12-10 12:45:54
Question: Is there an implicit omp barrier after an omp critical section? For example, can I modify the following code from version 1 into version 2? VERSION-1 int min = 100; #pragma omp parallel { int localmin = min; #pragma omp for schedule(static) for(int i = 0; i < 1000; i++) localmin = std::min(localmin, arr[i]); #pragma omp critical { min = std::min(localmin, min); } } VERSION-2 int min = 100; #pragma omp parallel { int localmin = min; #pragma omp for schedule(static) nowait for(int i = 0; i < 1000; i++)

How to set openmp thread stack to unlimited?

Submitted by 寵の児 on 2019-12-10 12:19:38
Question: Can someone tell me how to set the OpenMP stack size to unlimited? Like this link: Why Segmentation fault is happening in this openmp code? I also have a project written in Fortran (a customer's complex code); if I set OMP_STACKSIZE , the project runs normally. If I unset it, the project fails. But different input data need different OMP_STACKSIZE values, so I must try values for each input (because I must save memory). Can I set the OpenMP stack like pthreads ( ulimit -s unlimited )? Or have some
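A sketch of the usual setup (sizes and the program name are illustrative): OMP_STACKSIZE only sizes the additional worker threads, while the initial thread takes its stack from the shell limit, and the specification defines no "unlimited" value for OMP_STACKSIZE, so a generously large size is the customary workaround.

```shell
ulimit -s unlimited          # initial (master) thread stack
export OMP_STACKSIZE=512M    # stack for each additional OpenMP thread
./my_fortran_app             # hypothetical program name
```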