Reduce OpenMP fork/join overhead by separating #omp parallel and #omp for
Question

I'm reading the book An Introduction to Parallel Programming by Peter S. Pacheco. Section 5.6.2 has an interesting discussion of reducing fork/join overhead. Consider the odd-even transposition sort algorithm:

    for (phase = 0; phase < n; phase++) {
        if (phase % 2 == 0) {  /* even phase */
            # pragma omp parallel for default(none) shared(n) private(i)
            for (i = 1; i < n; i += 2) {
                // meat
            }
        } else {               /* odd phase */
            # pragma omp parallel for default(none) shared(n) private(i)
            for (i = 1; i < n - 1; i += 2) {
                // meat
            }
        }
    }

The author argues that the