I've written this code which reads a Matrix and it basically sums the values of the matrix... But my question would be, since I've tried doing the pragma in different ways
The code you posted is not correct without the reduction clause.
sum += matrixA[i][j];
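will cause a classic race condition when executed by multiple threads in parallel. sum is a shared variable, but sum += ... is not an atomic operation: it reads sum, adds to it, and writes the result back, and another thread can update sum between the read and the write.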
(sum is initially 0, all matrix elements are 1)

Thread 1                      | Thread 2
------------------------------+------------------------------
tmp = sum + matrix[0][0] = 1  |
                              | tmp = sum + matrix[1][0] = 1
sum = tmp = 1                 |
                              | sum = tmp = 1 (instead of 2)
The reduction clause fixes exactly this. With reduction, each thread works on an implicit thread-local copy of the sum variable, initialized to 0 for the + operator. At the end of the region, the thread-local copies are added into the original sum variable in a correct way, without race conditions.
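Since your full code is not shown here, the following is only a minimal sketch of what the loop could look like with the reduction clause. The function name sum_matrix_reduction and the fixed ROWS and COLS sizes are assumptions for illustration; only matrixA and sum come from your snippet.

```c
#define ROWS 4   /* assumed size, for illustration only */
#define COLS 4

/* Hypothetical helper: sums all elements of a ROWS x COLS matrix. */
long sum_matrix_reduction(int matrixA[ROWS][COLS])
{
    long sum = 0;

    /* Each thread accumulates into its own private copy of sum (starting
       at 0); the private copies are combined into the shared sum when the
       loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            sum += matrixA[i][j];
        }
    }
    return sum;
}
```

With GCC or Clang this needs -fopenmp to actually run in parallel. Without the flag the pragma is ignored and the loop runs serially, still producing the same result.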
Another solution would be to mark the sum += ... as an atomic operation or a critical section. That, however, has a significant performance penalty, because every single update of sum then has to be synchronized between the threads.
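For comparison, here is a sketch of the atomic variant under the same assumptions (sum_matrix_atomic, ROWS and COLS are made-up names, not from your code). It gives the correct result, but every addition goes through a synchronized update of the one shared variable.

```c
#define ROWS 4   /* assumed size, for illustration only */
#define COLS 4

/* Hypothetical helper: same sum, but protecting each update individually. */
long sum_matrix_atomic(int matrixA[ROWS][COLS])
{
    long sum = 0;   /* shared between all threads in the parallel loop */

    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            /* Makes the read-add-write on sum indivisible, so no update
               is lost. This is done once per matrix element. */
            #pragma omp atomic
            sum += matrixA[i][j];
        }
    }
    return sum;
}
```

Replacing the atomic pragma with #pragma omp critical would also be correct, and typically slower still. The reduction version avoids the problem by touching the shared variable only once per thread.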