I am trying to implement the LU factorization of the Vandermonde matrix with OpenMPI. To do this, the matrix is distributed over the processes in a cyclic (round-robin) manner, e.g. process
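The cyclic layout can be sketched sequentially before adding MPI. This is a minimal sketch under assumptions not stated in the question: the distribution is row-wise (row `i` lives on process `i mod p`), the factorization is Doolittle LU without pivoting, and the helper names (`owner`, `local_rows`, `lu_cyclic`) are hypothetical. Skipping pivoting is reasonable here because, for distinct nodes, every leading principal submatrix of a Vandermonde matrix is itself a nonsingular Vandermonde matrix, so no zero pivot can occur.

```python
def vandermonde(xs):
    """Row i is [1, x_i, x_i**2, ..., x_i**(n-1)]."""
    n = len(xs)
    return [[x ** j for j in range(n)] for x in xs]


def owner(row, nprocs):
    """Assumed row-cyclic layout: row i is stored on process i mod nprocs."""
    return row % nprocs


def local_rows(rank, n, nprocs):
    """Global indices of the rows that process `rank` owns."""
    return list(range(rank, n, nprocs))


def lu_cyclic(a, nprocs):
    """Doolittle LU without pivoting, simulating a row-cyclic distribution.

    Returns L, U and, for each elimination step k, the rank that owns
    pivot row k (the rank that would broadcast that row in an MPI version).
    """
    n = len(a)
    u = [[float(v) for v in row] for row in a]
    l = [[float(i == j) for j in range(n)] for i in range(n)]
    bcast_roots = []
    for k in range(n):
        # The owner of row k would send it to every other process here.
        bcast_roots.append(owner(k, nprocs))
        # Each simulated process eliminates column k only in its own rows.
        for rank in range(nprocs):
            for i in local_rows(rank, n, nprocs):
                if i > k:
                    f = u[i][k] / u[k][k]
                    l[i][k] = f
                    for j in range(k, n):
                        u[i][j] -= f * u[k][j]
    return l, u, bcast_roots
```

In a real MPI program, the `for rank` loop disappears: each process runs only its own `local_rows`, and at step `k` the row is shared with `MPI_Bcast` using `owner(k, nprocs)` as the root. The cyclic layout keeps the work balanced because the rows still being updated (`i > k`) stay spread across all processes as `k` grows.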