I have an array which is split up by columns between the processes for my calculation. Afterwards I want to gather this array in one process (0). Each process has its own columns of the array.
The problem that I see is that the datatype created with MPI_Type_vector() has an extent going from the first to the last item. For example, the extent of your col_recv datatype spans from > to < (I hope this representation of the mask is clear enough):
>x . . .
 x . . .
 x . . .
 x<. . .
That is 13 MPI_FLOAT items (the mask must be read by rows, that's C ordering). Receiving two of them will lead to:
>x . . .
 x . . .
 x . . .
 x y . .
 . y . .
 . y . .
 . y . .
That clearly is not what you want.
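If you want to check this, here is a minimal sketch that builds such a column type for a 4x4 float matrix and prints its size and extent. It assumes your col_recv was created as MPI_Type_vector(4, 1, 4, MPI_FLOAT); adjust the numbers to your actual sizes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Datatype col_recv;   /* assumed equivalent of your col_recv */
    MPI_Aint lb, extent;
    int size;

    MPI_Init(&argc, &argv);

    /* one column of a 4x4 row-major float matrix:
       4 blocks of 1 float, 4 floats apart */
    MPI_Type_vector(4, 1, 4, MPI_FLOAT, &col_recv);
    MPI_Type_commit(&col_recv);

    MPI_Type_size(col_recv, &size);               /* data actually contained: 4 floats  */
    MPI_Type_get_extent(col_recv, &lb, &extent);  /* span from first to last: 13 floats */
    printf("size = %d bytes, extent = %ld bytes\n", size, (long)extent);

    MPI_Type_free(&col_recv);
    MPI_Finalize();
}

With 4-byte floats it should report size = 16 bytes but extent = 52 bytes, i.e. the 13 items of the mask above.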
To let MPI_Gather() properly skip data on the receiver, you need to set the extent of col_recv to exactly ONE ELEMENT. You can do this with MPI_Type_create_resized():
>x<. . .
 x . . .
 x . . .
 x . . .
so that successive received blocks get correctly interleaved:
x y . .
x y . .
x y . .
x y . .
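In terms of the sketch above, the resize is just the following (col_recv_resized is my name; plug in whatever element type you actually use):

/* give col_recv an extent of a single float: the i-th received block
   then starts i elements, not i*13 elements, into the receive buffer */
MPI_Datatype col_recv_resized;
MPI_Type_create_resized(col_recv, 0, sizeof(float), &col_recv_resized);
MPI_Type_commit(&col_recv_resized);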
However, receiving two columns instead of one will lead to:
x x y y
x x y y
x x y y
x x y y
Again, that is not what you want, even if it is closer.
Since you want interleaved columns, you need to create a more complex datatype, capable of describing all the columns owned by one process, again with a 1-item extent.
Each column is separated by a stride of one ELEMENT (that is, the extent of the previously defined column, not its size, which is 4 elements):
>x<. x .
 x . x .
 x . x .
 x . x .
Receiving one of them per processor, you'll get what you want:
x y x y
x y x y
x y x y
x y x y
You can do it with MPI_Type_create_darray() as well, since it allows you to create the datatypes used for the block-cyclic distribution of ScaLAPACK, of which your layout is a 1D subcase (see the sketch after the code below).
I have also tried it. Here is working code for two processors:
#include <stdio.h>
#include <mpi.h>

#define N 4
#define NPROCS 2
#define NPART (N/NPROCS)

int main(int argc, char **argv) {
    float a_send[N][NPART];     /* local block: NPART columns of the N x N matrix */
    float a_recv[N][N] = {0};   /* full matrix, only used on rank 0 */
    MPI_Datatype column_send_type;
    MPI_Datatype column_recv_type;
    MPI_Datatype column_send_type1;
    MPI_Datatype column_recv_type1;
    MPI_Datatype matrix_columns_type;
    MPI_Datatype matrix_columns_type1;

    MPI_Init(&argc, &argv);

    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* fill the local block with values that encode rank, row and column */
    for(int i=0; i<N; ++i) {
        for(int j=0; j<NPART; ++j) {
            a_send[i][j] = my_rank*100 + 10*(i+1) + (j+1);
        }
    }

    /* one column of the local N x NPART block ... */
    MPI_Type_vector(N, 1, NPART, MPI_FLOAT, &column_send_type);
    MPI_Type_commit(&column_send_type);
    /* ... resized to an extent of one float, so successive columns start
       one element apart */
    MPI_Type_create_resized(column_send_type, 0, sizeof(float), &column_send_type1);
    MPI_Type_commit(&column_send_type1);

    /* one column of the full N x N matrix ... */
    MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_recv_type);
    MPI_Type_commit(&column_recv_type);
    /* ... also resized to an extent of one float */
    MPI_Type_create_resized(column_recv_type, 0, sizeof(float), &column_recv_type1);
    MPI_Type_commit(&column_recv_type1);

    /* all NPART columns owned by one rank inside the full matrix,
       NPROCS elements apart ... */
    MPI_Type_vector(NPART, 1, NPROCS, column_recv_type1, &matrix_columns_type);
    MPI_Type_commit(&matrix_columns_type);
    /* ... resized to one float, so rank r's contribution starts at column r */
    MPI_Type_create_resized(matrix_columns_type, 0, sizeof(float), &matrix_columns_type1);
    MPI_Type_commit(&matrix_columns_type1);

    MPI_Gather(a_send, NPART, column_send_type1,
               a_recv, 1, matrix_columns_type1,
               0, MPI_COMM_WORLD);

    if (my_rank==0) {
        for(int i=0; i<N; ++i) {
            for(int j=0; j<N; ++j) {
                printf("%4.0f ", a_recv[i][j]);
            }
            printf("\n");
        }
    }

    MPI_Finalize();
}
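Compiled with mpicc and run on two ranks, it should print the two local blocks interleaved column by column: the first row reads 11, 111, 12, 112, and so on.

For completeness, here is an untested sketch of the MPI_Type_create_darray() variant mentioned above; the names gsizes, distribs, dargs, psizes and piece are mine. Note that the darray datatype of rank r already carries the rank-r offset and the full-matrix extent, so it does not drop straight into MPI_Gather(), which uses one receive type for all ranks; the sketch therefore lets the root post one receive per rank instead.

#include <stdio.h>
#include <mpi.h>

#define N 4
#define NPROCS 2
#define NPART (N/NPROCS)

int main(int argc, char **argv) {
    float a_send[N][NPART];
    float a_recv[N][N] = {0};
    MPI_Datatype piece[NPROCS];
    MPI_Request req[NPROCS];
    int my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* same test data as in the vector-based version */
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < NPART; ++j)
            a_send[i][j] = my_rank*100 + 10*(i+1) + (j+1);

    /* columns of an N x N matrix distributed cyclic(1)
       over a 1 x NPROCS process grid, C (row-major) order */
    int gsizes[2]   = {N, N};
    int distribs[2] = {MPI_DISTRIBUTE_NONE, MPI_DISTRIBUTE_CYCLIC};
    int dargs[2]    = {MPI_DISTRIBUTE_DFLT_DARG, 1};
    int psizes[2]   = {1, NPROCS};

    if (my_rank == 0) {
        /* one datatype per sender: it selects, inside a_recv,
           exactly the columns owned by rank r */
        for (int r = 0; r < NPROCS; ++r) {
            MPI_Type_create_darray(NPROCS, r, 2, gsizes, distribs, dargs,
                                   psizes, MPI_ORDER_C, MPI_FLOAT, &piece[r]);
            MPI_Type_commit(&piece[r]);
            MPI_Irecv(a_recv, 1, piece[r], r, 0, MPI_COMM_WORLD, &req[r]);
        }
    }

    /* every rank, the root included, ships its local block as plain floats */
    MPI_Send(a_send, N*NPART, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        MPI_Waitall(NPROCS, req, MPI_STATUSES_IGNORE);
        for (int r = 0; r < NPROCS; ++r)
            MPI_Type_free(&piece[r]);

        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j)
                printf("%4.0f ", a_recv[i][j]);
            printf("\n");
        }
    }

    MPI_Finalize();
}

The distribution described is cyclic(1) over the columns on a 1 x NPROCS process grid, i.e. the 1D block-cyclic subcase mentioned above, and it should produce the same matrix as the vector-based version.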