I have an array which is split up by columns between the processes for my calculation. Afterwards I want to gather this array in one process (0). Each process has its own columns of the array.
The problem that I see is that the datatype created with MPI_Type_vector() has an extent going from the first to the last item. For example, the extent of your col_recv datatype spans from > to < (I hope this representation of the mask is clear enough):
>x . . .
 x . . .
 x . . .
 x<. . .
That is 13 MPI_FLOAT items (the mask must be read by rows, that's C ordering). Receiving two of them will lead to:
>x . . .
 x . . .
 x . . .
 x y . .
 . y . .
 . y . .
 . y . .
That clearly is not what you want.
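If you want to check this, here is a minimal sketch that builds such a column type for a 4x4 float matrix and prints its size and extent. It assumes your col_recv was created as MPI_Type_vector(4, 1, 4, MPI_FLOAT); adjust the numbers to your actual sizes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Datatype col_recv;   /* assumed equivalent of your col_recv */
    MPI_Aint lb, extent;
    int size;

    MPI_Init(&argc, &argv);

    /* one column of a 4x4 row-major float matrix:
       4 blocks of 1 float, 4 floats apart */
    MPI_Type_vector(4, 1, 4, MPI_FLOAT, &col_recv);
    MPI_Type_commit(&col_recv);

    MPI_Type_size(col_recv, &size);               /* data actually contained: 4 floats  */
    MPI_Type_get_extent(col_recv, &lb, &extent);  /* span from first to last: 13 floats */
    printf("size = %d bytes, extent = %ld bytes\n", size, (long)extent);

    MPI_Type_free(&col_recv);
    MPI_Finalize();
}

With 4-byte floats it should report size = 16 bytes but extent = 52 bytes, i.e. the 13 items of the mask above.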
To let MPI_Gather() properly skip data on the receiver, you need to set the extent of col_recv to exactly ONE ELEMENT. You can do this with MPI_Type_create_resized():
>x<. . .
 x . . .
 x . . .
 x . . .
so that successive received blocks get correctly interleaved:
x y . .
x y . .
x y . .
x y . .
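In terms of the sketch above, the resize is just the following (col_recv_resized is my name; plug in whatever element type you actually use):

/* give col_recv an extent of a single float: the i-th received block
   then starts i elements, not i*13 elements, into the receive buffer */
MPI_Datatype col_recv_resized;
MPI_Type_create_resized(col_recv, 0, sizeof(float), &col_recv_resized);
MPI_Type_commit(&col_recv_resized);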
However, receiving two columns instead of one will lead to:
x x y y
x x y y
x x y y
x x y y
Again, that is not what you want, even if it is closer.
Since you want interleaved columns, you need to create a more complex datatype, capable of describing all the columns owned by one process, again with a 1-item extent.
Each column is separated by a stride of one ELEMENT (that is, the extent of the previously defined column, not its size, which is 4 elements):
>x<. x .
 x . x .
 x . x .
 x . x .
Receiving one of them per processor, you'll get what you want:
x y x y
x y x y
x y x y
x y x y
You can do it with MPI_Type_create_darray() as well, since it allows you to create the datatypes used for the block-cyclic distribution of ScaLAPACK, of which your layout is a 1D subcase (see the sketch after the code below).
I have also tried it. Here is working code for two processors:
#include <stdio.h>
#include <mpi.h>

#define N 4
#define NPROCS 2
#define NPART (N/NPROCS)

int main(int argc, char **argv) {
    float a_send[N][NPART];     /* local block: NPART columns of the N x N matrix */
    float a_recv[N][N] = {0};   /* full matrix, only used on rank 0 */
    MPI_Datatype column_send_type;
    MPI_Datatype column_recv_type;
    MPI_Datatype column_send_type1;
    MPI_Datatype column_recv_type1;
    MPI_Datatype matrix_columns_type;
    MPI_Datatype matrix_columns_type1;

    MPI_Init(&argc, &argv);

    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* fill the local block with values that encode rank, row and column */
    for(int i=0; i<N; ++i) {
        for(int j=0; j<NPART; ++j) {
            a_send[i][j] = my_rank*100 + 10*(i+1) + (j+1);
        }
    }

    /* one column of the local N x NPART block ... */
    MPI_Type_vector(N, 1, NPART, MPI_FLOAT, &column_send_type);
    MPI_Type_commit(&column_send_type);
    /* ... resized to an extent of one float, so successive columns start
       one element apart */
    MPI_Type_create_resized(column_send_type, 0, sizeof(float), &column_send_type1);
    MPI_Type_commit(&column_send_type1);

    /* one column of the full N x N matrix ... */
    MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_recv_type);
    MPI_Type_commit(&column_recv_type);
    /* ... also resized to an extent of one float */
    MPI_Type_create_resized(column_recv_type, 0, sizeof(float), &column_recv_type1);
    MPI_Type_commit(&column_recv_type1);

    /* all NPART columns owned by one rank inside the full matrix,
       NPROCS elements apart ... */
    MPI_Type_vector(NPART, 1, NPROCS, column_recv_type1, &matrix_columns_type);
    MPI_Type_commit(&matrix_columns_type);
    /* ... resized to one float, so rank r's contribution starts at column r */
    MPI_Type_create_resized(matrix_columns_type, 0, sizeof(float), &matrix_columns_type1);
    MPI_Type_commit(&matrix_columns_type1);

    MPI_Gather(a_send, NPART, column_send_type1,
               a_recv, 1, matrix_columns_type1,
               0, MPI_COMM_WORLD);

    if (my_rank==0) {
        for(int i=0; i<N; ++i) {
            for(int j=0; j<N; ++j) {
                printf("%4.0f ", a_recv[i][j]);
            }
            printf("\n");
        }
    }

    MPI_Finalize();
}
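Compiled with mpicc and run on two ranks, it should print the two local blocks interleaved column by column: the first row reads 11, 111, 12, 112, and so on.

For completeness, here is an untested sketch of the MPI_Type_create_darray() variant mentioned above; the names gsizes, distribs, dargs, psizes and piece are mine. Note that the darray datatype of rank r already carries the rank-r offset and the full-matrix extent, so it does not drop straight into MPI_Gather(), which uses one receive type for all ranks; the sketch therefore lets the root post one receive per rank instead.

#include <stdio.h>
#include <mpi.h>

#define N 4
#define NPROCS 2
#define NPART (N/NPROCS)

int main(int argc, char **argv) {
    float a_send[N][NPART];
    float a_recv[N][N] = {0};
    MPI_Datatype piece[NPROCS];
    MPI_Request req[NPROCS];
    int my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* same test data as in the vector-based version */
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < NPART; ++j)
            a_send[i][j] = my_rank*100 + 10*(i+1) + (j+1);

    /* columns of an N x N matrix distributed cyclic(1)
       over a 1 x NPROCS process grid, C (row-major) order */
    int gsizes[2]   = {N, N};
    int distribs[2] = {MPI_DISTRIBUTE_NONE, MPI_DISTRIBUTE_CYCLIC};
    int dargs[2]    = {MPI_DISTRIBUTE_DFLT_DARG, 1};
    int psizes[2]   = {1, NPROCS};

    if (my_rank == 0) {
        /* one datatype per sender: it selects, inside a_recv,
           exactly the columns owned by rank r */
        for (int r = 0; r < NPROCS; ++r) {
            MPI_Type_create_darray(NPROCS, r, 2, gsizes, distribs, dargs,
                                   psizes, MPI_ORDER_C, MPI_FLOAT, &piece[r]);
            MPI_Type_commit(&piece[r]);
            MPI_Irecv(a_recv, 1, piece[r], r, 0, MPI_COMM_WORLD, &req[r]);
        }
    }

    /* every rank, the root included, ships its local block as plain floats */
    MPI_Send(a_send, N*NPART, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        MPI_Waitall(NPROCS, req, MPI_STATUSES_IGNORE);
        for (int r = 0; r < NPROCS; ++r)
            MPI_Type_free(&piece[r]);

        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j)
                printf("%4.0f ", a_recv[i][j]);
            printf("\n");
        }
    }

    MPI_Finalize();
}

The distribution described is cyclic(1) over the columns on a 1 x NPROCS process grid, i.e. the 1D block-cyclic subcase mentioned above, and it should produce the same matrix as the vector-based version.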