MPI_Gather of columns

天命终不由人 2021-01-16 21:56

I have an array which is split up by columns between the processes for my calculation. Afterwards I want to gather this array in one process (0).

Each process has i…

1 Answer
  • 2021-01-16 22:26

    The problem I see is that the datatype created with MPI_Type_vector() has an extent that spans from the first to the last item. For example:

    The extent for your col_recv datatype is between > and < (I hope this representation of the mask is clear enough):

    >x . . .
     x . . .
     x . . .
     x<. . .
    

    That is 13 MPI_FLOAT items (the mask must be read row by row; that's C ordering). Receiving two of them will lead to:

    >x . . .
     x . . .
     x . . .
     x y . .
     . y . .
     . y . .
     . y . .
    

    That clearly is not what you want.
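
    You can check this directly with MPI_Type_get_extent(). A minimal, standalone sketch for the 4x4 case in the masks above (the name col_recv is just for illustration):

    #include <mpi.h>
    #include <stdio.h>
    
    int main(int argc, char **argv) {
      MPI_Datatype col_recv;
      MPI_Aint lb, extent;
      MPI_Init(&argc, &argv);
      /* one column of a 4x4 row-major float array */
      MPI_Type_vector(4, 1, 4, MPI_FLOAT, &col_recv);
      MPI_Type_commit(&col_recv);
      MPI_Type_get_extent(col_recv, &lb, &extent);
      /* prints "lb = 0, extent = 13 floats": first to last item */
      printf("lb = %ld, extent = %ld floats\n", (long)lb, (long)(extent/sizeof(float)));
      MPI_Type_free(&col_recv);
      MPI_Finalize();
      return 0;
    }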

    To let MPI_Gather() properly skip data on the receiver, you need to set the extent of col_recv to exactly ONE element. You can do this with MPI_Type_create_resized():

    >x<. . .
     x . . .
     x . . .
     x . . .
    

    so that receiving successive blocks gets correctly interleaved:

       x y . . 
       x y . . 
       x y . . 
       x y . . 
    
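    In code, the resize is a single call; continuing the illustrative snippet above:

    MPI_Datatype col_recv_resized;
    /* same 13-element layout, but the type now "occupies" only 1 float,
       so consecutive receive slots start one element apart */
    MPI_Type_create_resized(col_recv, 0, sizeof(float), &col_recv_resized);
    MPI_Type_commit(&col_recv_resized);
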

    However, receiving two columns instead of one leads to:

       x x y y
       x x y y
       x x y y
       x x y y
    

    Again, that is not what you want, even if it is closer.

    Since you want interleaved columns, you need to create a more complex datatype capable of describing all of your columns, again with a 1-item extent.

    Each column is separated by a stride of one ELEMENT (that is, the extent of the previously defined column datatype, not its size, which is 4 elements):

      >x<. x .
       x . x .
       x . x .
       x . x .
    

    Receiving one of them per processor, you'll get what you want:

       x y x y
       x y x y
       x y x y
       x y x y
    
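    In code, that is one more MPI_Type_vector() built on top of the resized column type, plus a final resize back to a 1-float extent; this is exactly the matrix_columns_type pair in the full program below (names here continue the illustrative snippets above, with NPART=2 and NPROCS=2):

    /* NPART whole columns, placed NPROCS elements apart (the stride is in
       units of the oldtype's extent, i.e. 1 float) */
    MPI_Datatype all_cols, all_cols_resized;
    MPI_Type_vector(NPART, 1, NPROCS, col_recv_resized, &all_cols);
    MPI_Type_create_resized(all_cols, 0, sizeof(float), &all_cols_resized);
    MPI_Type_commit(&all_cols_resized);
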

    You can do this with MPI_Type_create_darray() as well, since it allows you to create datatypes suitable for the block-cyclic distribution of ScaLAPACK, of which yours is a 1D subcase.
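
    For completeness, here is an untested sketch of that darray route: dimension 1 of the global N x N array is distributed cyclically, one column at a time, over NPROCS processes. Each rank sends its local N x NPART block contiguously, and the root receives rank r's data with the darray type that describes r's columns (MPI_Gather cannot be used here, since each rank needs a different receive type):

    int gsizes[2]   = {N, N};
    int distribs[2] = {MPI_DISTRIBUTE_NONE, MPI_DISTRIBUTE_CYCLIC};
    int dargs[2]    = {MPI_DISTRIBUTE_DFLT_DARG, 1};  /* cyclic, block size 1 */
    int psizes[2]   = {1, NPROCS};
    MPI_Request reqs[NPROCS];
    MPI_Datatype dt[NPROCS];
    
    if (my_rank == 0) {
      for (int r = 0; r < NPROCS; ++r) {
        MPI_Type_create_darray(NPROCS, r, 2, gsizes, distribs, dargs, psizes,
                               MPI_ORDER_C, MPI_FLOAT, &dt[r]);
        MPI_Type_commit(&dt[r]);
        MPI_Irecv(a_recv, 1, dt[r], r, 0, MPI_COMM_WORLD, &reqs[r]);
      }
    }
    MPI_Send(a_send, N*NPART, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    if (my_rank == 0) {
      MPI_Waitall(NPROCS, reqs, MPI_STATUSES_IGNORE);
      for (int r = 0; r < NPROCS; ++r) MPI_Type_free(&dt[r]);
    }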

    Going back to the vector approach, here is a complete working example for two processes:

    #include <mpi.h>
    #include <stdio.h>
    
    #define N      4
    #define NPROCS 2
    #define NPART  (N/NPROCS)
    
    int main(int argc, char **argv) {
      float a_send[N][NPART];
      float a_recv[N][N] = {0};
      MPI_Datatype column_send_type;
      MPI_Datatype column_recv_type;
      MPI_Datatype column_send_type1;
      MPI_Datatype column_recv_type1;
      MPI_Datatype matrix_columns_type;
      MPI_Datatype matrix_columns_type1;
    
      MPI_Init(&argc, &argv);
      int my_rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    
      /* Fill the local block so every element encodes rank, row and column. */
      for(int i=0; i<N; ++i) {
        for(int j=0; j<NPART; ++j) {
          a_send[i][j] = my_rank*100+10*(i+1)+(j+1);
        }
      }
    
      /* One column of the local N x NPART block... */
      MPI_Type_vector(N, 1, NPART, MPI_FLOAT, &column_send_type);
      MPI_Type_commit(&column_send_type);
    
      /* ...resized to a 1-float extent, so consecutive send columns start
         one element apart. */
      MPI_Type_create_resized(column_send_type, 0, sizeof(float), &column_send_type1);
      MPI_Type_commit(&column_send_type1);
    
      /* One column of the global N x N receive array, also resized to 1 float. */
      MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_recv_type);
      MPI_Type_commit(&column_recv_type);
    
      MPI_Type_create_resized(column_recv_type, 0, sizeof(float), &column_recv_type1);
      MPI_Type_commit(&column_recv_type1);
    
      /* All NPART columns owned by one rank, NPROCS elements apart, again
         with a 1-float extent so rank r's data starts at column r. */
      MPI_Type_vector(NPART, 1, NPROCS, column_recv_type1, &matrix_columns_type);
      MPI_Type_commit(&matrix_columns_type);
    
      MPI_Type_create_resized(matrix_columns_type, 0, sizeof(float), &matrix_columns_type1);
      MPI_Type_commit(&matrix_columns_type1);
    
      MPI_Gather(a_send, NPART, column_send_type1, a_recv, 1, matrix_columns_type1, 0, MPI_COMM_WORLD);
    
      if (my_rank==0) {
        for(int i=0; i<N; ++i) {
          for(int j=0; j<N; ++j) {
            printf("%4.0f  ",a_recv[i][j]);
          }
          printf("\n");
        }
      }
    
      MPI_Type_free(&column_send_type);
      MPI_Type_free(&column_send_type1);
      MPI_Type_free(&column_recv_type);
      MPI_Type_free(&column_recv_type1);
      MPI_Type_free(&matrix_columns_type);
      MPI_Type_free(&matrix_columns_type1);
      MPI_Finalize();
      return 0;
    }
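
    Compiled and run on two processes (the file name is arbitrary):

    mpicc gather_columns.c -o gather_columns
    mpirun -np 2 ./gather_columns

    it prints the interleaved matrix, with columns alternating between rank 0 (values 11..42) and rank 1 (values 111..142):

      11   111    12   112
      21   121    22   122
      31   131    32   132
      41   141    42   142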
    