MPI Derived data type for a struct with flexible size

Question

I am trying to send/receive a data structure like this in C++:

/* PSEUDOCODE */
const int N = getN(); // not available at compile time
const int M = getM();
struct package{
    int    foo;
    double bar;
    /* I know array members do not work this way,
       this is pseudocode. */
    int    flop[N];
    double blep[M];
};

Since M and N stay constant at runtime, I can call MPI_Type_create_struct() once and the new datatype would remain valid throughout.

My question is how to implement the data structure as described above.

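std::vector<> won't work because its elements live in a separately allocated heap block rather than inside the struct, so the package would not be one contiguous buffer.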

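Flexible array members like [] or [0] are not standard C++ (compilers such as GCC and Clang accept them only as an extension), and a struct can have at most one, as its last member, so this cannot work for both the M and the N arrays.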

So instead I use malloc():

#include <cstdlib>
#include <mpi.h>

class Package {
public:
  // in buffer[]: bar, blep[], foo, flop[]
  // in that order and one directly follows another.
  Package():
    buffer((double*) malloc((M + 1) * sizeof(double) + 
                            (N + 1) * sizeof(int))),
    bar(buffer), blep(buffer + 1),
    foo((int*) (blep + M)),
    flop(foo + 1) {}
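  ~Package(){
    free(buffer);
  }
  // Package owns buffer, so copying it would free the same block twice.
  Package(const Package&) = delete;
  Package& operator=(const Package&) = delete;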

  // construct / free the derived datatype
  static void initialize(unsigned inN, unsigned inM) {
    N = inN;
    M = inM;
    MPI_Aint offsets[2] = {0, static_cast<MPI_Aint>(sizeof(double)) * (M + 1)};
    int      blocks[2]  = {M + 1, N + 1};
    MPI_Datatype types[2] = {MPI_DOUBLE, MPI_INT};
    MPI_Type_create_struct(2, blocks, offsets, types, &packageType);
    MPI_Type_commit(&packageType);
  }
  static void finalize() {
    MPI_Type_free(&packageType);
  }

  int send(int rank, int tag) {
    return MPI_Send(buffer, 1, packageType, 
                    rank, tag, MPI_COMM_WORLD);
  }
  int recv(int rank, int tag) {
    return MPI_Recv(buffer, 1, packageType, 
                    rank, tag, MPI_COMM_WORLD, 
                    MPI_STATUS_IGNORE);
  }
private:
  double * buffer;

  static int M;
  static int N;
  static MPI_Datatype packageType;
public:
  // interface variables
  double * const bar;
  double * const blep;
  int    * const foo;
  int    * const flop;
};

int Package::N = 0;
int Package::M = 0;
MPI_Datatype Package::packageType = MPI_DATATYPE_NULL;

I tested the above code and it seems to work properly, but I am not sure if I am doing something that is actually undefined behavior. Specifically:

Is it OK to use sizeof() to compute the displacements passed to MPI_Type_create_struct()? Some examples I found use MPI_Type_get_extent() instead, and I don't know what the difference is.
I am not sure if it is a good idea to store the new datatype in a static member. The examples I found instead have it passed around as an argument. Is there any specific reason to do that?
I am also unsure whether this method is portable. I hope it is as portable as struct-based methods, but perhaps I am missing something?

Answer 1:

I am also unsure whether this method is portable. I hope it is as portable as struct-based methods, but perhaps I am missing something?

1. Suppose that instead of double and int you have some types A and B. Then an object of type B, for which you allocate space right after the As, can end up misaligned. On some architectures, accessing such an object (e.g., an int at an address that is 2 modulo 4) causes a bus error. So in the general case you have to insert the correct padding before the first B object. When you use a struct, the compiler does this for you.

2. The way you access buffer is UB. Essentially you're doing this:

double* buffer = reinterpret_cast<double*>(malloc(...));
double* bar = buffer;
int* foo = reinterpret_cast<int*>(buffer + 1);

do_something(buffer);
double bar_value = *bar; // This is UB
int foo_value = *foo;    // This is UB, too

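The problem here is that there are no objects of type double and int at *bar and *foo. (This is the rule up to C++17; C++20 lets malloc implicitly create objects of implicit-lifetime types such as double and int, but constructing them explicitly is correct under every standard.) You can create them using placement new: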

char* buffer = reinterpret_cast<char*>(malloc(...));
double* bar = new(buffer) double;
int* foo = new(buffer + sizeof(double)) int;

Please refer to this question.

For arrays you can use std::uninitialized_default_construct (C++17), which default-constructs objects in the given range.
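Putting explicit construction and the question's layout together, one possible sketch creates the double and int objects in the malloc'd block before handing out pointers to them. It assumes C++17 (for std::uninitialized_default_construct_n and std::launder); RawPackage, make_raw_package and free_raw_package are hypothetical names, and std::launder is applied conservatively to obtain pointers to the newly created objects.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical holder: one malloc'd block with M + 1 doubles then N + 1 ints.
struct RawPackage {
    void*   storage = nullptr;
    double* doubles = nullptr;  // bar == doubles[0], blep == doubles + 1
    int*    ints    = nullptr;  // foo == ints[0],    flop == ints + 1
};

RawPackage make_raw_package(std::size_t N, std::size_t M) {
    const std::size_t dbytes = (M + 1) * sizeof(double);
    // The ints start right after the doubles; this assumes dbytes is a
    // multiple of alignof(int), which the static_assert checks.
    static_assert(sizeof(double) % alignof(int) == 0, "int would be misaligned");
    RawPackage p;
    p.storage = std::malloc(dbytes + (N + 1) * sizeof(int));
    if (!p.storage) throw std::bad_alloc();
    char* base = static_cast<char*>(p.storage);
    // Begin the lifetime of each element; for double and int this
    // leaves the values uninitialized, just like `new double`.
    std::uninitialized_default_construct_n(reinterpret_cast<double*>(base), M + 1);
    std::uninitialized_default_construct_n(reinterpret_cast<int*>(base + dbytes), N + 1);
    p.doubles = std::launder(reinterpret_cast<double*>(base));
    p.ints    = std::launder(reinterpret_cast<int*>(base + dbytes));
    return p;
}

void free_raw_package(RawPackage& p) {
    // double and int are trivially destructible, so no destructor calls are needed.
    std::free(p.storage);
    p = RawPackage{};
}
```

Because double and int are trivially destructible, freeing the storage is enough here; for element types with non-trivial destructors, std::destroy would have to run on each range first.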

I am not sure if it is a good idea to store the new datatype in a static member. The examples I found instead have it passed around as an argument. Is there any specific reason to do that?

If N and M are static, then making packageType static as well seems fine. If you have only one kind of Package with fixed N and M, you probably want to avoid calling MPI_Type_create_struct every time you construct a Package, since each call would create essentially the same MPI datatype.

But this design has a weakness: initialize() must be called before the first Package is constructed, and nothing enforces that. You could instead write a factory that first creates the MPI datatype and then constructs Packages on request, with something like Package make_package(). Each factory could then have its own non-static N and M.

Source: https://stackoverflow.com/questions/56434495/mpi-derived-data-type-for-a-struct-with-flexible-size

Tags

c++

mpi

memory-alignment