MPI's Scatterv operation

最后都变了 - submitted 2020-01-17 03:08:27

Question


I'm not sure that I correctly understand what MPI_Scatterv is supposed to do. I have 79 items to scatter among a variable number of nodes. However, when I call MPI_Scatterv I get ridiculous numbers (as if the array elements of my receiving buffer were uninitialized). Here is the relevant code snippet:

MPI_Init(&argc, &argv);
int id, procs;

MPI_Comm_rank(MPI_COMM_WORLD, &id);
MPI_Comm_size(MPI_COMM_WORLD, &procs);

//Assign each file a number and figure out how many files should be
//assigned to each node
int file_numbers[files.size()];
int send_counts[nodes] = {0}; 
int displacements[nodes] = {0};

for (int i = 0; i < files.size(); i++)
{
    file_numbers[i] = i;
    send_counts[i%nodes]++;
}   

//figure out the displacements
int sum = 0;
for (int i = 0; i < nodes; i++)
{
    displacements[i] = sum;
    sum += send_counts[i];
}   

//Create a receiving buffer
int *rec_buf = new int[79];

if (id == 0)
{
    MPI_Scatterv(&file_numbers, send_counts, displacements, MPI_INT, rec_buf, 79, MPI_INT, 0, MPI_COMM_WORLD);
}   

cout << "got here " << id << " checkpoint 1" << endl;
cout << id << ": " << rec_buf[0] << endl;
cout << "got here " << id << " checkpoint 2" << endl;

MPI_Barrier(MPI_COMM_WORLD); 

free(rec_buf);

MPI_Finalize();

When I run that code I receive this output:

got here 1 checkpoint 1
1: -1168572184
got here 1 checkpoint 2
got here 2 checkpoint 1
2: 804847848
got here 2 checkpoint 2
got here 3 checkpoint 1
3: 1364787432
got here 3 checkpoint 2
got here 4 checkpoint 1
4: 903413992
got here 4 checkpoint 2
got here 0 checkpoint 1
0: 0
got here 0 checkpoint 2

I read the documentation for OpenMPI and looked through some code examples, but I'm not sure what I'm missing. Any help would be great!


Answer 1:


One of the most common MPI mistakes strikes again:

if (id == 0)    // <---- PROBLEM
{
    MPI_Scatterv(&file_numbers, send_counts, displacements, MPI_INT,
                 rec_buf, 79, MPI_INT, 0, MPI_COMM_WORLD);
}   

MPI_SCATTERV is a collective MPI operation. Collective operations must be executed by all processes in the specified communicator in order to complete successfully. You are executing it only on rank 0, which is why only rank 0 gets the correct values: the other ranks never take part in the call, so their receive buffers are left uninitialized.

Solution: remove the conditional if (...).

But there is another subtle mistake here. Since collective operations do not provide any status output, the MPI standard enforces strict matching of the number of elements sent to some rank and the number of elements the rank is willing to receive. In your case the receiver always specifies 79 elements which might not match the corresponding number in send_counts. You should instead use:

MPI_Scatterv(file_numbers, send_counts, displacements, MPI_INT,
             rec_buf, send_counts[id], MPI_INT,
             0, MPI_COMM_WORLD);
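
To make the matching rule concrete, here is a small sketch in plain C++ (no MPI calls) that computes the same counts and displacements as the question's loops and shows which slice of the send buffer each rank would end up with. The names ScatterLayout, make_layout and slice_for_rank are hypothetical helpers invented for this illustration, not part of MPI; the slice is simply the send_counts[rank] elements starting at displacements[rank], which is what MPI_Scatterv delivers to that rank.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type (not part of MPI): per-rank element counts
// and the offset of each rank's chunk in the root's send buffer.
struct ScatterLayout {
    std::vector<int> counts;
    std::vector<int> displs;
};

// Distribute n_items over n_ranks round-robin, as the question's loop does,
// then turn the counts into displacements with a running sum.
ScatterLayout make_layout(int n_items, int n_ranks) {
    assert(n_items >= 0 && n_ranks > 0);
    ScatterLayout layout;
    layout.counts.assign(n_ranks, 0);
    layout.displs.assign(n_ranks, 0);
    for (int i = 0; i < n_items; ++i) {
        layout.counts[i % n_ranks]++;
    }
    int sum = 0;
    for (int r = 0; r < n_ranks; ++r) {
        layout.displs[r] = sum;
        sum += layout.counts[r];
    }
    return layout;
}

// The elements a given rank would receive: counts[rank] items starting
// at displs[rank]. The receive count passed by that rank must equal
// counts[rank], which is why send_counts[id] is used above.
std::vector<int> slice_for_rank(const std::vector<int>& send_buf,
                                const ScatterLayout& layout, int rank) {
    assert(rank >= 0 && rank < static_cast<int>(layout.counts.size()));
    auto first = send_buf.begin() + layout.displs[rank];
    return std::vector<int>(first, first + layout.counts[rank]);
}
```

For 79 items on 5 ranks, make_layout(79, 5) yields counts 16, 16, 16, 16, 15 and displacements 0, 16, 32, 48, 64, so rank 4 would receive the 15 elements numbered 64 through 78. None of these counts equals 79, which is the mismatch described above.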

Also note the following discrepancy in your code, which may simply be a typo introduced while posting the question here:

MPI_Comm_size(MPI_COMM_WORLD, &procs);
                               ^^^^^
int send_counts[nodes] = {0};
                ^^^^^
int displacements[nodes] = {0};
                  ^^^^^

You obtain the number of ranks in the procs variable, but the rest of your code uses nodes. I guess nodes should be replaced by procs.



Source: https://stackoverflow.com/questions/23165337/mpis-scatterv-operation