MPI_Gatherv memory problems (MPI+C)

Posted by 感情迁移 on 2019-12-24 20:19:59

Question


As a continuation of my previous question, I have modified the code to handle a variable number of kernels. However, the way Gatherv is implemented in my code seems to be unreliable. About once in every 3-4 runs, the end of the sequence in the collecting buffer ends up corrupted, seemingly because of a memory leak. Sample code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main (int argc, char *argv[]) {

MPI_Init(&argc, &argv);
int world_size,*sendarray;
int rank, *rbuf=NULL, count,total_counts=0;
int *displs=NULL,i,*rcounts=NULL;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

if(rank==0){
    displs = malloc((world_size+1)*sizeof(int));
    for(int i=1;i<=world_size; i++)displs[i]=0;
    rcounts=malloc(world_size*sizeof(int));

    sendarray=malloc(1*sizeof(int));
    for(int i=0;i<1;i++)sendarray[i]=1111;
    count=1;
}

if(rank!=0){
    int size=rank*2;
    sendarray=malloc(size*sizeof(int));
    for(int i=0;i<size;i++)sendarray[i]=rank;
    count=size;
}

MPI_Barrier(MPI_COMM_WORLD);

MPI_Gather(&count,1,MPI_INT,rcounts,1,MPI_INT,0,MPI_COMM_WORLD);

MPI_Barrier(MPI_COMM_WORLD);

if(rank==0){
    displs[0]=0;
    for(int i=1;i<=world_size; i++){
        for(int j=0; j<i; j++)displs[i]+=rcounts[j];
    }

    total_counts=0;
    for(int i=0;i<world_size;i++)total_counts+=rcounts[i];
    rbuf = malloc(10*sizeof(int));
}

MPI_Gatherv(sendarray, count, MPI_INT, rbuf, rcounts,
            displs, MPI_INT, 0, MPI_COMM_WORLD);

if(rank==0){
    int SIZE=total_counts;
    for(int i=0;i<SIZE;i++)printf("(%d) %d ",i, rbuf[i]);

    free(rbuf);
    free(displs);
    free(rcounts);
}

free(sendarray); /* every rank, including rank 0, allocated sendarray */
MPI_Finalize();

}

Why is this happening and is there a way to fix it?

This becomes much worse in my actual project. Each send buffer contains 150 doubles. The receive buffer ends up full of garbage, and sometimes I get a bad-termination error with exit code 6 or 11.

Can anyone at least reproduce my errors?

My guess: I am allocating memory for sendarray separately on each process. If my virtual machine mapped 1-to-1 onto the hardware, there would probably be no such problem. But I have only 2 cores and run 4 or more processes. Could that be the reason?


Answer 1:


Change this line:

rbuf = malloc(10*sizeof(int));

to:

rbuf = malloc(total_counts*sizeof(int));
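To see why 10 ints is not enough: in the sample code rank 0 sends 1 element and every other rank r sends 2*r elements, so with 4 processes the counts are 1, 2, 4 and 6, which is 13 ints in total. MPI_Gatherv then writes 3 ints past the end of rbuf. Below is a minimal sketch of the displacement and total calculation in plain C, with no MPI calls, so the arithmetic can be checked on its own. The helper name compute_displs is hypothetical and is not part of MPI or of the original code.

```c
/* Hypothetical helper (not an MPI function): fills displs[0..n-1] with the
   running sum of rcounts and returns the total number of elements that the
   root's receive buffer must be able to hold. */
int compute_displs(const int *rcounts, int n, int *displs)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        displs[i] = total;    /* rank i's data starts where rank i-1's ended */
        total += rcounts[i];
    }
    return total;
}
```

For rcounts = {1, 2, 4, 6} this gives displs = {0, 1, 3, 7} and a return value of 13, which is the size that should be passed to malloc for rbuf. Note that MPI_Gatherv only reads one displacement per rank, so an array of world_size entries is sufficient here.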

As a side note: each MPI process lives in its own address space, so processes cannot stomp on each other's data except through erroneous arguments explicitly passed to the MPI_XXX functions, which results in undefined behavior.



Source: https://stackoverflow.com/questions/50517687/mpi-gatherv-memory-problems-mpic
