Exchange Data Between MPI processes (halo)


For nearest-neighbour style halo swaps, one of the most efficient implementations is usually a set of MPI_Sendrecv calls, two per dimension:

Half-step one - transfer of data in the positive direction: each rank receives data from the rank on its left into its left halo and sends data to the rank on its right

    +-+-+---------+-+-+     +-+-+---------+-+-+     +-+-+---------+-+-+
--> |R| | (i,j-1) |S| | --> |R| |  (i,j)  |S| | --> |R| | (i,j+1) |S| | -->
    +-+-+---------+-+-+     +-+-+---------+-+-+     +-+-+---------+-+-+

(S designates the part of the local data being sent, R designates the halo into which data is being received, and (i,j) are the coordinates of the rank in the process grid)

Half-step two - transfer of data in the negative direction: each rank receives data from the rank on its right into its right halo and sends data to the rank on its left

    +-+-+---------+-+-+     +-+-+---------+-+-+     +-+-+---------+-+-+
<-- |X|S| (i,j-1) | |R| <-- |X|S|  (i,j)  | |R| <-- |X|S| (i,j+1) | |R| <--
    +-+-+---------+-+-+     +-+-+---------+-+-+     +-+-+---------+-+-+

(X is that part of the halo region that has already been populated in the previous half-step)

Most switched networks support multiple simultaneous bi-directional (full duplex) communications, so the latency of the whole exchange is roughly that of two send-receive operations per dimension.

Both of the above half-steps are repeated once for each dimension of the domain decomposition.
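
For concreteness, here is a minimal sketch of the resulting four MPI_Sendrecv calls for a 2D decomposition. The names are illustrative, not prescribed: u is assumed to be the local (ny+2) x (nx+2) row-major array of doubles with a 1-cell halo, left/right/up/down the neighbour ranks (e.g. obtained from MPI_Cart_shift, see further down), and row_t/col_t committed derived datatypes describing one interior row and one interior column.

    /* Illustrative 2D halo swap: u is a (ny+2) x (nx+2) row-major array of
     * doubles with a 1-cell halo; left/right/up/down are the neighbour ranks
     * (MPI_PROC_NULL where there is no neighbour); row_t and col_t are
     * committed derived datatypes covering one interior row / column.      */
    #include <mpi.h>

    void halo_swap(double *u, int nx, int ny,
                   int left, int right, int up, int down,
                   MPI_Datatype row_t, MPI_Datatype col_t, MPI_Comm comm)
    {
        int ldx = nx + 2;   /* leading dimension, including the halo */

        /* positive x direction: send the rightmost interior column to the
         * right neighbour, receive into the left halo column               */
        MPI_Sendrecv(&u[1*ldx + nx], 1, col_t, right, 0,
                     &u[1*ldx + 0],  1, col_t, left,  0,
                     comm, MPI_STATUS_IGNORE);

        /* negative x direction: send the leftmost interior column to the
         * left neighbour, receive into the right halo column               */
        MPI_Sendrecv(&u[1*ldx + 1],      1, col_t, left,  1,
                     &u[1*ldx + nx + 1], 1, col_t, right, 1,
                     comm, MPI_STATUS_IGNORE);

        /* positive y direction: send the bottom interior row down, receive
         * into the top halo row                                            */
        MPI_Sendrecv(&u[ny*ldx + 1], 1, row_t, down, 2,
                     &u[0*ldx + 1],  1, row_t, up,   2,
                     comm, MPI_STATUS_IGNORE);

        /* negative y direction: send the top interior row up, receive into
         * the bottom halo row                                              */
        MPI_Sendrecv(&u[1*ldx + 1],      1, row_t, up,   3,
                     &u[(ny+1)*ldx + 1], 1, row_t, down, 3,
                     comm, MPI_STATUS_IGNORE);
    }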

The process is simplified even further in version 3.0 of the standard, which introduces the so-called neighbourhood collective communications: the whole multidimensional halo swap can be performed with a single call to MPI_Neighbor_alltoallw.
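
As a rough sketch of what that single call could look like for the same 2D case, under the same assumptions as above (illustrative u, row_t and col_t, plus a cartesian communicator): the neighbour order on a cartesian communicator is, for each dimension, first the negative then the positive neighbour, and byte displacements relative to the local array select the (disjoint) pieces that are sent and received.

    /* Illustrative single-call halo swap with MPI_Neighbor_alltoallw (MPI 3.0).
     * It reuses u, nx, ny, row_t and col_t from the sketch above and expects a
     * 2D cartesian communicator; the neighbour order is {up, down, left, right}
     * here.  Both buffer arguments point at the same local array -- the pieces
     * actually sent and received are disjoint.                              */
    void halo_swap_neighbor(double *u, int nx, int ny,
                            MPI_Datatype row_t, MPI_Datatype col_t,
                            MPI_Comm cart_comm)
    {
        int ldx = nx + 2;
        int counts[4] = {1, 1, 1, 1};
        MPI_Datatype types[4] = {row_t, row_t, col_t, col_t};

        /* byte displacements, relative to u, of the pieces to send
         * (boundary cells of the interior) ...                             */
        MPI_Aint sdispls[4] = {
            (MPI_Aint)((1*ldx + 1)  * sizeof(double)),   /* top row    -> up    */
            (MPI_Aint)((ny*ldx + 1) * sizeof(double)),   /* bottom row -> down  */
            (MPI_Aint)((1*ldx + 1)  * sizeof(double)),   /* left col   -> left  */
            (MPI_Aint)((1*ldx + nx) * sizeof(double))    /* right col  -> right */
        };
        /* ... and of the halo cells to receive into                        */
        MPI_Aint rdispls[4] = {
            (MPI_Aint)((0*ldx + 1)      * sizeof(double)),  /* top halo    <- up    */
            (MPI_Aint)(((ny+1)*ldx + 1) * sizeof(double)),  /* bottom halo <- down  */
            (MPI_Aint)((1*ldx + 0)      * sizeof(double)),  /* left halo   <- left  */
            (MPI_Aint)((1*ldx + nx + 1) * sizeof(double))   /* right halo  <- right */
        };

        /* neighbours that are MPI_PROC_NULL (non-periodic edges) are
         * skipped automatically                                            */
        MPI_Neighbor_alltoallw(u, counts, sdispls, types,
                               u, counts, rdispls, types, cart_comm);
    }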

Your use of the word halo in your question suggests you might be setting up a computational domain which is split across processes. This is a very common approach in MPI programs in a wide range of applications. Typically each process computes over its local domain, then all processes swap halo elements with their neighbours, then repeat until satisfied.

While you could create dedicated buffers for exchanging the halo elements, I think a more usual approach, and certainly a sensible first approach, is to think of the halo elements themselves as the buffers you are looking for. For example, if you have a 100x100 computational domain split across 100 processes, each process gets a 12x12 local domain -- here I'm assuming a 1-cell overlap with each of the 4 orthogonal neighbours, with some care needed at the edges of the global domain. The halo cells are the cells on the boundary of each local domain, and there is no need to marshal the elements into another buffer prior to communication.
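
To make the "halo cells are the buffers" idea concrete, here is a short sketch for such a 12x12 local array; the names (NLOC, row_t, col_t, build_halo_types) are purely illustrative.

    /* Illustrative setup for the 12x12 local array above: a 10x10 interior
     * plus a 1-cell halo, stored row-major as double u[12][12].  Derived
     * datatypes let boundary cells be sent, and halo cells be received,
     * directly in place -- no pack/unpack buffers needed.                  */
    #include <mpi.h>

    #define NLOC 10            /* interior cells per side            */
    #define LDX  (NLOC + 2)    /* leading dimension including halo   */

    MPI_Datatype row_t, col_t;

    void build_halo_types(void)
    {
        MPI_Type_contiguous(NLOC, MPI_DOUBLE, &row_t);      /* one interior row    */
        MPI_Type_vector(NLOC, 1, LDX, MPI_DOUBLE, &col_t);  /* one interior column */
        MPI_Type_commit(&row_t);
        MPI_Type_commit(&col_t);
    }

    /* e.g. send the right boundary column u[1..10][10] straight to the right
     * neighbour while receiving the left halo column u[1..10][0] straight
     * from the left neighbour                                              */
    void swap_left_right(double u[LDX][LDX], int left, int right, MPI_Comm comm)
    {
        MPI_Sendrecv(&u[1][NLOC], 1, col_t, right, 0,
                     &u[1][0],    1, col_t, left,  0,
                     comm, MPI_STATUS_IGNORE);
    }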

If I have correctly guessed the type of computation you are trying to implement, you should look at mpi_cart_create and its associated functions; these are designed to make it easy to set up and implement programs in which calculation steps are interleaved with steps for communication between neighbouring processes. The net is awash with examples of creating and using such cartesian topologies.
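
A typical setup along those lines might look like the following sketch (make_cart_comm is an illustrative name, not a library function). Note that MPI_Cart_shift returns MPI_PROC_NULL where a neighbour does not exist, which is what takes care of the edges of the global domain.

    /* Illustrative topology setup: MPI_Dims_create picks a balanced process
     * grid, MPI_Cart_create builds the cartesian communicator, and
     * MPI_Cart_shift returns the four orthogonal neighbours.  On an edge of
     * a non-periodic grid the missing neighbour comes back as MPI_PROC_NULL,
     * which makes the corresponding sends/receives above no-ops.           */
    #include <mpi.h>

    MPI_Comm make_cart_comm(int *up, int *down, int *left, int *right)
    {
        MPI_Comm cart_comm;
        int dims[2]    = {0, 0};   /* let MPI choose the process grid    */
        int periods[2] = {0, 0};   /* non-periodic in both dimensions    */
        int nprocs;

        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Dims_create(nprocs, 2, dims);
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods,
                        1 /* reorder */, &cart_comm);

        /* shift by one along each dimension: dimension 0 gives the vertical
         * neighbours, dimension 1 the horizontal ones                      */
        MPI_Cart_shift(cart_comm, 0, 1, up,   down);
        MPI_Cart_shift(cart_comm, 1, 1, left, right);
        return cart_comm;
    }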

If this is the style of computation you are planning, then mpi_bcast is absolutely the wrong thing to be using. MPI broadcasts (and similar functions) are collective operations in which all processes (in a given communicator) engage. Broadcasts are useful for global communications but halo exchanges are local communications.
