Ensure hybrid MPI / OpenMP runs each OpenMP thread on a different core

Submitted by 女生的网名这么多〃 on 2019-12-06 10:55:46

Actually, I'd expect your first example to work. Setting OMP_PROC_BIND=true is important here, so that OpenMP stays within the CPU binding inherited from the MPI process when pinning its threads.

Depending on the batch system and MPI implementation, the way to set these things up can differ considerably.

Hyperthreading, or in general multiple hardware threads per core, may also be part of the problem: these all show up as "cores" in Linux, and you will never see 200% CPU usage when two processes run on the two hyperthreads of a single core.

Here is a generic approach I use to figure these things out for a given MPI and OpenMP implementation on a given system. Cray's documentation contains a very helpful program for this, called xthi.c; search for the filename from here, section 9.8 (I'm not sure it would be legal to paste it here...). Compile it with:

mpicc xthi.c -fopenmp -o xthi

Now we can see what exactly is going on, for instance on a 2x 8 Core Xeon with Hyperthreading and Intel MPI (MPICH-based) we get:

$ OMP_PROC_BIND=true OMP_PLACES=cores OMP_NUM_THREADS=2 mpiexec -n 2 ./xthi

Hello from rank 0, thread 0, on localhost. (core affinity = 0,16)
Hello from rank 0, thread 1, on localhost. (core affinity = 1,17)
Hello from rank 1, thread 0, on localhost. (core affinity = 8,24)
Hello from rank 1, thread 1, on localhost. (core affinity = 9,25)

As you can see, a place of "cores" means all the hyperthreads of one core. Note also how mpirun pins the ranks to different sockets by default. With OMP_PLACES=threads, each OpenMP thread is pinned to a single hardware thread:

$ OMP_PROC_BIND=true OMP_PLACES=threads OMP_NUM_THREADS=2 mpiexec -n 2 ./xthi
Hello from rank 0, thread 0, on localhost. (core affinity = 0)
Hello from rank 0, thread 1, on localhost. (core affinity = 1)
Hello from rank 1, thread 0, on localhost. (core affinity = 8)
Hello from rank 1, thread 1, on localhost. (core affinity = 9)

With OMP_PROC_BIND=false (your second example), I get:

$ OMP_PROC_BIND=false OMP_PLACES=cores OMP_NUM_THREADS=2 mpiexec -n 2 ./xthi
Hello from rank 0, thread 0, on localhost. (core affinity = 0-7,16-23)
Hello from rank 0, thread 1, on localhost. (core affinity = 0-7,16-23)
Hello from rank 1, thread 0, on localhost. (core affinity = 8-15,24-31)
Hello from rank 1, thread 1, on localhost. (core affinity = 8-15,24-31)

Here, each OpenMP thread gets a full socket, so the MPI ranks still operate on distinct resources. However, the OS may schedule the OpenMP threads within one process wildly across all of that socket's cores. On my test system this is the same as just setting OMP_NUM_THREADS=2.

Again, this might depend on the specific OpenMP and MPI implementations and versions, but I think you'll easily figure out what's going on with the description above.

Hope that helps.

You can try this:

OMP_PROC_BIND=true OMP_PLACES=cores OMP_NUM_THREADS=2 mpiexec -bind-to core:2 -n 2 ./xthi

An MPI task is then bound to two cores, and the OpenMP runtime will (hopefully) bind each of its threads to a single core out of those assigned to the MPI task.

To check that the MPI binding is working fine, you can simply run:

$ mpiexec -np 2 -bind-to core:2 grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list:  0-1
Cpus_allowed_list:  2-3