How to use multiple nodes/cores on a cluster with parallelized Python code


Question


I have a piece of Python code where I use joblib and multiprocessing to make parts of the code run in parallel. I have no trouble running this on my desktop where I can use Task Manager to see that it uses all four cores and runs the code in parallel.

I recently learnt that I have access to an HPC cluster with 100+ 20-core nodes. The cluster uses SLURM as the workload manager.

The first question is: Is it possible to run parallelized Python code on a cluster?

If it is possible,

  1. Does the Python code I have need to be changed at all to run on the cluster, and

  2. What #SBATCH instructions need to be put in the job submission file to tell it that the parallelized parts of the code should run on four cores (or is it four nodes)?

The cluster I have access to has the following attributes:

PARTITION      CPUS(A/I/O/T)       NODES(A/I)  TIMELIMIT      MEMORY  CPUS  SOCKETS CORES 
standard       324/556/16/896      34/60       5-00:20:00     46000+  8+    2       4+

Answer 1:


Typically MPI is considered the de facto standard for High-Performance Computing. There are a few MPI bindings for Python:

  • MPI for Python
  • pyMPI
  • Boost.MPI has Python bindings.

There are also a number of parallel-computing frameworks for Python - list

Your code will require at least minimal changes, but nothing extensive.

When you port to MPI, you run a single Python process per core, so you no longer need multiprocessing.
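Ported to mpi4py (the "MPI for Python" package listed above), the same work-splitting pattern might look like the following sketch; the data and the per-item work are placeholders, and it assumes mpi4py and an MPI runtime are installed:

```python
from mpi4py import MPI  # assumes the mpi4py package is available

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's index, 0..size-1
size = comm.Get_size()   # total number of MPI processes

# Each rank takes every size-th item instead of spawning local workers
data = list(range(100))
my_chunk = data[rank::size]
partial = sum(x * x for x in my_chunk)

# Combine the partial results on rank 0
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of squares:", total)
```

You would launch this with something like `mpirun -n 80 python script.py` (or `srun` under SLURM), and MPI handles distributing the processes across nodes.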

So, for example, if you have 100 nodes with 24 cores each, you will run 2400 Python processes.
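Regarding the #SBATCH question: `multiprocessing` and joblib use shared memory, so they can only use cores on a single node; for that case you request one task with several CPUs. A minimal submission script might look like this sketch (job name, time, and script name are placeholders, and the partition is taken from the `sinfo` output in the question):

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=standard
#SBATCH --nodes=1           # multiprocessing/joblib cannot span nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4   # four cores for the parallel sections
#SBATCH --time=01:00:00

python my_script.py
```

An MPI port, by contrast, would request multiple tasks (e.g. `#SBATCH --ntasks=80`) and launch the program with `srun python my_script.py`, letting SLURM place one process per core across nodes.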



Source: https://stackoverflow.com/questions/28072164/how-to-use-multiple-nodes-cores-on-a-cluster-with-parellelized-python-code
