How can Python see 12 CPUs on a cluster where I was allocated 4 cores by LSF?

Submitted by 你。 on 2019-12-08 07:53:20

Question


I access a Linux cluster where resources are allocated using LSF, which I think is a common tool and comes from Scali (http://www.scali.com/workload-management/high-performance-computing). In an interactive queue, I asked for and got the maximum number of cores: 4. But when I check how many CPUs Python's multiprocessing module sees, the number is 12, the number of physical cores on the node I was allocated. It looks like the multiprocessing module does not respect the bounds that LSF should (or would) impose. Is this a problem in LSF or in Python?

[lsandor@iliadaccess03 peers_prisons]$ bsub -Is -n 4 -q interact sh
Job <7408231> is submitted to queue <interact>.
<<Waiting for dispatch ...>>
<<Starting on heroint5>>
sh-3.2$ python3
Python 3.2 (r32:88445, Jun 13 2011, 09:20:03) 
[GCC 4.3.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> 
>>> multiprocessing.cpu_count()
12

Answer 1:


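This is not a problem in either. multiprocessing.cpu_count() returns the number of CPUs in the machine, not the number allocated to your job, so it is your program that should respect the resources the queuing system allocated to it, which, as you have realized, may be considerably less than 100% of the node. I don't believe LSF has OS-level hooks to enforce compliance, nor, probably, should it.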
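The 12 in the transcript is the machine-wide count. A minimal sketch contrasting os.cpu_count(), which reports the same machine-wide count as multiprocessing.cpu_count(), with the Linux-only os.sched_getaffinity(0), the set of CPUs the kernel will actually schedule this process on. The two differ only if something has bound the job to a CPU subset, which depends on how a given site configures LSF (an assumption, not something the answer above claims). os.sched_getaffinity needs Python 3.3 or later, so it is not available in the Python 3.2 shown in the question.

```python
import os


def visible_cpus():
    """Return (cpus_in_machine, cpus_this_process_may_run_on).

    os.cpu_count() counts every CPU in the machine and knows nothing about
    the LSF allocation. os.sched_getaffinity(0) (Linux only) is the set of
    CPUs this process is allowed to run on; it is smaller only if something,
    such as a site-configured CPU binding, has restricted the process.
    """
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total  # affinity cannot be queried on this platform
    return total, usable


if __name__ == "__main__":
    total, usable = visible_cpus()
    print(f"CPUs in machine: {total}, CPUs this process may run on: {usable}")
```

Neither number reads the LSF allocation itself; for that, the environment variables in the other answers are the source to use.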

In the past I've seen this handled with a wrapper script: one that sets up the program and the job together with the appropriate settings, then launches it.
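A minimal Python sketch of such a wrapper, assuming it runs inside the LSF job and reads the allocation from LSB_DJOB_NUMPROC (discussed in the next answer), defaulting to one core otherwise. The target program and its --workers option are hypothetical placeholders for your own program; OMP_NUM_THREADS is the standard OpenMP variable, set here so threaded numeric libraries also stay within the allocation.

```python
import os
import subprocess
import sys


def build_launch(program, allocated, base_env=None):
    """Return (argv, env) that start `program` limited to `allocated` cores."""
    env = dict(os.environ if base_env is None else base_env)
    # Standard OpenMP setting: keeps threaded libraries within the allocation.
    env["OMP_NUM_THREADS"] = str(allocated)
    # "--workers" is a hypothetical option; use whatever your program reads.
    argv = [sys.executable, program, "--workers", str(allocated)]
    return argv, env


def launch(program):
    """Run `program` with the job's allocation and return its exit code."""
    # Assumption: running inside an LSF job that sets LSB_DJOB_NUMPROC;
    # fall back to a single core otherwise.
    allocated = int(os.environ.get("LSB_DJOB_NUMPROC", "1"))
    argv, env = build_launch(program, allocated)
    return subprocess.run(argv, env=env).returncode
```

Keeping the argument and environment construction in build_launch, separate from the subprocess call, means the settings can be checked without starting anything; the job script then only needs to call launch with the name of the real program.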




Answer 2:


A bit late to the party, but expanding on @Paddy3118's answer: the span specification is not needed. Instead, the environment variable LSB_DJOB_NUMPROC holds the number of allocated cores, at least in the LSF version available to me (9.1.2).
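A minimal sketch of reading that variable, falling back to the machine-wide count when the script runs outside LSF. The function name and the fallback policy are illustrative choices; the variable is reported here for LSF 9.1.2, and other versions may differ. As far as I understand, LSB_DJOB_NUMPROC counts all slots in the job, so if the job spans several hosts, a single multiprocessing pool can only use this host's share.

```python
import multiprocessing
import os


def allocated_cores(environ=None):
    """Number of cores LSF allocated to this job.

    Reads LSB_DJOB_NUMPROC; outside an LSF job (or if the value is not an
    integer), falls back to the number of CPUs in the machine.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("LSB_DJOB_NUMPROC")
    if value is not None:
        try:
            return int(value)
        except ValueError:
            pass  # malformed value: use the fallback below
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print(f"Using {allocated_cores()} worker processes")
```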




Answer 3:


If you submit to LSF using the -n option to state how many processors you want, and request that all four processors be made available on the same host by using span, as in the command below:

bsub -n 4 -R "span[hosts=1]" my_job

then my_job is started with the following environment variables set, which your Python script can interrogate to start as many subprocesses as LSF assigned processors:

LSB_HOSTS="hostA hostA hostA hostA"
LSB_MCPU_HOSTS="hostA 4"
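A minimal sketch that turns these variables into a slot count for the host the script is running on. It assumes the formats shown above (host/count pairs in LSB_MCPU_HOSTS, one host name per slot in LSB_HOSTS) and compares short host names, in case socket.gethostname() returns a fully qualified name while LSF lists short ones; the function names and that normalization are illustrative.

```python
import os
import socket


def _short(name):
    """Strip any domain part so "hostA.example.org" matches "hostA"."""
    return name.split(".")[0]


def parse_mcpu_hosts(value):
    """Parse LSB_MCPU_HOSTS, e.g. "hostA 4 hostB 2", into {"hostA": 4, "hostB": 2}."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError(f"expected host/count pairs, got {value!r}")
    return {host: int(count) for host, count in zip(tokens[::2], tokens[1::2])}


def slots_on_this_host(environ=None, hostname=None):
    """Slots LSF allocated on this host (0 if the host is not listed)."""
    environ = os.environ if environ is None else environ
    me = _short(hostname or socket.gethostname())
    if "LSB_MCPU_HOSTS" in environ:
        pairs = parse_mcpu_hosts(environ["LSB_MCPU_HOSTS"])
        return sum(n for host, n in pairs.items() if _short(host) == me)
    # LSB_HOSTS repeats the host name once per allocated slot.
    hosts = environ.get("LSB_HOSTS", "").split()
    return sum(1 for host in hosts if _short(host) == me)
```

Preferring LSB_MCPU_HOSTS keeps the parsing short for large allocations, since LSB_HOSTS grows by one name per slot.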

(Or should the number of subprocesses be the number of processors allocated by LSF minus 1, to account for the Python script that launches them? :-)
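On the minus-one question: the parent process mostly waits while a Pool works through its tasks, so sizing the pool to the full allocation is usually reasonable, and reserving a core only matters if the parent does substantial work of its own at the same time. A minimal sketch, where allocated is assumed to come from one of the environment variables above and square is a hypothetical stand-in for your real task:

```python
import multiprocessing


def pool_size(allocated, reserve_for_parent=False):
    """Worker count for a job allocated `allocated` cores (never below 1)."""
    workers = allocated - 1 if reserve_for_parent else allocated
    return max(1, workers)


def square(x):
    """Placeholder task; must be defined at module level to be picklable."""
    return x * x


def run(items, allocated, reserve_for_parent=False):
    """Map `square` over `items` using a pool sized to the allocation.

    Call this from under `if __name__ == "__main__":` so it is also safe
    with the spawn and forkserver start methods.
    """
    n = pool_size(allocated, reserve_for_parent)
    with multiprocessing.Pool(processes=n) as pool:
        return pool.map(square, items)
```

The max(1, ...) guard keeps a one-core allocation from producing an empty pool when a core is reserved for the parent.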



Source: https://stackoverflow.com/questions/7449893/how-can-python-see-12-cpus-on-a-cluster-where-i-got-allocated-4-cores-by-lsf
