Python compute cluster


The Python wiki hosts a very comprehensive list of Python cluster computing libraries and tools. You might be especially interested in Parallel Python.

Edit: There is a new library that is IMHO especially good at clustering: execnet. It is small and simple. And it appears to have fewer bugs than, say, the standard multiprocessing module.
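
For a taste of what execnet code looks like, here is a minimal, hedged sketch: it spawns a local worker interpreter over the default "popen" gateway and exchanges a message over a channel (for a real cluster node you would pass an "ssh=hostname" gateway spec instead; the hostname is just a placeholder).

```python
# Minimal execnet sketch (assumes execnet is installed).
import execnet

gw = execnet.makegateway()        # local subprocess; e.g. "ssh=node1" for a remote node
channel = gw.remote_exec("""
    # This source runs in the worker interpreter; execnet injects `channel`.
    import socket
    channel.send(socket.gethostname())
""")
print(channel.receive())          # hostname reported by the worker
gw.exit()
```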

You can see most of the third-party packages available for Python 3 listed here; relevant to cluster computation is mpi4py -- most other distributed computing tools such as Pyro are still Python-2 only, but MPI is a leading standard for cluster distributed computation and well worth looking into (I have no direct experience using mpi4py with Python 3 yet, but by hearsay I believe it's a good implementation).
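
As a rough illustration of the mpi4py style (a sketch only, assuming mpi4py and an MPI runtime such as Open MPI are installed), each rank computes a partial sum and rank 0 collects the total:

```python
# Hedged mpi4py sketch: launch under MPI, e.g. `mpiexec -n 4 python script.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # id of this process
size = comm.Get_size()            # total number of processes

# Each rank sums a strided slice of 0..999, then rank 0 reduces the partials.
partial = sum(range(rank, 1000, size))
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of 0..999 =", total)
```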

The main alternative is Python's own built-in multiprocessing, which also scales up pretty well if you have no interest in interfacing existing nodes that respect the MPI standards but may not be coded in Python.
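
For comparison, the multiprocessing equivalent is only a few lines (a sketch; it scales across the cores of one machine rather than across nodes):

```python
# Minimal multiprocessing sketch: map work over a pool of local worker processes.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))
```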

There is no real added value in rolling your own (as Atwood says, don't reinvent the wheel, unless your purpose is just to better understand wheels!-) -- use one of the solid, widespread solutions, already tested, debugged and optimized on your behalf!-)

Look into these

http://www.parallelpython.com/

http://pyro.sourceforge.net/

I have used both and both are excellent for distributed computing.
For a more detailed list of options see http://wiki.python.org/moin/ParallelProcessing

And if you want to automatically execute something on a remote machine, a better alternative to telnet is ssh, as in http://pydsh.sourceforge.net/
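
If all you need is plain ssh from Python, the standard library is enough; a minimal sketch (assuming passwordless ssh keys are set up and "node1" is a placeholder hostname):

```python
# Run a command on a remote host over ssh using only the standard library.
import subprocess

result = subprocess.run(
    ["ssh", "node1", "uname -a"],   # "node1" is a placeholder hostname
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```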

What kind of stuff do you want to do? You might want to check out Hadoop. The backend heavy lifting is done in Java, but it has a Python interface, so you can write Python scripts to create and send the input, as well as to process the results.
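
With Hadoop Streaming, for instance, the Python side is just a script that reads lines on stdin and writes tab-separated key/value pairs on stdout; here is a word-count mapper sketch (the matching reducer would sum the counts per word):

```python
# Sketch of a Hadoop Streaming mapper: stdin lines in, "word<TAB>1" pairs out.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))
```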

If you need to write administrative scripts, take a look at the ClusterShell Python library too, and/or its parallel shell clush. It's also useful when dealing with node sets (man nodeset).
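
Roughly, ClusterShell lets you run a command over a node set and group identical outputs; a hedged sketch follows (node names are placeholders, and the exact Task/NodeSet API should be checked against the ClusterShell documentation for your version):

```python
# Hedged ClusterShell sketch: run a command in parallel over a node set.
from ClusterShell.Task import task_self
from ClusterShell.NodeSet import NodeSet

task = task_self()
task.run("uname -r", nodes="node[1-4]")     # placeholder node set
for buf, nodes in task.iter_buffers():      # identical outputs grouped together
    print(NodeSet.fromlist(nodes), buf)
```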

I think IPython.parallel is the way to go. I've been using it extensively for the last year and a half. It allows you to work interactively with as many worker nodes as you want. If you are on AWS, StarCluster is a great way to get IPython.parallel up and running quickly and easily with as many EC2 nodes as you can afford. (It can also automatically install Hadoop, and a variety of other useful tools, if needed.) There are some tricks to using it. (For example, you don't want to send large amounts of data through the IPython.parallel interface itself. Better to distribute a script that will pull down chunks of data on each engine individually.) But overall, I've found it to be a remarkably easy way to do distributed processing (WAY better than Hadoop!)
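
For flavor, a minimal IPython.parallel sketch (using the pre-ipyparallel name current when this was written; it assumes a cluster is already running, e.g. started with ipcluster start -n 4 or via StarCluster):

```python
# Hedged IPython.parallel sketch: connect to a running controller and map work.
from IPython.parallel import Client

rc = Client()                     # connect to the running ipcontroller
dview = rc[:]                     # direct view on all engines
results = dview.map_sync(lambda x: x ** 2, range(16))
print(results)
```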

"Would it be possible to make a python cluster"

Yes.

I love yes/no questions. Anything else you want to know?

(Note that Python 3 has few third-party libraries yet, so you may wanna stay with Python 2 at the moment.)
