Python multiprocessing BETWEEN Amazon cloud instances


The multiprocessing docs give you a good setup for running multiprocessing across multiple machines (see the section on remote managers). Using S3 is a good way to share files across EC2 instances, but with multiprocessing you can share queues and pass data directly, as in the sketch below.
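As a rough sketch of that approach (the port, authkey, and server IP are placeholders you would replace), one instance serves a queue through a `BaseManager` and the other instances connect to it:

```python
# remote_queue.py
# Run "python remote_queue.py server" on one instance and
# "python remote_queue.py client <server-ip>" on the others.
import sys
from queue import Queue
from multiprocessing.managers import BaseManager

PORT = 50000            # example port; open it in the security group
AUTHKEY = b"change-me"  # shared secret between instances

class QueueManager(BaseManager):
    """Manager that exposes a single shared queue over the network."""
    pass

def run_server():
    job_queue = Queue()
    QueueManager.register("get_queue", callable=lambda: job_queue)
    manager = QueueManager(address=("0.0.0.0", PORT), authkey=AUTHKEY)
    server = manager.get_server()
    print("serving shared queue on port", PORT)
    server.serve_forever()

def run_client(server_ip):
    QueueManager.register("get_queue")
    manager = QueueManager(address=(server_ip, PORT), authkey=AUTHKEY)
    manager.connect()
    queue = manager.get_queue()
    queue.put("work item")      # or queue.get() on worker instances

if __name__ == "__main__":
    if sys.argv[1] == "server":
        run_server()
    else:
        run_client(sys.argv[2])
```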

If you can use Hadoop for your parallel tasks, it is a very good way to extract parallelism across machines, but if you need a lot of IPC, then building your own solution with multiprocessing isn't that bad.

Just make sure you put your machines in the same security group :-)

I would use dumbo. It is a Python wrapper for Hadoop that is compatible with Amazon Elastic MapReduce. Write a little wrapper around your code to integrate it with dumbo. Note that you probably want a map-only job with no reduce step; a sketch of that follows.
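A minimal sketch of what that wrapper might look like, assuming `dumbo.run` can be called with only a mapper for a map-only job (the mapper body here is just an illustrative word count):

```python
# map_only_job.py - a map-only dumbo job (no reducer).
def mapper(key, value):
    # key is the byte offset, value is one line of input
    for word in value.split():
        yield word, 1

if __name__ == "__main__":
    import dumbo
    dumbo.run(mapper)   # omitting the reducer keeps this map-only
```

You would then launch it with dumbo's command-line tool (`dumbo start map_only_job.py ...`) against a Hadoop cluster or Elastic MapReduce, pointing it at your input and output paths.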

I've been digging into IPython recently, and it looks like it supports parallel processing across multiple hosts right out of the box:

http://ipython.org/ipython-doc/stable/html/parallel/index.html
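A minimal sketch of that, assuming you have already started a controller and engines on your instances as described in the linked docs (in newer IPython versions this lives in the separate `ipyparallel` package, i.e. `from ipyparallel import Client`):

```python
# Assumes an IPython cluster is running, e.g. started with "ipcluster start"
# or with engines on each EC2 instance pointing at one controller.
from IPython.parallel import Client

rc = Client()        # connects using the cluster's connection file
view = rc[:]         # a DirectView over all connected engines

def slow_square(x):
    import time
    time.sleep(1)
    return x * x

# map_sync farms the calls out across every connected host
results = view.map_sync(slow_square, range(32))
print(results)
```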
