I'm interested in running a Python program using a computer cluster. I have in the past been using Python MPI interfaces, but due to difficulties in compiling/installing th
If you are willing to pip install an open source package, you should consider Ray, which, of the Python cluster frameworks, is probably the one that comes closest to the single-threaded Python experience. It lets you parallelize both functions (as tasks) and stateful classes (as actors), and it handles data shipping, serialization, and exception propagation automatically. It also offers flexibility similar to normal Python: actors can be passed around, tasks can call other tasks, there can be arbitrary data dependencies, and so on. More about that in the documentation.
As an example, this is how you would do your multiprocessing map example in Ray:
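    import ray

    # Start Ray on the local machine
    ray.init()

    # Turn an ordinary function into a remote task
    @ray.remote
    def mapping_function(input):
        return input + 1

    # Launch 100 tasks in parallel and block until all results are ready
    results = ray.get([mapping_function.remote(i) for i in range(100)])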
The API is a little different from Python's multiprocessing API, but should be easier to use. There is a walk-through tutorial that describes how to handle data dependencies, actors, and so on.
You can install Ray with "pip install ray" and then execute the above code on a single node. It is also easy to set up a cluster; see the Cloud support and Cluster support pages.
Disclaimer: I'm one of the Ray developers.