I am looking for a python package that can do multiprocessing not just across different cores within a single computer, but also with a cluster distributed across multiple m
I'd suggest taking a look at Ray, which aims to do exactly that.
Ray uses the same syntax to parallelize code in the single-machine multicore setting as it does in the distributed setting. If you're willing to use a for loop (here, a list comprehension) instead of a map call, then your example would look like the following.
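import ray
import time

# Start Ray on the local machine.
ray.init()

# The decorator turns the function into a task that Ray can run in parallel.
@ray.remote
def function(x):
    time.sleep(0.1)
    return x

arglist = [1, 2, 3, 4]

# Each .remote() call returns immediately with an ID for the pending result.
result_ids = [function.remote(x) for x in arglist]
# ray.get() blocks until all tasks finish and returns their results in order.
resultlist = ray.get(result_ids)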
That will run four tasks in parallel using however many cores you have locally. To run the same example on a cluster, the only line that would change would be the call to ray.init(). The relevant documentation can be found here.
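As a rough sketch of what that change might look like: recent Ray releases accept an `address` argument to `ray.init()` for connecting to an already running cluster (older releases named this parameter differently, so check the documentation for your version). The helper `ray_init_kwargs` and the environment variable `RAY_CLUSTER_ADDRESS` below are hypothetical names made up for this illustration, not part of Ray.

```python
import os


def ray_init_kwargs(env=None):
    """Build keyword arguments for ray.init().

    RAY_CLUSTER_ADDRESS is a hypothetical variable name chosen for this
    sketch. If it is unset, no arguments are returned and ray.init()
    starts Ray on the local machine.
    """
    env = os.environ if env is None else env
    address = env.get("RAY_CLUSTER_ADDRESS")
    return {"address": address} if address else {}


def main():
    # Imported here so the helper above can be used without Ray installed.
    import time
    import ray

    # The only line that differs between local and cluster runs.
    ray.init(**ray_init_kwargs())

    @ray.remote
    def function(x):
        time.sleep(0.1)
        return x

    result_ids = [function.remote(x) for x in [1, 2, 3, 4]]
    print(ray.get(result_ids))
```

Calling `main()` with the variable unset runs the tasks locally. Setting it to the address of a running cluster (or to `"auto"` on a machine that is already part of one) should send the same tasks to the cluster instead, assuming a Ray version that supports the `address` argument.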
Note that I'm helping to develop Ray.