I\'m looking for a simple process-based parallel map for python, that is, a function
parmap(function,[data])
that would run function on eac
This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
To parallelize your example, you'd need to define your map function with the @ray.remote
decorator, and then invoke it with .remote
. This will ensure that every instance of the remote function will executed in a different process.
import time
import ray
ray.init()
# Define the function you want to apply map on, as remote function.
@ray.remote
def f(x):
# Do some work...
time.sleep(1)
return x*x
# Define a helper parmap(f, list) function.
# This function executes a copy of f() on each element in "list".
# Each copy of f() runs in a different process.
# Note f.remote(x) returns a future of its result (i.e.,
# an identifier of the result) rather than the result itself.
def parmap(f, list):
return [f.remote(x) for x in list]
# Call parmap() on a list consisting of first 5 integers.
result_ids = parmap(f, range(1, 6))
# Get the results
results = ray.get(result_ids)
print(results)
This will print:
[1, 4, 9, 16, 25]
and it will finish in approximately len(list)/p
(rounded up the nearest integer) where p
is number of cores on your machine. Assuming a machine with 2 cores, our example will execute in 5/2
rounded up, i.e, in approximately 3
sec.
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.