Is there a simple process-based parallel map for python?

前端未结

关注

 4  1246

小蘑菇 2020-11-29 18:50

I\'m looking for a simple process-based parallel map for python, that is, a function

parmap(function,[data])

that would run function on eac

4条回答

迷失自我 (楼主)

2020-11-29 19:29
This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.

To parallelize your example, you'd need to define your map function with the @ray.remote decorator, and then invoke it with .remote. This will ensure that every instance of the remote function will executed in a different process.
```
import time
import ray

ray.init()

# Define the function you want to apply map on, as remote function. 
@ray.remote
def f(x):
    # Do some work...
    time.sleep(1)
    return x*x

# Define a helper parmap(f, list) function.
# This function executes a copy of f() on each element in "list".
# Each copy of f() runs in a different process.
# Note f.remote(x) returns a future of its result (i.e., 
# an identifier of the result) rather than the result itself.  
def parmap(f, list):
    return [f.remote(x) for x in list]

# Call parmap() on a list consisting of first 5 integers.
result_ids = parmap(f, range(1, 6))

# Get the results
results = ray.get(result_ids)
print(results)
```
This will print:
```
[1, 4, 9, 16, 25]
```
and it will finish in approximately len(list)/p (rounded up the nearest integer) where p is number of cores on your machine. Assuming a machine with 2 cores, our example will execute in 5/2 rounded up, i.e, in approximately 3 sec.

There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...