Is there a simple process-based parallel map for python?

小蘑菇 2020-11-29 18:50

I'm looking for a simple process-based parallel map for Python, that is, a function

parmap(function,[data])

that would run function on each element of [data] in a different process and return the list of results.
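For reference, the standard library already gets close to this with multiprocessing.Pool. A minimal sketch of such a parmap (the helper name and the square function are illustrative, not from any particular library):

```python
from multiprocessing import Pool

def square(x):
    # Stand-in for "function": any picklable, top-level callable works.
    return x * x

def parmap(function, data, processes=None):
    # Run "function" on each element of "data" in a pool of worker
    # processes; results come back in the same order as the input.
    # processes=None lets the pool default to the machine's CPU count.
    with Pool(processes) as pool:
        return pool.map(function, data)

if __name__ == "__main__":
    print(parmap(square, [1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

Note that the callable must be defined at module level so the worker processes can pickle it; a lambda will not work here.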

4 Answers
  •  迷失自我
    2020-11-29 19:29

    This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.

To parallelize your example, you'd define your map function with the @ray.remote decorator and then invoke it with .remote. This ensures that every invocation of the remote function executes in a different process.

    import time
    import ray
    
    ray.init()
    
# Define the function you want to map, as a remote function.
    @ray.remote
    def f(x):
        # Do some work...
        time.sleep(1)
        return x*x
    
# Define a helper parmap(f, xs) function.
# This function executes a copy of f() on each element in "xs".
# Each copy of f() runs in a different process.
# Note that f.remote(x) returns a future of its result (i.e.,
# an identifier of the result) rather than the result itself.
def parmap(f, xs):
    return [f.remote(x) for x in xs]
    
    # Call parmap() on a list consisting of first 5 integers.
    result_ids = parmap(f, range(1, 6))
    
    # Get the results
    results = ray.get(result_ids)
    print(results)
    

    This will print:

    [1, 4, 9, 16, 25]
    

and it will finish in approximately ceil(n/p) seconds, where n is the number of list elements and p is the number of cores on your machine (each task sleeps for 1 second). On a machine with 2 cores, the example executes in ceil(5/2), i.e., approximately 3 seconds.

There are a number of advantages to using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray, see this related post.
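For comparison, a multiprocessing-style equivalent of the example above can be sketched with the standard library's concurrent.futures (a sketch with illustrative names, not the answer's code):

```python
from concurrent.futures import ProcessPoolExecutor

def f(x):
    # Same work as the Ray example, minus the sleep.
    return x * x

def parmap(function, data):
    # Unlike the Ray version, which returns futures immediately and
    # collects them later with ray.get, executor.map blocks here until
    # all results are available, in input order.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(function, data))

if __name__ == "__main__":
    print(parmap(f, range(1, 6)))  # [1, 4, 9, 16, 25]
```

The key difference is that the Ray version separates submission (f.remote) from collection (ray.get), which makes it easy to interleave other work between the two; executor.map folds both into one call.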
