Parallelism in Python

后端 未结 5 1017
星月不相逢
星月不相逢 2021-01-07 17:19

What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise th

5条回答
  •  一个人的身影
    2021-01-07 18:14

    Ray is an elegant (and fast) library for doing this.

    The most basic strategy for parallelizing Python functions is to declare a function with the @ray.remote decorator. Then it can be invoked asynchronously.

    import ray
    import time
    
    # Start the Ray processes (e.g., a scheduler and shared-memory object store).
    ray.init(num_cpus=8)
    
    @ray.remote
    def f():
        time.sleep(1)
    
    # This should take one second assuming you have at least 4 cores.
    ray.get([f.remote() for _ in range(4)])
    

    You can also parallelize stateful computation using actors, again by using the @ray.remote decorator.

    # This assumes you already ran 'import ray' and 'ray.init()'.
    
    import time
    
    @ray.remote
    class Counter(object):
        def __init__(self):
            self.x = 0
    
        def inc(self):
            self.x += 1
    
        def get_counter(self):
            return self.x
    
    # Create two actors which will operate in parallel.
    counter1 = Counter.remote()
    counter2 = Counter.remote()
    
    @ray.remote
    def update_counters(counter1, counter2):
        for _ in range(1000):
            time.sleep(0.25)
            counter1.inc.remote()
            counter2.inc.remote()
    
    # Start three tasks that update the counters in the background also in parallel.
    update_counters.remote(counter1, counter2)
    update_counters.remote(counter1, counter2)
    update_counters.remote(counter1, counter2)
    
    # Check the counter values.
    for _ in range(5):
        counter1_val = ray.get(counter1.get_counter.remote())
        counter2_val = ray.get(counter2.get_counter.remote())
        print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
        time.sleep(1)
    

    It has a number of advantages over the multiprocessing module:

    • The same code runs on a single multi-core machine as well as a large cluster.
    • Data is shared efficiently between processes on the same machine using shared memory and efficient serialization.
    • You can parallelize Python functions (using tasks) and Python classes (using actors).
    • Error messages are propagated nicely.

    Ray is a framework I've been helping develop.

提交回复
热议问题