_multiprocessing.SemLock is not implemented when running on AWS Lambda

Happy的楠姐 2020-12-13 19:59

I have a short piece of code that uses the multiprocessing package and works fine on my local machine.

When I uploaded it to AWS Lambda and ran it there, I got the error: `_multiprocessing.SemLock is not implemented`.
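For reference, a minimal sketch of the kind of code that hits this error — `multiprocessing.Pool` allocates a SemLock under the hood, which the Lambda environment does not support (the `double` worker here is just illustrative):

```python
from multiprocessing import Pool

def double(x):
    # Stand-in for the real work
    return x * 2

if __name__ == "__main__":
    # On a normal machine this prints [2, 4, 6]; on AWS Lambda the
    # Pool constructor raises OSError because SemLock is not implemented.
    with Pool(processes=2) as pool:
        print(pool.map(double, [1, 2, 3]))
```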

5 Answers
  • 2020-12-13 20:25

    You can upgrade to the AWS Lambda Python 3.7 runtime. Works for me!

    AWS Lambda supports multiple languages through the use of runtimes. You choose a runtime when you create a function, and you can change runtimes by updating your function's configuration. The underlying execution environment provides additional libraries and environment variables that you can access from your function code.
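If you already have a function deployed, the runtime can be switched from the CLI; a sketch, assuming the AWS CLI is configured and `my-function` is a placeholder for your function's name:

```shell
# Switch an existing function to the Python 3.7 runtime
# ("my-function" is a placeholder, not a real function name)
aws lambda update-function-configuration \
    --function-name my-function \
    --runtime python3.7
```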

  • 2020-12-13 20:41

    I have bumped into the same issue. This is the code I had previously that worked just fine in my local machine:

    import concurrent.futures
    
    
    class Concurrent:
    
        @staticmethod
        def execute_concurrently(function, kwargs_list):
            results = []
            with concurrent.futures.ProcessPoolExecutor() as executor:
                for _, result in zip(kwargs_list, executor.map(function, kwargs_list)):
                    results.append(result)
            return results
    

    And I replaced it with this one:

    import concurrent.futures
    
    
    class Concurrent:
    
        @staticmethod
        def execute_concurrently(function, kwargs_list):
            results = []
            with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
                futures = [executor.submit(function, kwargs) for kwargs in kwargs_list]
                for future in concurrent.futures.as_completed(futures):
                    results.append(future.result())
            return results
    

    Works like a charm.

    Taken from this pull request
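For completeness, a self-contained sketch of the same thread-pool pattern (the `square` worker and its inputs are made up for illustration):

```python
import concurrent.futures

def square(kwargs):
    # Hypothetical worker; real code would do the actual task here
    return kwargs["x"] ** 2

def execute_concurrently(function, kwargs_list):
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(function, kwargs) for kwargs in kwargs_list]
        # as_completed yields results in completion order, not submission order
        for future in concurrent.futures.as_completed(futures):
            results.append(future.result())
    return results

print(sorted(execute_concurrently(square, [{"x": 1}, {"x": 2}, {"x": 3}])))  # [1, 4, 9]
```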

  • 2020-12-13 20:44

    You can run routines in parallel on AWS Lambda using Python's multiprocessing module but you can't use Pools or Queues as noted in other answers. A workable solution is to use Process and Pipe as outlined in this article https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/

    While the article definitely helped me get to a solution (shared below), there are a few things to be aware of.

    First, the Process- and Pipe-based solution is not as fast as the built-in map function in Pool, though I did see nearly linear speed-ups as I increased the memory/CPU resources available to my Lambda function. Second, there is a fair bit of management that has to be undertaken when developing multiprocessing functions this way, and I suspect that is at least partly why my solution is slower than the built-in methods. If anyone has suggestions to speed it up, I'd love to hear them!

    Finally, while the article notes that multiprocessing is useful for offloading asynchronous processes, there are other reasons to use it, such as lots of intensive math operations, which is what I was trying to do. In the end I was happy enough with the performance improvement, as it was much better than sequential execution!

    The code:

    # Python 3.6
    import multiprocessing
    from multiprocessing import Pipe, Process
    from multiprocessing.connection import wait

    def myWorkFunc(data, connection):
        result = None

        # Do some work and store it in result

        if result:
            connection.send([result])
        else:
            connection.send([None])


    def myPipedMultiProcessFunc(iterable):

        # Get number of available logical cores
        plimit = multiprocessing.cpu_count()

        # Setup management variables
        results = []
        parent_conns = []
        pcount = 0

        for data in iterable:
            # Create the pipe for parent-child process communication
            parent_conn, child_conn = Pipe()
            # Create the process, passing the data to operate on and the connection
            process = Process(target=myWorkFunc, args=(data, child_conn))
            parent_conns.append(parent_conn)
            process.start()
            pcount += 1

            if pcount == plimit:  # No room for another process right now
                # Wait until there are results in the Pipes
                finishedConns = wait(parent_conns)
                # Collect the results and remove the connection, as processing
                # the connection again will lead to errors
                for conn in finishedConns:
                    results.append(conn.recv()[0])
                    conn.close()
                    parent_conns.remove(conn)
                    # Decrement pcount so we can add a new process
                    pcount -= 1

        # Ensure all remaining active processes have their results collected
        for conn in parent_conns:
            results.append(conn.recv()[0])
            conn.close()

        # Process results as needed
        return results
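A stripped-down, self-contained illustration of the same Process-and-Pipe pattern (the `double` worker is made up; the process-limit bookkeeping and error handling are omitted):

```python
from multiprocessing import Pipe, Process

def double(x, conn):
    # Child process: send the result back through the pipe
    conn.send(x * 2)
    conn.close()

def run_parallel(values):
    parent_conns, procs = [], []
    for v in values:
        parent_conn, child_conn = Pipe()
        p = Process(target=double, args=(v, child_conn))
        p.start()
        parent_conns.append(parent_conn)
        procs.append(p)
    # recv() blocks until each child has sent its result,
    # so results come back in submission order
    results = [conn.recv() for conn in parent_conns]
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run_parallel([1, 2, 3]))  # [2, 4, 6]
```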
    
  • 2020-12-13 20:45

    multiprocessing.Pool does not seem to be supported natively (because of the issue with SemLock), but multiprocessing.Process, multiprocessing.Queue, multiprocessing.Pipe etc. work properly in AWS Lambda.

    That should allow you to build a workaround solution by manually creating/forking processes and using a multiprocessing.Pipe for communication between the parent and child processes. Hope that helps

  • 2020-12-13 20:46

    As far as I can tell, multiprocessing won't work on AWS Lambda because the execution environment/container is missing /dev/shm - see https://forums.aws.amazon.com/thread.jspa?threadID=219962 (login may be required).

    No word (that I can find) on if/when Amazon will change this. I also looked at other libraries, e.g. joblib (https://pythonhosted.org/joblib/parallel.html), which falls back to /tmp (which we know DOES exist) if it can't find /dev/shm, but that doesn't actually solve the problem.
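If you want to confirm this from inside your own handler, a quick check is just to test for the directory:

```python
import os

# True on a typical Linux host; reportedly False inside the
# AWS Lambda execution environment at the time of writing
print(os.path.exists("/dev/shm"))
```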
