_multiprocessing.SemLock is not implemented when running on AWS Lambda

Happy的楠姐 2020-12-13 19:59

I have a short piece of code that uses the multiprocessing package and works fine on my local machine.

When I uploaded it to AWS Lambda and ran it there, I got the error: `_multiprocessing.SemLock is not implemented`.
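For reference, a minimal sketch of the kind of code that hits this error — `multiprocessing.Pool` allocates a SemLock under the hood, which the Lambda environment does not support (the `double` worker here is just illustrative):

```python
from multiprocessing import Pool

def double(x):
    # Stand-in for the real work
    return x * 2

if __name__ == "__main__":
    # On a normal machine this prints [2, 4, 6]; on AWS Lambda the
    # Pool constructor raises OSError because SemLock is not implemented.
    with Pool(processes=2) as pool:
        print(pool.map(double, [1, 2, 3]))
```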

5 Answers
  • 2020-12-13 20:25

    You can upgrade to the AWS Lambda Python 3.7 runtime. Works for me!

    AWS Lambda supports multiple languages through the use of runtimes. You choose a runtime when you create a function, and you can change runtimes by updating your function's configuration. The underlying execution environment provides additional libraries and environment variables that you can access from your function code.
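If you already have a function deployed, the runtime can be switched from the CLI; a sketch, assuming the AWS CLI is configured and `my-function` is a placeholder for your function's name:

```shell
# Switch an existing function to the Python 3.7 runtime
# ("my-function" is a placeholder, not a real function name)
aws lambda update-function-configuration \
    --function-name my-function \
    --runtime python3.7
```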

  • 2020-12-13 20:41

    I have bumped into the same issue. This is the code I had previously that worked just fine in my local machine:

    import concurrent.futures
    
    
    class Concurrent:
    
        @staticmethod
        def execute_concurrently(function, kwargs_list):
            results = []
            with concurrent.futures.ProcessPoolExecutor() as executor:
                for _, result in zip(kwargs_list, executor.map(function, kwargs_list)):
                    results.append(result)
            return results
    

    And I replaced it with this one:

    import concurrent.futures
    
    
    class Concurrent:
    
        @staticmethod
        def execute_concurrently(function, kwargs_list):
            results = []
            with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
                futures = [executor.submit(function, kwargs) for kwargs in kwargs_list]
                for future in concurrent.futures.as_completed(futures):
                    results.append(future.result())
            return results
    

    Works like a charm.

    Taken from this pull request
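For completeness, a self-contained sketch of the same thread-pool pattern (the `square` worker and its inputs are made up for illustration):

```python
import concurrent.futures

def square(kwargs):
    # Hypothetical worker; real code would do the actual task here
    return kwargs["x"] ** 2

def execute_concurrently(function, kwargs_list):
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(function, kwargs) for kwargs in kwargs_list]
        # as_completed yields results in completion order, not submission order
        for future in concurrent.futures.as_completed(futures):
            results.append(future.result())
    return results

print(sorted(execute_concurrently(square, [{"x": 1}, {"x": 2}, {"x": 3}])))  # [1, 4, 9]
```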

  • 2020-12-13 20:44

    You can run routines in parallel on AWS Lambda using Python's multiprocessing module but you can't use Pools or Queues as noted in other answers. A workable solution is to use Process and Pipe as outlined in this article https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/

    While the article definitely helped me get to a solution (shared below), there are a few things to be aware of.

    First, the Process- and Pipe-based solution is not as fast as the built-in map function in Pool, though I did see nearly linear speed-ups as I increased the memory/CPU resources available to my Lambda function. Second, there is a fair bit of management that has to be undertaken when developing multiprocessing functions this way, and I suspect that is at least partly why my solution is slower than the built-in methods. If anyone has suggestions to speed it up, I'd love to hear them!

    Finally, while the article notes that multiprocessing is useful for offloading asynchronous processes, there are other reasons to use it, such as lots of intensive math operations, which is what I was trying to do. In the end I was happy enough with the performance improvement, as it was much better than sequential execution!

    The code:

    # Python 3.6
    import multiprocessing
    from multiprocessing import Pipe, Process
    from multiprocessing.connection import wait

    def myWorkFunc(data, connection):
        result = None

        # Do some work and store it in result

        if result:
            connection.send([result])
        else:
            connection.send([None])


    def myPipedMultiProcessFunc(iterable):

        # Get number of available logical cores
        plimit = multiprocessing.cpu_count()

        # Setup management variables
        results = []
        parent_conns = []
        pcount = 0

        for data in iterable:
            # Create the pipe for parent-child process communication
            parent_conn, child_conn = Pipe()
            # Create the process, passing the data to operate on and the connection
            process = Process(target=myWorkFunc, args=(data, child_conn))
            parent_conns.append(parent_conn)
            process.start()
            pcount += 1

            if pcount == plimit:  # No room for another process right now
                # Wait until there are results in the Pipes
                finishedConns = wait(parent_conns)
                # Collect the results and remove the connection, as processing
                # the connection again will lead to errors
                for conn in finishedConns:
                    results.append(conn.recv()[0])
                    conn.close()
                    parent_conns.remove(conn)
                    # Decrement pcount so we can add a new process
                    pcount -= 1

        # Ensure all remaining active processes have their results collected
        for conn in parent_conns:
            results.append(conn.recv()[0])
            conn.close()

        # Process results as needed
        return results
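A stripped-down, self-contained illustration of the same Process-and-Pipe pattern (the `double` worker is made up; the process-limit bookkeeping and error handling are omitted):

```python
from multiprocessing import Pipe, Process

def double(x, conn):
    # Child process: send the result back through the pipe
    conn.send(x * 2)
    conn.close()

def run_parallel(values):
    parent_conns, procs = [], []
    for v in values:
        parent_conn, child_conn = Pipe()
        p = Process(target=double, args=(v, child_conn))
        p.start()
        parent_conns.append(parent_conn)
        procs.append(p)
    # recv() blocks until each child has sent its result,
    # so results come back in submission order
    results = [conn.recv() for conn in parent_conns]
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run_parallel([1, 2, 3]))  # [2, 4, 6]
```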
    
  • 2020-12-13 20:45

    multiprocessing.Pool does not seem to be supported natively (because of the issue with SemLock), but multiprocessing.Process, multiprocessing.Queue, multiprocessing.Pipe etc. work properly in AWS Lambda.

    That should allow you to build a workaround solution by manually creating/forking processes and using a multiprocessing.Pipe for communication between the parent and child processes. Hope that helps

  • 2020-12-13 20:46

    As far as I can tell, multiprocessing won't work on AWS Lambda because the execution environment/container is missing /dev/shm - see https://forums.aws.amazon.com/thread.jspa?threadID=219962 (login may be required).

    No word (that I can find) on if/when Amazon will change this. I also looked at other libraries, e.g. joblib (https://pythonhosted.org/joblib/parallel.html), which falls back to /tmp (which we know DOES exist) if it can't find /dev/shm, but that doesn't actually solve the problem.
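If you want to confirm this from inside your own handler, a quick check is just to test for the directory:

```python
import os

# True on a typical Linux host; reportedly False inside the
# AWS Lambda execution environment at the time of writing
print(os.path.exists("/dev/shm"))
```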
