Diagnosing Memory Leak in boto3

Submitted by 倖福魔咒の on 2019-12-10 15:07:20

Question


I have a Celery worker running on Elastic Beanstalk that polls an SQS queue, gets messages (containing S3 file names), downloads those files from S3, and processes them. The worker is scheduled to run every 15 seconds, but for some reason its memory usage keeps increasing over time.

This is the code I'm using to access SQS and download the files from S3:

import json

import boto3

def get_messages_from_sqs(queue_url, queue_region="us-west-2", number_of_messages=1):
    # A new SQS client is created on every call
    client = boto3.client('sqs', region_name=queue_region)
    sqs_response = client.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=number_of_messages)
    messages = sqs_response.get("Messages", [])
    cleaned_messages = []
    for message in messages:
        # The message body is JSON with a "Records" list; keep the first record
        # and attach the receipt handle so the message can be deleted later
        body = json.loads(message["Body"])
        data = body["Records"][0]
        data["receipt_handle"] = message["ReceiptHandle"]
        cleaned_messages.append(data)
    return cleaned_messages

def download_file_from_s3(bucket_name, filename):
    # A new S3 client is created on every call
    s3_client = boto3.client('s3')
    s3_client.download_file(bucket_name, filename, '/tmp/{}'.format(filename))

Do we need to close the client connection in boto3 after we're done with the process? If so, how can we do it?


Answer 1:


I have run into similar issues using Celery in production, completely unrelated to Boto. Although I do not have an explanation for the memory leak (this would take some serious code spelunking and profiling), I can offer a potential workaround if your goal is just to not run out of memory.

Setting max tasks per child should allow you to continually reclaim memory: once a child process has executed that many tasks it is killed and replaced, and whatever memory it was holding is released.
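
A minimal sketch of that workaround, assuming Celery 4.x or later and a hypothetical application module (adjust the name and the task count to your setup):

from celery import Celery

app = Celery('worker')  # hypothetical application name

# Recycle each worker child process after it has executed 50 tasks,
# so whatever memory it accumulated is returned to the operating system.
app.conf.worker_max_tasks_per_child = 50

The same limit can be passed on the worker command line with --max-tasks-per-child (older Celery versions call the setting CELERYD_MAX_TASKS_PER_CHILD); the value 50 is arbitrary and only illustrates the idea, so pick a count that keeps per-process memory within your instance's limits.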



Source: https://stackoverflow.com/questions/50343382/diagnosing-memory-leak-in-boto3
