Is the "aws s3 cp" command implemented with multiple threads?


Question


I am a newbie with the aws s3 client. I used the "aws s3 cp" command to download a batch of files from S3 to the local file system, and it was pretty fast. But when I read the contents of the same batch of files in a single-threaded loop using the Amazon Java SDK, it was surprisingly several times slower than the "aws s3 cp" command.

Does anyone know the reason? I suspect that "aws s3 cp" is multi-threaded.
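
For reference, here is a minimal sketch of the kind of single-threaded loop described above (the bucket name, keys, and use of the AWS SDK for Java v1 are assumptions for illustration, not the original poster's code):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.util.IOUtils;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

public class SequentialReader {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Hypothetical bucket and keys, for illustration only.
        String bucket = "my-bucket";
        List<String> keys = Arrays.asList("data/file1.bin", "data/file2.bin");
        for (String key : keys) {
            // Each iteration blocks until the previous object is fully read,
            // so only one request is ever in flight at a time.
            try (InputStream in = s3.getObject(bucket, key).getObjectContent()) {
                byte[] body = IOUtils.toByteArray(in);
                System.out.println(key + ": " + body.length + " bytes");
            }
        }
    }
}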


Answer 1:


If you look at the source of transferconfig.py in the AWS CLI, it shows that the defaults are:

DEFAULTS = {
    'multipart_threshold': 8 * (1024 ** 2),
    'multipart_chunksize': 8 * (1024 ** 2),
    'max_concurrent_requests': 10,
    'max_queue_size': 1000,
}

which means it can issue up to 10 requests at the same time, and that it splits transfers into 8 MB chunks when a file is larger than 8 MB.
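
If you want comparable behavior from the Java SDK rather than one blocking request at a time, the SDK's TransferManager maintains its own thread pool and performs multipart transfers in parallel, much like the CLI. A minimal sketch, assuming the AWS SDK for Java v1 and a hypothetical bucket, prefix, and destination directory:

import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.MultipleFileDownload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import java.io.File;

public class ParallelDownload {
    public static void main(String[] args) throws Exception {
        // TransferManager parallelizes downloads across a thread pool,
        // similar to the CLI's max_concurrent_requests behavior.
        TransferManager tm = TransferManagerBuilder.standard()
                .withS3Client(AmazonS3ClientBuilder.defaultClient())
                .build();
        // Hypothetical bucket, key prefix, and destination directory.
        MultipleFileDownload download =
                tm.downloadDirectory("my-bucket", "data/", new File("/tmp/dest"));
        download.waitForCompletion();  // blocks until all files finish
        tm.shutdownNow();
    }
}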

This is also documented in the AWS CLI S3 configuration documentation.

These are the configuration values you can set for S3:
max_concurrent_requests - The maximum number of concurrent requests.
max_queue_size - The maximum number of tasks in the task queue.
multipart_threshold - The size threshold the CLI uses for multipart transfers of individual files.
multipart_chunksize - When using multipart transfers, this is the chunk size that the CLI uses for multipart transfers of individual files.
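
These can also be set persistently via an s3 section in ~/.aws/config; the values below are illustrative examples, not recommendations:

[default]
s3 =
  max_concurrent_requests = 20
  max_queue_size = 10000
  multipart_threshold = 64MB
  multipart_chunksize = 16MB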

You could tune it down to see how it compares with your single-threaded method:

aws configure set default.s3.max_concurrent_requests 1

Don't forget to tune it back up afterwards, or else your AWS performance will be miserable.
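
Per the DEFAULTS above, the stock value is 10, so restoring it looks like:

aws configure set default.s3.max_concurrent_requests 10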



Source: https://stackoverflow.com/questions/36647186/aws-s3-is-aws-s3-cp-command-implemented-with-multithreads
