Question
I am a newbie with the AWS S3 client. I used the "aws s3 cp" command to download a batch of files from S3 to the local file system, and it was pretty fast. But when I read the contents of the same batch of files in a single-threaded loop using the Amazon Java SDK, it was surprisingly several times slower than "aws s3 cp".
Does anyone know the reason? I suspect that "aws s3 cp" is multi-threaded.
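To make the suspicion concrete, here is a minimal Python sketch (a simulation with a fake `fetch` that sleeps instead of making real S3 calls; all names are illustrative) comparing a single-threaded loop, like the one described in the question, against 10 concurrent workers, which is the CLI's default concurrency:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(key):
    # stand-in for an S3 GetObject call; simulated network latency
    time.sleep(0.05)
    return key

keys = [f"file-{i}" for i in range(10)]

# single-threaded loop, as in the question
t0 = time.perf_counter()
sequential = [fetch(k) for k in keys]
seq_time = time.perf_counter() - t0

# 10 requests in flight at once, as "aws s3 cp" does by default
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    concurrent_results = list(pool.map(fetch, keys))
par_time = time.perf_counter() - t0

print(f"sequential: {seq_time:.2f}s, concurrent: {par_time:.2f}s")
```

With I/O-bound work like downloads, the concurrent version finishes in roughly the time of the slowest single request, which is why the CLI can be several times faster than a plain loop.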
Answer 1:
If you look at the source of transferconfig.py
, you'll see that the defaults are:
DEFAULTS = {
    'multipart_threshold': 8 * (1024 ** 2),
    'multipart_chunksize': 8 * (1024 ** 2),
    'max_concurrent_requests': 10,
    'max_queue_size': 1000,
}
which means that it issues up to 10 requests at the same time, and that it also splits transfers into 8 MB chunks whenever a file is larger than 8 MB.
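The chunking rule above can be sketched in a few lines of Python. The helper below is hypothetical (not the CLI's actual code), but it mirrors the defaults: files at or below the threshold go in one request, larger files are split into 8 MB byte ranges:

```python
MULTIPART_THRESHOLD = 8 * (1024 ** 2)  # 8 MB, the CLI default
MULTIPART_CHUNKSIZE = 8 * (1024 ** 2)

def chunk_ranges(size):
    """Hypothetical helper: byte ranges a multipart transfer would use."""
    if size <= MULTIPART_THRESHOLD:
        return [(0, size - 1)]  # small file: one request, no multipart
    return [(start, min(start + MULTIPART_CHUNKSIZE, size) - 1)
            for start in range(0, size, MULTIPART_CHUNKSIZE)]

# a 20 MB file is split into three chunks (8 MB + 8 MB + 4 MB)
print(chunk_ranges(20 * 1024 ** 2))
```

Each chunk can then be fetched by a separate worker, so a single large file also benefits from the concurrency, not just a batch of small files.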
This is also documented in the S3 CLI configuration documentation.
These are the configuration values you can set for S3:
max_concurrent_requests - The maximum number of concurrent requests.
max_queue_size - The maximum number of tasks in the task queue.
multipart_threshold - The size threshold the CLI uses for multipart transfers of individual files.
multipart_chunksize - When using multipart transfers, this is the chunk size that the CLI uses for multipart transfers of individual files.
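For reference, these values can also be set in the `~/.aws/config` file rather than via `aws configure set`; a sketch with the default values might look like:

```
[default]
s3 =
  max_concurrent_requests = 10
  max_queue_size = 1000
  multipart_threshold = 8MB
  multipart_chunksize = 8MB
```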
You could tune it down to see how it compares with your simple method:
aws configure set default.s3.max_concurrent_requests 1
Don't forget to tune it back up afterwards, or else your AWS performance will be miserable.
Source: https://stackoverflow.com/questions/36647186/aws-s3-is-aws-s3-cp-command-implemented-with-multithreads