Question
Does Amazon S3 support batch uploads? I have a job that needs to upload ~100K files each night; files can be up to 1 GB, but the size distribution is strongly skewed towards small files (90% are less than 100 bytes and 99% are less than 1,000 bytes).
Does the s3 API support uploading multiple objects in a single HTTP call?
All the objects must be available in S3 as individual objects. I cannot host them anywhere else (FTP, etc.) or in another format (database, EC2 local drive, etc.). That is an external requirement that I cannot change.
Answer 1:
Does the s3 API support uploading multiple objects in a single HTTP call?
No, the S3 PUT operation only supports uploading one object per HTTP request.
You could install S3 Tools (s3cmd) on the machine you want to synchronize with the remote bucket and run the following command:
s3cmd sync localdirectory s3://bucket/
Then you could place this command in a script and create a scheduled job to run it each night.
This should do what you want.
The tool performs the file synchronization based on MD5 hashes and file size, so collisions should be rare. (If you really want, you could use the "s3cmd put" command instead to force blind overwriting of objects in your target bucket.)
EDIT: Also make sure you read the documentation on the S3 Tools site linked above - different flags are needed depending on whether you want files deleted locally to also be deleted from the bucket, or to be ignored, etc.
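As a rough sketch of the scheduled job (the schedule, paths, and bucket name below are placeholders, and --delete-removed is only needed if you want local deletions mirrored to the bucket), a nightly cron entry could look like:
# run every night at 02:00 and mirror local deletions to the bucket
0 2 * * * /usr/bin/s3cmd sync --delete-removed /path/to/localdirectory s3://bucket/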
Answer 2:
Alternatively, you can upload to S3 via the AWS CLI tool using the sync command.
aws s3 sync local_folder s3://bucket-name
You can use this method to batch-upload files to S3 very quickly.
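For example (the folder path and bucket name are placeholders; add --delete only if you also want local removals mirrored to the bucket):
# mirror the local folder into the bucket, deleting objects that no longer exist locally
aws s3 sync local_folder s3://bucket-name --delete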
Answer 3:
To add to what everyone else is saying: if you want your Java code (instead of the CLI) to do this without having to put all of the files in a single directory, you can create a list of files to upload and then supply that list to the AWS TransferManager's uploadFileList method.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html#uploadFileList-java.lang.String-java.lang.String-java.io.File-java.util.List-
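A minimal sketch of that approach (the bucket name, key prefix, and file paths below are placeholder assumptions) might look like:
import com.amazonaws.services.s3.transfer.MultipleFileUpload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class FileListUploadExample {
    public static void main(String[] args) throws InterruptedException {
        // Uses the default credentials and region provider chains.
        TransferManager transferManager = TransferManagerBuilder.standard().build();

        // The files only need to share a common parent directory; they do not
        // have to sit in one flat folder.
        List<File> filesToUpload = Arrays.asList(
                new File("data/a.txt"),
                new File("data/sub/b.txt"));

        MultipleFileUpload upload = transferManager.uploadFileList(
                "my-bucket",      // bucket name (placeholder)
                "nightly/",       // virtual directory key prefix (placeholder)
                new File("data"), // common parent directory used to build relative keys
                filesToUpload);

        upload.waitForCompletion(); // blocks until every file has finished uploading
        transferManager.shutdownNow();
    }
}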
Answer 4:
One file (or part of a file) = one HTTP request, but the Java API now supports efficient multi-file uploads without you having to write the multithreading yourself, by using TransferManager.
Answer 5:
If you want to use a Java program to do it, you can do:
public void uploadFolder(String bucket, String path, boolean includeSubDirectories) {
    File dir = new File(path);
    // Empty virtual key prefix: object keys mirror the directory structure under `dir`
    MultipleFileUpload upload = transferManager.uploadDirectory(bucket, "", dir, includeSubDirectories);
    try {
        upload.waitForCompletion(); // blocks until the whole directory has been uploaded
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
If you wish to test against a local S3 (e.g. MinIO), you can create the S3 client and transfer manager like this:
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AmazonS3 s3Client = new AmazonS3Client(credentials); // deprecated constructor, but you can also create the client using the standard builders/beans provided by Spring/AWS
s3Client.setEndpoint("http://127.0.0.1:9000");       // only needed if you wish to connect to a local S3 such as MinIO
TransferManager transferManager = TransferManagerBuilder.standard().withS3Client(s3Client).build();
Source: https://stackoverflow.com/questions/15050146/is-it-possible-to-perform-a-batch-upload-to-amazon-s3