Is there a way to concatenate small files (each less than 5 MB) on Amazon S3? Multipart Upload is not an option because the files are too small.
Pulling down all these files and doing the concatenation locally is not an efficient solution.
Can anybody point me to an API that does this?
Amazon S3 does not provide a concatenate function. It is primarily an object storage service.
You will need some process that downloads the objects, combines them, then uploads them again. The most efficient way to do this would be to download the objects in parallel, to take full advantage of available bandwidth. However, that is more complex to code.
I would recommend doing the processing "in the cloud" to avoid having to download the objects across the Internet. Doing it on Amazon EC2 or AWS Lambda would be more efficient and less costly.
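The download-combine-upload flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `concatenate_objects`, `fetch` and `store` are hypothetical names, and the two callables stand in for whatever S3 GET and PUT calls your SDK provides. An in-memory hash plays the role of the bucket so the sketch runs without AWS credentials.

```ruby
# Sketch of download-combine-upload with parallel downloads.
# `fetch` and `store` are hypothetical callables standing in for the
# S3 GET and PUT requests; they are not part of any AWS SDK.
def concatenate_objects(keys, target_key, fetch:, store:)
  # Start one thread per source object so the downloads overlap.
  # A real implementation would probably cap the number of threads.
  threads = keys.map { |key| Thread.new { fetch.call(key) } }

  # Thread#value waits for the thread and returns its result; mapping
  # over the threads in order keeps the parts in the order of `keys`.
  combined = threads.map(&:value).join

  store.call(target_key, combined)
  combined.bytesize
end

# Usage with an in-memory stand-in for a bucket (assumption for the demo).
fake_bucket = { 'parts/a' => 'hello ', 'parts/b' => 'world' }
fetch = ->(key) { fake_bucket.fetch(key) }
store = ->(key, body) { fake_bucket[key] = body }

concatenate_objects(['parts/a', 'parts/b'], 'aggregate', fetch: fetch, store: store)
puts fake_bucket['aggregate']
```

Passing the I/O in as callables keeps the concatenation logic independent of the SDK version, which also makes it easy to run on EC2 or Lambda with real S3 calls swapped in.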
Edit: I didn't see the 5 MB requirement. This method will not work for source objects smaller than 5 MB, because every part except the last must be at least 5 MB.
While it is possible to download and re-upload the data to S3 through an EC2 instance, a more efficient approach would be to instruct S3 to make an internal copy using the new copy_part API operation that was introduced into the SDK for Ruby in version 1.10.0.
Code:
require 'rubygems'
require 'aws-sdk'

s3 = AWS::S3.new()
mybucket = s3.buckets['my-multipart']

# First, let's start the Multipart Upload
obj_aggregate = mybucket.objects['aggregate'].multipart_upload

# Then we will copy into the Multipart Upload all of the objects in a certain S3 directory.
mybucket.objects.with_prefix('parts/').each do |source_object|
  # Skip the directory object
  unless (source_object.key == 'parts/')
    # Note that this section is thread-safe and could greatly benefit from parallel execution.
    obj_aggregate.copy_part(source_object.bucket.name + '/' + source_object.key)
  end
end

obj_completed = obj_aggregate.complete()

# Generate a signed URL to enable a trusted browser to access the new object without authenticating.
puts obj_completed.url_for(:read)
Limitations (among others)
- With the exception of the last part, there is a 5 MB minimum part size.
- The completed Multipart Upload object is limited to a 5 TB maximum size.
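Before starting a server-side copy it may help to check the two limits listed above against the source object sizes. The following is a minimal sketch, not part of the SDK: `copyable_as_parts?` and both constants are hypothetical names, and reading "5 MB" as 5 * 1024**2 bytes and "5 TB" as 5 * 1024**4 bytes is an assumption.

```ruby
# Limits from the list above; the binary-unit interpretation is an assumption.
MIN_PART_SIZE   = 5 * 1024**2 # 5 MB minimum for every part except the last
MAX_OBJECT_SIZE = 5 * 1024**4 # 5 TB maximum for the completed object

# Hypothetical helper: returns true when the given object sizes (in bytes,
# in copy order) satisfy both limits, so each object could be one part.
def copyable_as_parts?(sizes)
  return false if sizes.empty?

  # Every part except the last must meet the minimum part size.
  all_but_last_ok = sizes[0...-1].all? { |size| size >= MIN_PART_SIZE }
  all_but_last_ok && sizes.sum <= MAX_OBJECT_SIZE
end

mib = 1024**2
puts copyable_as_parts?([6 * mib, 7 * mib, 1 * mib]) # only the last part is small
puts copyable_as_parts?([1 * mib, 6 * mib])          # a small part is not last
```

For the small files in the question, the check fails whenever more than one source object is under the minimum, which is why the copy-based approach does not apply there.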
Source: https://stackoverflow.com/questions/32448416/amazon-s3-concatenate-small-files