Amazon S3 concatenate small files


Amazon S3 does not provide a concatenate function. It is primarily an object storage service.

You will need some process that downloads the objects, combines them, then uploads them again. The most efficient way to do this would be to download the objects in parallel, to take full advantage of available bandwidth. However, that is more complex to code.

I would recommend doing the processing "in the cloud" to avoid having to download the objects across the Internet. Doing it on Amazon EC2 or AWS Lambda would be more efficient and less costly.
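
A minimal sketch of that download-and-combine approach, using the same version 1 AWS SDK for Ruby as the example further below. The bucket name, prefix, and output key are placeholders, and the whole result is held in memory, so this is only suitable for modest total sizes:

require 'rubygems'
require 'aws-sdk'

s3 = AWS::S3.new
bucket = s3.buckets['my-bucket']            # placeholder bucket name

# Download each small object and append its body to an in-memory buffer.
# (For large totals, stream to a temporary file instead.)
combined = ''
bucket.objects.with_prefix('parts/').each do |source_object|
  next if source_object.key == 'parts/'     # skip the directory placeholder
  combined << source_object.read
end

# Upload the concatenated data as a single new object.
bucket.objects['aggregate'].write(combined)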

Edit: I didn't see the 5 MB requirement. The method below will not work because of it: with the exception of the last part, each copied part must be at least 5 MB (see the limitations at the end).

From https://ruby.awsblog.com/post/Tx2JE2CXGQGQ6A4/Efficient-Amazon-S3-Object-Concatenation-Using-the-AWS-SDK-for-Ruby:

While it is possible to download and re-upload the data to S3 through an EC2 instance, a more efficient approach would be to instruct S3 to make an internal copy using the new copy_part API operation that was introduced into the SDK for Ruby in version 1.10.0.

Code:

require 'rubygems'
require 'aws-sdk'

s3 = AWS::S3.new()
mybucket = s3.buckets['my-multipart']

# First, let's start the Multipart Upload
obj_aggregate = mybucket.objects['aggregate'].multipart_upload

# Then we will copy into the Multipart Upload all of the objects in a certain S3 directory.
mybucket.objects.with_prefix('parts/').each do |source_object|

  # Skip the directory object
  unless (source_object.key == 'parts/')
    # Note that this section is thread-safe and could greatly benefit from parallel execution.
    obj_aggregate.copy_part(source_object.bucket.name + '/' + source_object.key)
  end

end

obj_completed = obj_aggregate.complete()

# Generate a signed URL to enable a trusted browser to access the new object without authenticating.
puts obj_completed.url_for(:read)

Limitations (among others)

  • With the exception of the last part, there is a 5 MB minimum part size.
  • The completed Multipart Upload object is limited to a 5 TB maximum size.