I am storing two million files in an Amazon S3 bucket. There is a given root (l1) below, a list of directories under l1, and each directory contains files. So my bucket
This thread is quite old, but I ran into this issue recently and wanted to add my two cents...
It is a hassle and a half (it seems) to cleanly list the folders under a given path in an S3 bucket. Most of the current gem wrappers around the S3 API (the official AWS-SDK, S3) don't correctly parse the returned object (specifically the CommonPrefixes), so it is difficult to get back a list of folders (delimiter nightmares).
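To make the CommonPrefixes part concrete, here is a minimal sketch of what a delimited listing looks like and how the folder entries can be pulled out with REXML. The XML is a hand-written, simplified stand-in for a response (the bucket name and keys are made up, and the namespace attribute is left out), and `common_prefixes` is just an illustrative helper name, not something either gem provides.

```ruby
require 'rexml/document'

# Hypothetical, hand-written response body for a listing with
# prefix "l1/" and delimiter "/" (not captured from a real bucket).
SAMPLE_XML = <<~XML
  <ListBucketResult>
    <Name>example-bucket</Name>
    <Prefix>l1/</Prefix>
    <Delimiter>/</Delimiter>
    <IsTruncated>false</IsTruncated>
    <CommonPrefixes><Prefix>l1/dir1/</Prefix></CommonPrefixes>
    <CommonPrefixes><Prefix>l1/dir2/</Prefix></CommonPrefixes>
  </ListBucketResult>
XML

# Illustrative helper: returns the raw CommonPrefixes/Prefix values,
# i.e. the "folders" one level below the requested prefix.
def common_prefixes(xml)
  prefixes = []
  REXML::Document.new(xml).elements.each('ListBucketResult/CommonPrefixes/Prefix') do |e|
    prefixes << e.text
  end
  prefixes
end

puts common_prefixes(SAMPLE_XML)
```

Each value still carries the requested prefix in front and the delimiter at the end, which is why some cleanup is usually wanted before showing the names as folders.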
Here is a quick fix for those using the S3 gem... Sorry it isn't one-size-fits-all, but it's the best I wanted to do.
https://github.com/qoobaa/s3/issues/61
Code snippet:
module S3
  class Bucket
    # Lists the "directories" (CommonPrefixes) under a prefix.
    # This method recurses if the response coming back
    # from S3 includes a truncation flag (IsTruncated == 'true'),
    # then parses the combined response(s) XML body
    # for CommonPrefixes/Prefix, AKA directories.
    def directory_list(options = {}, responses = [])
      options = {:delimiter => "/"}.merge(options)
      response = bucket_request(:get, :params => options)
      if is_truncated?(response.body)
        # Ask for the next page, starting after the marker S3 handed back
        directory_list(options.merge(:marker => next_marker(response.body)), responses << response.body)
      else
        parse_xml_array(responses + [response.body], options)
      end
    end

    private

    # Collects every CommonPrefixes/Prefix from the given XML bodies.
    # With clean_path, only the leading prefix and the trailing delimiter
    # are stripped, so a folder name that happens to contain the prefix
    # text is left intact.
    def parse_xml_array(xml_array, options = {}, clean_path = true)
      prefix = options[:prefix].to_s
      delimiter = options[:delimiter].to_s
      names = []
      xml_array.each do |xml|
        rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") do |e|
          if clean_path
            names << e.text.sub(/\A#{Regexp.escape(prefix)}/, '').chomp(delimiter)
          else
            names << e.text
          end
        end
      end
      names
    end

    # Returns the NextMarker of a truncated response
    def next_marker(xml)
      marker = nil
      rexml_document(xml).elements.each("ListBucketResult/NextMarker") {|e| marker ||= e.text }
      raise StandardError, "truncated response did not include a NextMarker" if marker.nil?
      marker
    end

    # True when S3 reports that more results are available
    def is_truncated?(xml)
      is_truncated = nil
      rexml_document(xml).elements.each("ListBucketResult/IsTruncated") {|e| is_truncated ||= e.text }
      is_truncated == 'true'
    end
  end
end
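
For anyone who wants to poke at the paging logic without a bucket, here is a rough, self-contained sketch of the same idea written as a loop instead of recursion. It does not use the S3 gem at all: `fetch_page` is a hypothetical callable standing in for the real request, `first_text` and `list_directories` are made-up helper names, and the two canned pages are hand-written, not real S3 output.

```ruby
require 'rexml/document'

# Returns the text of the first element matching +path+, or nil.
def first_text(xml, path)
  element = REXML::Document.new(xml).elements[path]
  element && element.text
end

# Collects every CommonPrefixes/Prefix across pages. +fetch_page+ is a
# hypothetical callable: it takes a marker (nil for the first page) and
# returns one response body as a String.
def list_directories(fetch_page, prefix = '', delimiter = '/')
  names = []
  marker = nil
  loop do
    xml = fetch_page.call(marker)
    REXML::Document.new(xml).elements.each('ListBucketResult/CommonPrefixes/Prefix') do |e|
      # Strip the leading prefix and the trailing delimiter only
      names << e.text.sub(/\A#{Regexp.escape(prefix)}/, '').chomp(delimiter)
    end
    break unless first_text(xml, 'ListBucketResult/IsTruncated') == 'true'
    marker = first_text(xml, 'ListBucketResult/NextMarker')
    raise 'truncated response without NextMarker' if marker.nil?
  end
  names
end

# Two canned pages stand in for real S3 responses.
PAGES = {
  nil => '<ListBucketResult><IsTruncated>true</IsTruncated><NextMarker>l1/a/</NextMarker>' \
         '<CommonPrefixes><Prefix>l1/a/</Prefix></CommonPrefixes></ListBucketResult>',
  'l1/a/' => '<ListBucketResult><IsTruncated>false</IsTruncated>' \
             '<CommonPrefixes><Prefix>l1/b/</Prefix></CommonPrefixes></ListBucketResult>'
}

p list_directories(->(marker) { PAGES.fetch(marker) }, 'l1/')
```

A loop avoids building up one stack frame per page, which may matter if the listing spans a lot of pages; otherwise it should behave like the recursive version above.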