Listing directories at a given level in Amazon S3

Happy的楠姐 2021-01-03 05:25

I am storing two million files in an Amazon S3 bucket. There is a given root (l1) below, a list of directories under l1, and then each directory contains files. So my bucket

2 Answers
  • 2021-01-03 06:03

    right_aws supports this through its underlying S3Interface class, but you can define your own method for easier (and nicer) use. Put this at the top of your code:

    module RightAws
      class S3
        class Bucket
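          # Returns every common prefix ("directory") found under the given prefix.
          def common_prefixes(prefix, delimiter = '/')
            common_prefixes = []
            # The block is called for each chunk of the listing, so the
            # prefixes from all chunks are accumulated here.
            @s3.interface.incrementally_list_bucket(@name, { 'prefix' => prefix, 'delimiter' => delimiter }) do |thislist|
              common_prefixes += thislist[:common_prefixes]
            end
            common_prefixes
          end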
        end
      end
    end
    

    This adds the common_prefixes method to the RightAws::S3::Bucket class. Now, instead of calling mybucket.keys to fetch the list of keys in your bucket, you can use mybucket.common_prefixes to get an array of common prefixes. In your case:

    mybucket.common_prefixes("l1/")
    # => ["l1/a1/", "l1/a2/", ... "l1/a5000/"]
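
    To make the grouping concrete, here is a minimal local sketch of how a prefix and a delimiter roll keys up into common prefixes. It makes no S3 calls; the helper name group_common_prefixes and the sample keys are hypothetical, made up for illustration. Note that S3 reports each common prefix with its trailing delimiter.

    ```ruby
    # Hypothetical helper: emulates locally how keys are grouped into
    # common prefixes for a given prefix and delimiter. No network calls.
    def group_common_prefixes(keys, prefix, delimiter = '/')
      keys.each_with_object([]) do |key, found|
        next unless key.start_with?(prefix)
        # Find the first delimiter that appears after the prefix.
        idx = key.index(delimiter, prefix.length)
        next if idx.nil? # key sits directly under the prefix, not in a "directory"
        common = key[0, idx + delimiter.length]
        found << common unless found.include?(common)
      end
    end

    # Made-up sample keys.
    keys = ['l1/a1/file1.txt', 'l1/a1/file2.txt', 'l1/a2/file1.txt',
            'l1/readme.txt', 'other/x.txt']
    p group_common_prefixes(keys, 'l1/')
    # => ["l1/a1/", "l1/a2/"]
    ```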
    

    I have only tested this with a small number of common prefixes. S3 returns at most 1000 entries per listing response, so you should check that it still works with more than 1000 common prefixes.
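
    One way to reason about that check without touching a real bucket is to simulate the paging. The sketch below accumulates common prefixes across several pages until no marker comes back. Everything in it is hypothetical: fake_list_page stands in for a single listing request, and the page size is shrunk to 3 so the behaviour is easy to follow.

    ```ruby
    # Hypothetical stand-in for one listing request: returns one page of
    # common prefixes plus the marker for the next page (nil when done).
    def fake_list_page(all_prefixes, marker, page_size)
      start = marker ? all_prefixes.index(marker) + 1 : 0
      page = all_prefixes[start, page_size] || []
      truncated = start + page.length < all_prefixes.length
      { common_prefixes: page, next_marker: truncated ? page.last : nil }
    end

    # Keep requesting pages until the response carries no marker.
    def collect_common_prefixes(all_prefixes, page_size)
      collected = []
      marker = nil
      loop do
        page = fake_list_page(all_prefixes, marker, page_size)
        collected.concat(page[:common_prefixes])
        marker = page[:next_marker]
        break if marker.nil?
      end
      collected
    end

    prefixes = (1..8).map { |i| "l1/a#{i}/" } # made-up data
    result = collect_common_prefixes(prefixes, 3)
    puts result.length
    # prints 8
    ```

    If a real implementation stopped after the first page, the count here would be 3 instead of 8, which is the kind of shortfall to look for with more than 1000 prefixes.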

  • 2021-01-03 06:08

    This thread is quite old, but I ran into this issue recently and wanted to add my two cents...

    It is a hassle and a half (it seems) to cleanly list the folders under a given path in an S3 bucket. Most of the current gem wrappers around the S3 API (the official AWS-SDK, S3) don't correctly parse the returned object (specifically the CommonPrefixes), so it is difficult to get back a list of folders (delimiter nightmares).
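
    For reference, the listing response carries the folders as CommonPrefixes/Prefix elements. The sketch below pulls them out of a hand-written sample with REXML. The XML is an abbreviated, made-up ListBucketResult (no namespace, most fields omitted), and common_prefixes_from is a hypothetical helper name.

    ```ruby
    require 'rexml/document'

    # Hypothetical helper: collects the text of every CommonPrefixes/Prefix
    # element in a listing response body.
    def common_prefixes_from(xml)
      doc = REXML::Document.new(xml)
      names = []
      doc.elements.each('ListBucketResult/CommonPrefixes/Prefix') { |e| names << e.text }
      names
    end

    # Hand-written, abbreviated sample (an assumption, not captured from a real bucket).
    sample_xml = <<~XML
      <ListBucketResult>
        <Name>mybucket</Name>
        <Prefix>l1/</Prefix>
        <Delimiter>/</Delimiter>
        <IsTruncated>false</IsTruncated>
        <CommonPrefixes><Prefix>l1/a1/</Prefix></CommonPrefixes>
        <CommonPrefixes><Prefix>l1/a2/</Prefix></CommonPrefixes>
      </ListBucketResult>
    XML

    p common_prefixes_from(sample_xml)
    # => ["l1/a1/", "l1/a2/"]
    ```

    The top-level Prefix element (the request's own prefix) is not picked up, because the path only matches Prefix elements nested inside CommonPrefixes.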

    Here is a quick fix for those using the S3 gem. Sorry it isn't one-size-fits-all, but it's as far as I wanted to take it.

    https://github.com/qoobaa/s3/issues/61

    Code snippet:

    module S3
      class Bucket
        # this method recurses if the response coming back
        # from S3 includes a truncation flag (IsTruncated == 'true')
        # then parses the combined response(s) XML body
        # for CommonPrefixes/Prefix AKA directories
        def directory_list(options = {}, responses = [])
          options = {:delimiter => "/"}.merge(options)
          response = bucket_request(:get, :params => options)
    
          if is_truncated?(response.body)
            directory_list(options.merge({:marker => next_marker(response.body)}), responses << response.body)
          else
            parse_xml_array(responses + [response.body], options)
          end
        end
    
        private
    
        def parse_xml_array(xml_array, options = {}, clean_path = true)
          names = []
          xml_array.each do |xml|
            rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") do |e|
              if clean_path
                # strip only the leading prefix and the trailing delimiter
                names << e.text.sub(/\A#{Regexp.escape(options[:prefix].to_s)}/, '').chomp(options[:delimiter].to_s)
              else
                names << e.text
              end
            end
          end
          names
        end
    
        def next_marker(xml)
          marker = nil
          rexml_document(xml).elements.each("ListBucketResult/NextMarker") {|e| marker ||= e.text }
          if marker.nil?
            raise StandardError, "truncated response did not include a NextMarker"
          else
            marker
          end
        end
    
        def is_truncated?(xml)
          is_truncated = nil
          rexml_document(xml).elements.each("ListBucketResult/IsTruncated") {|e| is_truncated ||= e.text }
          is_truncated == 'true'
        end
      end
    end
    