How can I use boto to stream a file out of Amazon S3 to Rackspace Cloudfiles?

清歌不尽 2020-11-30 00:14

I'm copying a file from S3 to Cloudfiles, and I would like to avoid writing the file to disk. The Python-Cloudfiles library has an object.stream() call that looks to be wh

5 answers
  •  孤城傲影
    2020-11-30 00:26

    I figure at least some of the people seeing this question will be like me, and will want a way to stream a file from boto line by line (or comma by comma, or any other delimiter). Here's a simple way to do that:

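    from boto.s3.connection import S3Connection

    def getS3ResultsAsIterator(aws_access_info, bucket_name, prefix):
        # aws_access_info: dict of keyword arguments for S3Connection,
        # e.g. aws_access_key_id and aws_secret_access_key
        s3_conn = S3Connection(**aws_access_info)
        bucket_obj = s3_conn.get_bucket(bucket_name)
        # go through every key (file) in the bucket under the given prefix
        for f in bucket_obj.list(prefix=prefix):
            unfinished_line = ''
            # iterating over a boto Key yields its contents chunk by chunk
            # (on Python 3 the chunks are likely bytes, so decode them first)
            for chunk in f:
                chunk = unfinished_line + chunk
                # split on whatever, or use a regex with re.split()
                lines = chunk.split('\n')
                # the last piece may be an incomplete line; carry it over
                unfinished_line = lines.pop()
                for line in lines:
                    yield line
            # emit a final line that has no trailing newline
            if unfinished_line:
                yield unfinished_line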
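
The splitting logic does not depend on boto at all, so it can be pulled out and tried on its own. Here is a minimal sketch of that idea, assuming the chunks are plain strings. The name `iter_lines` and the sample chunks are made up for illustration and are not part of boto or of the answer above:

```python
def iter_lines(chunks, delimiter="\n"):
    """Yield delimiter-separated records from an iterable of string chunks."""
    unfinished = ""
    for chunk in chunks:
        # Prepend whatever was left over from the previous chunk, then split.
        pieces = (unfinished + chunk).split(delimiter)
        # The last piece may be cut off mid-record; hold it back for now.
        unfinished = pieces.pop()
        for piece in pieces:
            yield piece
    # Emit a trailing record that was not followed by a delimiter.
    if unfinished:
        yield unfinished


# Hypothetical chunks standing in for what iterating over a key would return.
sample_chunks = ["alpha\nbr", "avo\ncharl", "ie\ndelta"]
lines = list(iter_lines(sample_chunks))
```

Because it only needs an iterable of chunks, the same helper could in principle be fed a boto key directly, e.g. `iter_lines(f)` inside the loop over `bucket_obj.list(prefix=prefix)`.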
    

    @garnaat's answer above is still great and 100% true. Hopefully mine still helps someone out.
