S3: How to do a partial read / seek without downloading the complete file?

不思量自难忘° 2020-12-02 17:12

Although they resemble files, objects in Amazon S3 aren't really "files", just like S3 buckets aren't really directories. On a Unix system I can use head to

3 Answers

既然无缘 (OP)

2020-12-02 17:58

Using Python, you can preview the first records of a compressed file.

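Connect using boto (the legacy boto 2 library).

# Connect:
import boto
from boto.s3.key import Key

s3 = boto.connect_s3()
bname = 'my_bucket'
# validate=False skips the request that checks whether the bucket exists
bucket = s3.get_bucket(bname, validate=False)

Read the first 20 lines from a gzip-compressed file:

# Read first 20 records
import csv
import io
from gzip import GzipFile

limit = 20
k = Key(bucket)
k.key = 'my_file.gz'
k.open()  # open the key for streaming reads
# Decompress on the fly as records are read
gzipped = GzipFile(None, 'rb', fileobj=k)
reader = csv.reader(io.TextIOWrapper(gzipped, newline="", encoding="utf-8"), delimiter='^')
for idx, line in enumerate(reader):
    if idx >= limit:
        break
    print(idx, line)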

So it is equivalent to the following Unix command:

zcat my_file.gz|head -20
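
To see why this approach avoids reading the whole object, here is a minimal sketch that uses only the standard library. An in-memory stream stands in for the S3 key, which is an assumption made so the example runs without credentials. The names CountingBytesIO, make_sample_gzip and head_gzip_csv are hypothetical helpers, not part of boto.

```python
import csv
import gzip
import io
import itertools
import random


class CountingBytesIO(io.BytesIO):
    """Hypothetical stand-in for a streamed S3 object: counts bytes handed out."""

    bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


def make_sample_gzip(n_rows, seed=0):
    """Build a gzip-compressed, '^'-delimited sample file in memory."""
    rng = random.Random(seed)
    # Random hex keeps the compressed payload large enough to make the point.
    lines = (f"{i}^{rng.getrandbits(128):032x}\n" for i in range(n_rows))
    return gzip.compress("".join(lines).encode("utf-8"), compresslevel=1)


def head_gzip_csv(fileobj, limit=20, delimiter="^"):
    """Return the first `limit` CSV records from a gzip-compressed stream."""
    gz = gzip.GzipFile(fileobj=fileobj, mode="rb")
    text = io.TextIOWrapper(gz, encoding="utf-8", newline="")
    # islice stops pulling from the reader once `limit` records are produced.
    return list(itertools.islice(csv.reader(text, delimiter=delimiter), limit))


if __name__ == "__main__":
    payload = make_sample_gzip(100_000)
    stream = CountingBytesIO(payload)
    for idx, row in enumerate(head_gzip_csv(stream)):
        print(idx, row)
    print(f"consumed {stream.bytes_read} of {len(payload)} compressed bytes")
```

The last line of the demo reports how many compressed bytes were actually pulled from the stream. Because gzip decompresses in buffered chunks, that count should be well below the full payload size for a file this large.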
