S3: How to do a partial read / seek without downloading the complete file?

不思量自难忘° 2020-12-02 17:12

Although they resemble files, objects in Amazon S3 aren't really "files", just like S3 buckets aren't really directories. On a Unix system I can use head to

3 Answers
  •  既然无缘
    2020-12-02 17:58

    Using Python, you can preview the first records of a compressed file.

    Connect using boto (the legacy boto 2 library):

    # Connect
    import boto
    from boto.s3.key import Key

    s3 = boto.connect_s3()
    bname = 'my_bucket'
    # validate=False skips the extra request that checks the bucket exists
    bucket = s3.get_bucket(bname, validate=False)


    Read the first 20 records from a gzip-compressed file:

    # Read the first 20 records
    import csv
    import io
    from gzip import GzipFile

    limit = 20
    k = Key(bucket)
    k.key = 'my_file.gz'
    k.open()  # open the key for reading
    gzipped = GzipFile(None, 'rb', fileobj=k)
    reader = csv.reader(io.TextIOWrapper(gzipped, newline="", encoding="utf-8"), delimiter='^')
    for i, line in enumerate(reader):
        if i >= limit:
            break
        print(i, line)
    

    This is equivalent to the following Unix command:

    zcat my_file.gz|head -20
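
For uncompressed data, or when only a specific byte span is needed, a ranged GET is another way to read part of an object without downloading all of it. The sketch below assumes the newer boto3 library rather than the boto 2 calls used above; `byte_range_header`, `read_range`, and `head_bytes` are hypothetical helper names introduced here for illustration, and the bucket and key are placeholders. It relies on the `Range` parameter of boto3's `get_object`, which takes a standard HTTP byte-range string.

```python
def byte_range_header(start, length):
    """Build an HTTP Range value covering `length` bytes starting at `start`.

    HTTP byte ranges are inclusive on both ends, so 100 bytes from offset 0
    is written "bytes=0-99".
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return "bytes={}-{}".format(start, start + length - 1)


def read_range(client, bucket, key, start, length):
    """Fetch only the requested bytes of an object using a ranged GET.

    `client` is expected to behave like a boto3 S3 client: get_object()
    accepts Bucket, Key and Range, and returns a dict whose "Body" is a
    file-like object.
    """
    response = client.get_object(
        Bucket=bucket, Key=key, Range=byte_range_header(start, length)
    )
    return response["Body"].read()


def head_bytes(bucket, key, n=1024):
    """Return the first `n` bytes of an object (hypothetical convenience helper)."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    return read_range(boto3.client("s3"), bucket, key, 0, n)
```

Calling `head_bytes('my_bucket', 'my_file.csv', 200)` would request only the first 200 bytes, and a non-zero `start` in `read_range` acts like a seek. A byte range taken from the middle of a gzip file generally cannot be decompressed on its own, so for compressed objects streaming from the start, as in the answer above, remains the practical approach.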
    
