Read a file line by line from S3 using boto?

刺人心 2020-11-29 07:32

I have a CSV file in S3 and I'm trying to read the header line to get the size (these files are created by our users, so they could be almost any size). Is there a way to do

10 answers
  •  自闭症患者
    2020-11-29 07:59

    Expanding on kooshywoosh's answer: you can't use TextIOWrapper (which is very useful) directly on the StreamingBody of a plain binary file, because you'll get the following error:

    "builtins.AttributeError: 'StreamingBody' object has no attribute 'readable'"
    
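To see where that error comes from without touching S3, here is a minimal sketch. `FakeStreamingBody` and `try_wrap` are hypothetical names made up for this illustration (they are not part of boto or botocore); the stand-in only offers `read()`, which is the situation described above.

```python
import io
from io import TextIOWrapper


class FakeStreamingBody:
    """Hypothetical stand-in for a StreamingBody: it only offers read()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, amt=None):
        # amt=None reads everything that is left, otherwise at most amt bytes.
        return self._buf.read(amt)


def try_wrap(body):
    """Try to wrap body in a TextIOWrapper; return the error text, or None on success."""
    try:
        TextIOWrapper(body, encoding="utf-8")
    except AttributeError as exc:
        return str(exc)
    return None


message = try_wrap(FakeStreamingBody(b"id,name\n1,a\n"))
print(message)
```

TextIOWrapper asks the wrapped object whether it is readable before it ever calls `read()`, so an object without the IOBase methods fails at construction time.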

    However, you can use the following hack, mentioned in a long-standing issue on botocore's GitHub page, and define a very simple wrapper class around StreamingBody:

    from io import RawIOBase
    ...
    
    class StreamingBodyIO(RawIOBase):
        """Wrap a boto StreamingBody in the IOBase API."""
        def __init__(self, body):
            self.body = body
    
        def readable(self):
            return True
    
        def read(self, n=-1):
            # StreamingBody.read() expects None, not a negative number, for "read everything"
            n = None if n < 0 else n
            return self.body.read(n)
    

    Then, you can simply use the following code:

    from io import TextIOWrapper
    ...
    
    # get StreamingBody from botocore.response
    response = s3.get_object(Bucket=bucket, Key=key)
    # the StreamingBody is stored under the 'Body' key of the response dict
    data = TextIOWrapper(StreamingBodyIO(response['Body']))
    for line in data:
        # process line
        pass
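
For the original question (reading only the header line), the pieces above can be put together roughly as follows. This is a sketch you can run without S3 access: `FakeStreamingBody` and `read_header` are hypothetical names invented for the example, and the sample CSV bytes are made up. UTF-8 is assumed as the file encoding.

```python
import io
from io import RawIOBase, TextIOWrapper


class StreamingBodyIO(RawIOBase):
    """Wrap a StreamingBody-like object in the IOBase API (same idea as the answer)."""

    def __init__(self, body):
        self.body = body

    def readable(self):
        return True

    def read(self, n=-1):
        n = None if n < 0 else n
        return self.body.read(n)


class FakeStreamingBody:
    """Hypothetical stand-in for response['Body']; it only offers read()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, amt=None):
        return self._buf.read(amt)


def read_header(body, encoding="utf-8"):
    """Return the first line of a StreamingBody-like object, without the line ending."""
    text = TextIOWrapper(StreamingBodyIO(body), encoding=encoding)
    return text.readline().rstrip("\r\n")


# Made-up sample data standing in for the S3 object contents.
header = read_header(FakeStreamingBody(b"id,name,email\n1,Ada,ada@example.com\n"))
columns = header.split(",")
print(columns)
```

With a real response you would pass `response['Body']` instead of the stand-in. Because TextIOWrapper pulls data from the wrapped object in chunks, reading just the first line should not require downloading the whole file.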
    
