Write pandas dataframe as compressed CSV directly to Amazon s3 bucket?

醉酒成梦 2021-01-02 23:09

I currently have a script that reads the existing version of a CSV saved to S3, combines that with the new rows in the pandas dataframe, and then writes that directly back to S3. I'd like to write it back as a compressed (gzipped) CSV instead.
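
For context, a minimal sketch of that workflow might look like the following (the bucket, key, and `new_rows` dataframe are hypothetical placeholders; reading and writing s3:// paths with pandas requires s3fs to be installed):

    import pandas as pd

    # Hypothetical new data to append; in the real script this comes from elsewhere.
    new_rows = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})

    # Read the existing CSV from S3, append the new rows, and write it back.
    existing = pd.read_csv('s3://my-bucket/data.csv')
    combined = pd.concat([existing, new_rows], ignore_index=True)
    combined.to_csv('s3://my-bucket/data.csv', index=False)  # currently uncompressed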

3 Answers
  •  孤独总比滥情好
    2021-01-02 23:56

    If you want streaming writes (so that neither the compressed nor the uncompressed CSV has to be held in memory in full), you can do this:

    import gzip
    import io

    import s3fs

    def write_df_to_s3(df, filename, path):
        s3 = s3fs.S3FileSystem(anon=False)
        with s3.open(path, 'wb') as f:
            # GzipFile compresses on the fly and writes straight into the S3
            # file object; `filename` only sets the name stored in the gzip header.
            gz = gzip.GzipFile(filename, mode='wb', compresslevel=9, fileobj=f)
            buf = io.TextIOWrapper(gz, encoding='utf-8')
            df.to_csv(buf, index=False)
            buf.flush()  # push any text still buffered in the wrapper into gz
            gz.close()   # finish the gzip stream before the S3 object is closed


    TextIOWrapper is needed until this issue is fixed: https://github.com/pandas-dev/pandas/issues/19827
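
    For reference, a call to the function above might look like this (the bucket and key are hypothetical placeholders):

    import pandas as pd

    df = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})
    # 'data.csv' is recorded in the gzip header; the object itself is written
    # to the given bucket/key as a gzip-compressed CSV.
    write_df_to_s3(df, 'data.csv', 'my-bucket/data.csv.gz')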
