I currently have a script that reads the existing version of a CSV saved to S3, combines that with the new rows in the pandas DataFrame, and then writes the result directly back to S3.
If you want streaming writes (so you don't hold the whole compressed or decompressed CSV in memory), you can do this:
import io
import gzip

import s3fs

def write_df_to_s3(df, filename, path):
    """Stream a DataFrame to S3 as a gzipped CSV without buffering it all in memory."""
    s3 = s3fs.S3FileSystem(anon=False)
    with s3.open(path, 'wb') as f:
        # Wrap the S3 file object so CSV bytes are gzip-compressed as they are written.
        gz = gzip.GzipFile(filename, mode='wb', compresslevel=9, fileobj=f)
        buf = io.TextIOWrapper(gz, encoding='utf-8')
        df.to_csv(buf, index=False)
        buf.flush()  # flush the text wrapper before closing the gzip stream
        gz.close()
TextIOWrapper is needed until this issue is fixed: https://github.com/pandas-dev/pandas/issues/19827
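For the read-combine-write flow described in the question, a minimal usage sketch might look like the following. The bucket path and new_rows DataFrame are hypothetical placeholders, and it assumes pandas can read the gzipped CSV straight from S3 via s3fs (gzip compression is inferred from the .gz suffix):

import pandas as pd

# Hypothetical location and data; adjust to your bucket and schema.
path = 's3://my-bucket/data/table.csv.gz'
new_rows = pd.DataFrame({'id': [4, 5], 'value': ['d', 'e']})

# pandas reads directly from S3 (via s3fs) and decompresses based on the file extension.
existing = pd.read_csv(path)
combined = pd.concat([existing, new_rows], ignore_index=True)

# Stream the combined frame back to the same key.
write_df_to_s3(combined, 'table.csv', path)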