I currently have a script that reads the existing version of a CSV saved to S3, combines that with the new rows in the pandas DataFrame, and then writes the result directly back to S3.
If you want streaming writes (so you don't hold the whole compressed or decompressed CSV in memory), you can do this:
import io
import gzip

import s3fs

def write_df_to_s3(df, filename, path):
    """Stream a DataFrame to S3 as a gzipped CSV without buffering it all in memory."""
    s3 = s3fs.S3FileSystem(anon=False)
    with s3.open(path, 'wb') as f:
        # Wrap the S3 file object so CSV bytes are gzip-compressed as they are written.
        gz = gzip.GzipFile(filename, mode='wb', compresslevel=9, fileobj=f)
        buf = io.TextIOWrapper(gz, encoding='utf-8')
        df.to_csv(buf, index=False)
        buf.flush()  # flush the text wrapper before closing the gzip stream
        gz.close()
TextIOWrapper is needed until this issue is fixed: https://github.com/pandas-dev/pandas/issues/19827
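For the read-combine-write flow described in the question, a minimal usage sketch might look like the following. The bucket path and new_rows DataFrame are hypothetical placeholders, and it assumes pandas can read the gzipped CSV straight from S3 via s3fs (gzip compression is inferred from the .gz suffix):

import pandas as pd

# Hypothetical location and data; adjust to your bucket and schema.
path = 's3://my-bucket/data/table.csv.gz'
new_rows = pd.DataFrame({'id': [4, 5], 'value': ['d', 'e']})

# pandas reads directly from S3 (via s3fs) and decompresses based on the file extension.
existing = pd.read_csv(path)
combined = pd.concat([existing, new_rows], ignore_index=True)

# Stream the combined frame back to the same key.
write_df_to_s3(combined, 'table.csv', path)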