Apply GZIP compression to a CSV in Python Pandas

泪湿孤枕 提交于 2019-12-31 09:12:10

问题


I am trying to write a dataframe to a gzipped csv in python pandas, using the following:

import pandas as pd
import datetime
import csv
import gzip

# Get data (with previous connection and script variables)
df = pd.read_sql_query(script, conn)

# Create today's date, to append to file
todaysdatestring = str(datetime.datetime.today().strftime('%Y%m%d'))
print todaysdatestring

# Create csv with gzip compression
df.to_csv('foo-%s.csv.gz' % todaysdatestring,
      sep='|',
      header=True,
      index=False,
      quoting=csv.QUOTE_ALL,
      compression='gzip',
      quotechar='"',
      doublequote=True,
      line_terminator='\n')

This just creates a csv called 'foo-YYYYMMDD.csv.gz', not an actual gzip archive.

I've also tried adding this:

#Turn to_csv statement into a variable
d = df.to_csv('foo-%s.csv.gz' % todaysdatestring,
      sep='|',
      header=True,
      index=False,
      quoting=csv.QUOTE_ALL,
      compression='gzip',
      quotechar='"',
      doublequote=True,
      line_terminator='\n')

# Write above variable to gzip
 with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as output:
   output.write(d)

Which fails as well. Any ideas?


回答1:


Using df.to_csv() with the keyword argument compression='gzip' should produce a gzip archive. I tested it using same keyword arguments as you, and it worked.

You may need to upgrade pandas, as gzip was not implemented until version 0.17.1, but trying to use it on prior versions will not raise an error, and just produce a regular csv. You can determine your current version of pandas by looking at the output of pd.__version__.




回答2:


It is done very easily with pandas

import pandas as pd

Write a pandas dataframe to disc as gunzip compressed csv

df.to_csv('dfsavename.csv.gz', compression='gzip')

Read from disc

df = pd.read_csv('dfsavename.csv.gz', compression='gzip')



回答3:


From documentation

import gzip
content = "Lots of content here"
with gzip.open('file.txt.gz', 'wb') as f:
    f.write(content)

with pandas

import gzip


content = df.to_csv(
      sep='|',
      header=True,
      index=False,
      quoting=csv.QUOTE_ALL,
      quotechar='"',
      doublequote=True,
      line_terminator='\n')

with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as f:
    f.write(content)

The trick here being that to_csv outputs text if you don't pass it a filename. Then you just redirect that text to gzip's write method.




回答4:


with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as f:
    f.write(df.to_csv(sep='|', index=False, quoting=csv.QUOTE_ALL))


来源:https://stackoverflow.com/questions/37193157/apply-gzip-compression-to-a-csv-in-python-pandas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!