I am trying to read a gzipped file (.gz) in Python and am having some trouble.
I used the gzip module to read it but the file is encoded as a utf-8 text file so ev
The above produced tons of decoding errors. I used this:
for line in io.TextIOWrapper(io.BufferedReader(gzip.open(filePath)), encoding='utf8', errors='ignore'):
    ...
I don't see why this should be so hard.
What are you doing exactly? Please explain "eventually it reads an invalid character".
It should be as simple as:
import gzip
fp = gzip.open('foo.gz')
contents = fp.read() # contents now has the uncompressed bytes of foo.gz
fp.close()
u_str = contents.decode('utf-8') # u_str is now a unicode string
This answer works for Python 2. In Python 3, please see @SeppoEnarvi's answer at https://stackoverflow.com/a/19794943/610569 (it uses the rt mode for gzip.open).
Maybe
import codecs
import gzip
zf = gzip.open(fname, 'rb')
reader = codecs.getreader("utf-8")
contents = reader(zf)  # wraps the byte stream in a decoding reader
for line in contents:
    pass
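For reference, here is a self-contained Python 3 sketch of the same codecs approach. The file name `sample.gz` and its contents are made up purely so the example can run on its own; they are not from the question.

```python
import codecs
import gzip
import os
import tempfile

# Hypothetical sample file, created only to make the example runnable.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wb") as f:
    f.write("first línea\nsecond\n".encode("utf-8"))

# Open in binary mode and let a codecs StreamReader decode on the fly.
with gzip.open(path, "rb") as zf:
    reader = codecs.getreader("utf-8")(zf)
    decoded = [line for line in reader]  # each item is a str, newline kept

print(decoded)
```

Each line comes back already decoded, so no per-line `.decode()` call is needed.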
In Pythonic form (Python 2.5 or greater):
from __future__ import with_statement # for 2.5, does nothing in 2.6
from gzip import open as gzopen
with gzopen('foo.gz') as gzfile:
    for line in gzfile:
        print line.decode('utf-8')
This is possible since Python 3.3:
import gzip
gzip.open('file.gz', 'rt', encoding='utf-8')
Notice that gzip.open() requires you to explicitly specify text mode ('t'); without it, the file is opened in binary mode and reading returns bytes.
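A minimal round-trip sketch of the text-mode approach, assuming Python 3.3 or later. The file name `sample.gz` and its contents are invented for illustration, and `errors='replace'` is optional: it is shown only as one way to avoid the decoding errors mentioned in the question.

```python
import gzip
import os
import tempfile

# Hypothetical sample file, created here so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("héllo\nwörld\n")

# 'rt' makes gzip.open() yield decoded str lines instead of bytes.
# errors='replace' (optional) substitutes U+FFFD for undecodable bytes
# instead of raising UnicodeDecodeError.
with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```

Using `errors='replace'` rather than `errors='ignore'` keeps a visible marker where bad bytes were, which can make corrupted input easier to spot.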