I am trying to read a gzipped file (.gz) in Python and am having some trouble.
I used the gzip module to read it but the file is encoded as a utf-8 text file so ev
I don't see why this should be so hard.
What are you doing exactly? Please explain "eventually it reads an invalid character".
It should be as simple as:
import gzip
fp = gzip.open('foo.gz')
contents = fp.read() # contents now has the uncompressed bytes of foo.gz
fp.close()
u_str = contents.decode('utf-8') # u_str is now a unicode string
This answer works for Python 2. In Python 3, please see @SeppoEnarvi's answer at https://stackoverflow.com/a/19794943/610569 (it uses the rt mode for gzip.open).
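For Python 3, the rt approach mentioned above might look roughly like the sketch below. The helper name read_gzip_text is made up for illustration, and UTF-8 is assumed as the default because that is what the question describes; gzip.open itself accepts the mode and encoding arguments in Python 3.

```python
import gzip

def read_gzip_text(path, encoding="utf-8"):
    """Return the decompressed contents of a gzip file as a str.

    read_gzip_text is a hypothetical helper, not part of the gzip module.
    """
    # "rt" opens the file in text mode, so gzip decompresses and decodes
    # in one step and fp.read() returns str rather than bytes.
    with gzip.open(path, "rt", encoding=encoding) as fp:
        return fp.read()
```

Calling read_gzip_text('foo.gz') would then return the text directly, with no separate decode step. If the file is not actually valid UTF-8, this still raises UnicodeDecodeError, so the encoding argument has to match how the file was written.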