Reading utf-8 characters from a gzip file in python

执念已碎 2020-12-14 07:38

I am trying to read a gzipped file (.gz) in Python and am having some trouble.

I used the gzip module to read it but the file is encoded as a utf-8 text file so ev

5 answers
  •  误落风尘
    2020-12-14 07:54

    I don't see why this should be so hard.

    What are you doing exactly? Please explain "eventually it reads an invalid character".

    It should be as simple as:

    import gzip

    # Binary mode is the default, so read() returns the raw uncompressed bytes.
    with gzip.open('foo.gz', 'rb') as fp:
        contents = fp.read()  # contents now has the uncompressed bytes of foo.gz
    u_str = contents.decode('utf-8')  # u_str is now a unicode string
    

    EDITED

    This answer works for Python 2. In Python 3, please see @SeppoEnarvi's answer at https://stackoverflow.com/a/19794943/610569 (it uses the rt mode for gzip.open).
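
For reference, here is a minimal sketch of what the rt-mode approach could look like in Python 3. It is not copied from the linked answer. The helper name `read_gzip_text`, the sample string, and the temporary file are made up for illustration, and the sketch assumes the file really is valid UTF-8.

```python
import gzip
import os
import tempfile


def read_gzip_text(path, encoding="utf-8"):
    """Hypothetical helper: return the contents of a gzip file as a str."""
    # "rt" opens the decompressed stream in text mode, so the bytes are
    # decoded with the given encoding while reading.
    with gzip.open(path, "rt", encoding=encoding) as fp:
        return fp.read()


# Round-trip demo on a throwaway file (illustrative data, not from the question).
sample = "naïve café, 执念"
fd, demo_path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(demo_path, "wt", encoding="utf-8") as fp:
    fp.write(sample)

roundtrip = read_gzip_text(demo_path)
os.remove(demo_path)
```

If the file contains bytes that are not valid UTF-8, the read raises UnicodeDecodeError. `gzip.open` also accepts an `errors` argument in text mode (for example `errors="replace"`) if you would rather substitute the bad bytes than fail.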
