How to open a Unicode text file inside a zip?

不知归路 2020-12-19 06:04

I tried

with zipfile.ZipFile("5.csv.zip", "r") as zfile:
    for name in zfile.namelist():
        with zfile.open(name, 'rU') as readFile:
        
3 answers
  •  青春惊慌失措
    2020-12-19 06:34

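    Edit: For Python 3, using io.TextIOWrapper, as this answer describes, is the best choice. The answer below could still be helpful for 2.x. Two details below are 2.x-only: the 'rU' mode (ZipFile.open accepts only 'r' since Python 3.6) and the print statement. Apart from those, I don't think anything below is actually incorrect even for 3.x, but io.TextIOWrapper is still better.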
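
    For reference, the io.TextIOWrapper approach mentioned in the edit might look like the sketch below on Python 3. The helper name iter_text_lines and the default encoding are assumptions made for illustration; they are not part of the question's code.

```python
import io
import zipfile


def iter_text_lines(zip_path, encoding="utf-8"):
    """Yield (member_name, line) pairs, with each line decoded to str.

    iter_text_lines is a hypothetical helper name; the encoding default
    is an assumption about the archive's contents.
    """
    with zipfile.ZipFile(zip_path, "r") as zfile:
        for name in zfile.namelist():
            # ZipFile.open returns a binary file object. TextIOWrapper
            # decodes it incrementally and splits lines on decoded text,
            # translating "\r\n" and "\r" to "\n" by default.
            with io.TextIOWrapper(zfile.open(name, "r"), encoding=encoding) as text:
                for line in text:
                    yield name, line
```

    Usage would be along the lines of `for name, line in iter_text_lines("5.csv.zip"): print(line, end="")`. Because the wrapper splits lines after decoding, the same function should also cope with encodings such as UTF-16 if the right encoding name is passed.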

    If the file is utf-8, this will work:

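    # the rest of the code as above, then:
    # ('rU' is Python 2 only; on Python 3.6+ pass 'r' instead)
    with zfile.open(name, 'rU') as readFile:
        # readline() returns bytes, so decode each line explicitly
        line = readFile.readline().decode('utf8')
        # etc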
    

    If you're going to be iterating over the file you can use codecs.iterdecode, but that won't work with readline().

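    import codecs

    # ('rU' is Python 2 only; on Python 3.6+ pass 'r' instead)
    with zfile.open(name, 'rU') as readFile:
        # iterdecode lazily decodes each line yielded by the file object
        for line in codecs.iterdecode(readFile, 'utf8'):
            print(line)
            # etc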
    

    Note that neither approach is necessarily safe for encodings whose code units are wider than one byte, such as UTF-16 or UTF-32. For example, little-endian UTF-16 represents the newline character with the bytes b'\x0A\x00'. A tool that is not Unicode-aware will split right after the b'\x0A' byte, leaving the null byte at the start of the following line. In such a case you'd have to use something that doesn't try to split the input by newlines, such as ZipFile.read, and then decode the whole byte string at once. This is not a concern for UTF-8.
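
    As a rough sketch of that last suggestion, the member can be read as one byte string and decoded in a single step, with line splitting done afterwards on the decoded text. The helper name read_member_text and the "utf-16" default are assumptions for illustration only.

```python
import zipfile


def read_member_text(zip_path, member, encoding="utf-16"):
    """Return one archive member decoded as a single string.

    read_member_text is a hypothetical helper; the encoding default is
    an assumption and should match how the file was actually written.
    """
    with zipfile.ZipFile(zip_path, "r") as zfile:
        data = zfile.read(member)  # whole member as raw bytes, no line splitting
    # Decode everything at once so no multi-byte code unit is cut in half.
    return data.decode(encoding)
```

    Calling `read_member_text("5.csv.zip", name).splitlines()` then splits on decoded characters rather than raw bytes. The trade-off is that the whole member is held in memory, which may matter for very large files.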
