UnicodeDecodeError when performing os.walk

故里飘歌 2020-12-05 14:30

I am getting the error:

'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128)

when trying to do os.walk. The er

6 answers
  •  轻奢々 (original poster)
    2020-12-05 15:03

    I can reproduce the os.listdir() behavior: os.listdir(unicode_name) returns undecodable entries as bytes on Python 2.7:

    >>> import os
    >>> os.listdir(u'.')
    [u'abc', '<--\x8b-->']
    

    Note: the second name is a bytestring even though the argument to listdir() is a Unicode string.

    A big question remains however - how can this be solved without resorting to this hack?

    Python 3 handles undecodable bytes in filenames (bytes that cannot be decoded with the filesystem's character encoding) via the surrogateescape error handler (os.fsencode/os.fsdecode). See PEP 383: Non-decodable Bytes in System Character Interfaces:

    >>> os.listdir(u'.')
    ['abc', '<--\udc8b-->']
    

    Note: both strings are Unicode (Python 3), and the surrogateescape error handler was used for the second name. To get the original bytes back:

    >>> os.fsencode('<--\udc8b-->')
    b'<--\x8b-->'
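
    As a rough sketch of that round trip (Python 3 only): the helper names decode_name, encode_name and printable are made up for illustration, and UTF-8 is hard-coded as an assumption instead of asking the filesystem, so the result does not depend on the locale.

```python
def decode_name(raw: bytes) -> str:
    # Undecodable bytes become lone surrogates (0x8b -> U+DC8B) instead of raising.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    # The same handler turns the surrogates back into the original bytes.
    return name.encode("utf-8", errors="surrogateescape")


def printable(name: str) -> str:
    # Lone surrogates cannot be written to a UTF-8 stream,
    # so escape them before printing or logging the name.
    return name.encode("utf-8", errors="backslashreplace").decode("utf-8")


raw = b"<--\x8b-->"          # example bytes taken from the listing above
name = decode_name(raw)
print(printable(name))       # safe to print
print(encode_name(name) == raw)
```

    The printable step matters because writing a surrogate-escaped name directly to a UTF-8 stream raises UnicodeEncodeError.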
    

    In Python 2, use Unicode strings for filenames on Windows (Unicode API) and OS X (UTF-8 is enforced), and use bytestrings on Linux and other systems.
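
    One way to apply the bytestring advice to os.walk itself is to pass a bytes path, in which case every name comes back as bytes and nothing is decoded along the way. A minimal sketch for Python 3, where walk_bytes is a hypothetical helper (on Python 2, passing a plain str such as '.' to os.walk already behaves this way):

```python
import os


def walk_bytes(top):
    """Yield the full path of every file under `top` as bytes."""
    if not isinstance(top, bytes):
        # Python 3 only: convert a str path using the filesystem encoding.
        top = os.fsencode(top)
    # With a bytes argument, os.walk yields bytes for dirpath and all names,
    # so joining them never triggers an implicit decode.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in walk_bytes("."):
        # repr() keeps undecodable bytes visible, e.g. b'./<--\x8b-->'
        print(repr(path))
```

    Decode a path only at the point where it has to be shown to a user, and pass the raw bytes back to open() or os.stat() unchanged.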
