UnicodeDecodeError when performing os.walk

故里飘歌 2020-12-05 14:30

I am getting the error:

```
'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128)
```

when trying to do os.walk. The er…
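The message itself comes from decoding a byte string with the ASCII codec: byte 0x8b (139) is outside ASCII's 0-127 range. A minimal sketch that reproduces the same message outside os.walk; the 14-byte prefix is a hypothetical stand-in for the real path, chosen so the bad byte lands at position 14:

```python
# Hypothetical 14-byte prefix standing in for the real path;
# the byte at index 14 is 0x8b, which is not valid ASCII.
raw = b"/home/user/dir" + b"\x8b"

message = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that failed to decode.
    message = str(exc)

print(message)
# prints: 'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128)
```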

6 answers

轻奢々 (OP)

2020-12-05 15:03
I can reproduce the os.listdir() behavior: os.listdir(unicode_name) returns undecodable entries as bytes on Python 2.7:
```
>>> import os
>>> os.listdir(u'.')
[u'abc', '<--\x8b-->']
```
Notice: the second name is a bytestring despite listdir()'s argument being a Unicode string.

A big question remains, however: how can this be solved without resorting to this hack?

Python 3 handles bytes in filenames that cannot be decoded with the filesystem's character encoding by using the surrogateescape error handler (os.fsencode/os.fsdecode). See PEP 383: Non-decodable Bytes in System Character Interfaces:
```
>>> os.listdir(u'.')
['abc', '<--\udc8b-->']
```
Notice: both strings are Unicode (str in Python 3), and the surrogateescape error handler was used for the second name. To get the original bytes back:
```
>>> os.fsencode('<--\udc8b-->')
b'<--\x8b-->'
```
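os.fsdecode and os.fsencode use the filesystem encoding together with its error handler ('surrogateescape' on POSIX systems), so their exact output varies by platform. The sketch below spells the codec out explicitly, assuming a UTF-8 filesystem encoding, to show the same round trip deterministically:

```python
# Raw filename bytes that are not valid UTF-8 (0x8b is a stray continuation byte).
raw = b"<--\x8b-->"

# Decode as Python 3 does for filenames on a UTF-8 POSIX system:
# each undecodable byte 0xXX becomes the lone surrogate U+DCXX.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # prints '<--\udc8b-->'

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")
print(restored)  # prints b'<--\x8b-->'
```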
In Python 2, use Unicode strings for filenames on Windows (where the Unicode API is used) and OS X (where UTF-8 is enforced), and use bytestrings on Linux and other systems.
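The bytes-on-POSIX advice carries over to Python 3: os.walk, like os.listdir, returns bytes names when it is given a bytes path, so nothing is ever decoded. A minimal sketch; the helper name walk_filenames_as_bytes is hypothetical, and the demo assumes a Linux-style filesystem (e.g. ext4) that accepts non-UTF-8 filenames, which macOS or Windows may reject:

```python
import os
import tempfile


def walk_filenames_as_bytes(top):
    """Yield the raw bytes name of every file under top.

    Passing a bytes path makes os.walk return bytes entries, so
    filenames are never decoded and cannot raise UnicodeDecodeError.
    """
    for _dirpath, _dirnames, filenames in os.walk(os.fsencode(top)):
        for name in filenames:
            yield name


# Demo (assumes a POSIX filesystem that allows arbitrary bytes in names).
with tempfile.TemporaryDirectory() as tmp:
    bad_path = os.path.join(os.fsencode(tmp), b"<--\x8b-->")
    with open(bad_path, "wb"):
        pass  # create an empty file whose name is not valid UTF-8
    found = list(walk_filenames_as_bytes(tmp))

print(found)  # prints [b'<--\x8b-->']
```

If str names are needed later (for display, say), os.fsdecode can convert each bytes name, and os.fsencode turns it back into the same bytes.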