Handling UTF filenames in Python

爷,独闯天下 提交于 2019-12-10 09:43:21

问题


I've read quite a bit on the topic already, including what seems to be the definitive guide on this topic here: http://docs.python.org/howto/unicode.html

Perhaps for a more experienced developer, that guide may be enough. However, in my case, I'm more confused than when I started and still haven't resolved my issue.

I am trying to read filenames using os.walk() and to obtain certain information about the files (such as filesize) before writing that information to a text file. This works as long as I don't run into any files with filenames encoded in utf. When it hits a file with a utf encoded name I get an error like this one:

WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'Documents\\??.txt'

In that case, the file was named 唽咿.txt.

Here is how I have been trying to do it so far:

for (root, dirs, files) in os.walk(dirpath):
        for filename in files:
            filepath = os.path.join(root, filename)
            filesize = os.stat(filepath).st_size
            file = open(filepath, 'rb')
            stuff = get_stuff(filesize, file)
            file.close()

In case it matters, dirpath came from an earlier portion of code that amounts to 'dirpath = raw_input()'.

I've tried various things such as changing the filepath line to:

filepath = unicode(os.path.join(unicode(root), unicode(filename)))

But nothing I have tried has worked.

Here are my two questions:

  1. How can I get it to pass the correct filename to the os.stat() method so that I can get a correct response from it?

  2. My script needs to write some filenames into a text file that it may later want to read from. At that point it needs to be able to find the file based on what it just read from the text file. How do I write such filenames to a text file properly and then read from it properly later?


回答1:


Pass a unicode path to os.walk().

Changed in version 2.3: On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects.

source




回答2:


For those interested in the full solution:

dirpath = raw_input()

was changed to:

dirpath = raw_input().decode(sys.stdin.encoding)

That allowed for the argument being passed to os.walk() to be in unicode, causing the filenames it returned to also be in unicode.

To write these to or from a file (my second question) I used the codecs.open() functionality



来源:https://stackoverflow.com/questions/11545185/handling-utf-filenames-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!