I have a file with the following path : D:/bar/クレイジー・ヒッツ!/foo.abc
I am parsing the path from a XML file and storing it in a variable called path in the form of file://localhost/D:/bar/クレイジー・ヒッツ!/foo.abc
Then, the following operations are being done :
path=path.strip()
path=path[17:] #to remove the file://localhost/ part
path=urllib.url2pathname(path)
path=urllib.unquote(path)
The error is :
IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'
Update 1 : I am using Python 2.7 on Windows 7
The path in your error is:
'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
I think this is the UTF8 encoded version of your filename.
I've created a folder of the same name on Windows7 and placed a file called 'abc.txt' in it:
>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>>
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']
So it seems that Duncan's suggestion of path.decode('utf8') does the trick.
Update
I can't test this for you, but I suggest that you try checking whether the path contains non-ascii before doing the .decode('utf8'). This is a bit hacky...
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32,126)]) + '_'*130
path=path.strip()
path=path[17:] #to remove the file://localhost/ part
path=urllib.unquote(path)
if path.translate(ASCII_TRANS) != path: # Contains non-ascii
path = path.decode('utf8')
path=urllib.url2pathname(path)
Provide the filename as a unicode string to the open call.
How do you produce the filename?
if provided as a constant by you
Add a line near the beginning of your script:
# -*- coding: utf8 -*-
Then, in a UTF-8 capable editor, set path to the unicode filename:
path = u"D:/bar/クレイジー・ヒッツ!/foo.abc"
read from a list of directory contents
Retrieve the contents of the directory using a unicode dirspec:
dir_files= os.listdir(u'.')
read from a text file
Open the filename-containing-file using codecs.open to read unicode data from it. You need to specify the encoding of the file (because you know what is the “default windows charset” for non-Unicode applications on your computer).
in any case
Do a:
path= path.decode("utf8")
before opening the file; substitute the correct encoding if not "utf8".
Here's some interesting stuff from the documentation:
sys.getfilesystemencoding()
Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used. The result value depends on the operating system: On Mac OS X, the encoding is 'utf-8'. On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed. On Windows NT+, file names are Unicode natively, so no conversion is performed. getfilesystemencoding() still returns 'mbcs', as this is the encoding that applications should use when they explicitly want to convert Unicode strings to byte strings that are equivalent when used as file names. On Windows 9x, the encoding is 'mbcs'.
New in version 2.3.
If I understand this correctly, you should pass the file name as unicode:
f = open(unicode(path, encoding))
来源:https://stackoverflow.com/questions/5974585/python-not-able-to-open-file-with-non-english-characters-in-path