Python not able to open file with non-English characters in path


I have a file with the following path: D:/bar/クレイジー・ヒッツ!/foo.abc

I am parsing the path from an XML file and storing it in a variable called path in the

3 answers
  •  挽巷 (original poster)
    2020-12-05 20:56

    The path in your error is:

    '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
    

    I think this is the UTF-8 encoded version of your filename.

    I've created a folder of the same name on Windows 7 and placed a file called 'abc.txt' in it:

    >>> import os
    >>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
    >>> os.listdir('.')
    ['?????\xb7???!']
    >>> os.listdir(u'.') # Pass unicode to have unicode returned to you
    [u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
    >>> 
    >>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
    u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
    >>> os.listdir(a.decode('utf8'))
    [u'abc.txt']
    

    So it seems that Duncan's suggestion of path.decode('utf8') does the trick.
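
As a rough sketch of applying that decode step defensively, the helper below decodes only when it is handed a byte string and leaves text untouched. The name to_text and its encoding parameter are made up for this illustration and are not from the question or from Duncan's answer; it assumes the bytes really are UTF-8.

```python
def to_text(path, encoding='utf-8'):
    """Return `path` as a text (unicode) string.

    Hypothetical helper: byte strings are decoded, text is returned as-is.
    """
    if isinstance(path, bytes):  # `bytes` is an alias of `str` on Python 2
        return path.decode(encoding)
    return path


# The byte string from the error message, decoded to the folder name.
raw = b'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
folder = to_text(raw)
```

The decoded value can then be passed to os.listdir or open in the same way as a.decode('utf8') in the session above.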


    Update

    I can't test this for you, but I suggest checking whether the path contains non-ASCII characters before calling .decode('utf8'). This is a bit hacky...

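    import urllib

    # Python 2: 256-character table that keeps printable ASCII (32-126)
    # and maps every other byte to '_'.
    ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32, 127)]) + '_'*129
    path = path.strip()
    path = path[17:]  # remove the leading 'file://localhost/' (17 characters)
    path = urllib.unquote(path)
    if path.translate(ASCII_TRANS) != path:  # contains non-ASCII bytes
        path = path.decode('utf8')
    path = urllib.url2pathname(path)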
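
A possibly less hacky way to do the same non-ASCII check, offered as a sketch and not tested on the asker's setup: wrap the byte string in a bytearray and look for any value above 127. The function name contains_non_ascii is invented for this illustration; bytearray yields integers on both Python 2 and Python 3, so no translation table is needed.

```python
def contains_non_ascii(raw):
    """Return True if the byte string `raw` has any byte outside 0-127.

    Hypothetical helper, not part of the original answer.
    """
    # bytearray iterates as integers on both Python 2 and Python 3.
    return any(byte > 127 for byte in bytearray(raw))


print(contains_non_ascii(b'D:/bar/foo.abc'))               # False
print(contains_non_ascii(b'D:/bar/\xe3\x82\xaf/foo.abc'))  # True
```

In the snippet above, the condition path.translate(ASCII_TRANS) != path could then be swapped for contains_non_ascii(path), assuming path is still a byte string at that point.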
    
