Python Automatically ignore unicode string

ⅰ亾dé卋堺 submitted on 2019-12-12 01:26:39

Question


I've been trying to import some files automatically, but since I'm on Windows I get the unicode error (because of the "C:\Users\..." path). I've been looking for a way to fix this error and found some hints (using r"MyString" or u"MyString" for raw and unicode strings), and I was directed to this page (https://docs.python.org/3/howto/unicode.html).

But since my problem involves a GUI that imports the files automatically, I haven't figured out how to do it.

Here are the hints I found:

 file = file.replace('\\', '//')

 file = r"MyFilePath" 

 file = u"MyFilePath" 

 file = os.path.abspath("MyFilePath") 

 file = "MyFilePath".decode('latin1')
 # not correct: in Python 3 a str has no attribute 'decode'

One of the two prefixes looks promising, but I don't know how to tell Python to put my path after the r or the u.

Or is there a way to tell Python:

file = StopThinkingWithUnicode("MyFilePath")

I've also seen this link (Deal with unicode usernames in python mkdtemp), but it doesn't work either (I changed the print statements to print() calls because the code was written for Python 2.7 and I'm on 3.5).

I forgot to post the traceback, so here it is:

  MyFilePath = "C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
  File "<ipython-input-13-d8c2e72a6d3f>", line 1
  MyFilePath = "C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
            ^
  SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Could someone help me with some hints or links? Thanks for your help.

PS: I've also tried putting this on the first line of the script:

 # -*- coding: latin-1 -*- 

(I have *.xl, *.csv, *.sas7bdat and *.txt files.)


Answer 1:


That's a very frequent issue with Windows paths. I suspect that people stumble upon it and work around it by writing the "annoying" lowercase letters that match escape sequences (\n, \t, \b, \a, \v, \x ...) in upper case. That works, except for \U (the 8-digit Unicode escape, \UXXXXXXXX) and \N (the named Unicode character escape).
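To make the three cases concrete, here is a minimal sketch of how the parser treats each kind of backslash sequence. It feeds the text of a literal to `ast.literal_eval`; the source text is held in raw strings only so the backslashes reach the parser untouched. `parse_literal` is a hypothetical helper name, and the short paths are made up for illustration.

```python
import ast
import warnings


def parse_literal(source_text):
    """Return the value the parser produces for the text of a string literal."""
    with warnings.catch_warnings():
        # Unrecognized escapes such as \D only trigger a warning; silence it here.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source_text)


# \t and \n are recognized escapes: each backslash pair collapses into one
# character (a tab and a newline), so the path is silently corrupted.
lower = parse_literal(r'"C:\temp\new"')
print(len(lower))  # 9, although 11 characters were typed

# \T and \D are not recognized escapes, so the backslashes survive.
upper = parse_literal(r'"C:\Temp\Data"')
print(len(upper))  # 12, exactly what was typed

# \U starts an 8-digit Unicode escape, so this literal cannot be parsed at all.
try:
    parse_literal(r'"C:\Users\MyUser"')
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)
print(error_message)
```

The second case is why the uppercase workaround appears to work, and the third is the error from the question's traceback.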

The real solution is to use the raw prefix (r), which makes Python treat backslashes literally:

MyFilePath = r"C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
             ^
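As a rough comparison, the sketch below shows the raw literal next to two other spellings that should give an equivalent path: doubled backslashes and forward slashes. Using `pathlib.PureWindowsPath` for the comparison is my own choice for illustration, not part of the original answer, and the variable names are hypothetical.

```python
from pathlib import PureWindowsPath

# Raw literal: backslashes are kept exactly as typed.
raw_path = r"C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"

# Doubling every backslash yields the same string without the r prefix.
escaped_path = "C:\\Users\\MyUser\\Desktop\\Projet\\05_Statistiques\\Data\\MyFileName.xlsx"

# Forward slashes avoid escapes entirely; PureWindowsPath treats both
# separators as equivalent when it parses a Windows path.
slash_path = "C:/Users/MyUser/Desktop/Projet/05_Statistiques/Data/MyFileName.xlsx"

print(raw_path == escaped_path)
print(PureWindowsPath(slash_path) == PureWindowsPath(raw_path))
print(PureWindowsPath(raw_path).name)
```

Escape processing only applies to string literals written in source code. A path that arrives at run time, for example one returned by a file dialog in a GUI or read from a file, already contains plain backslashes and needs no prefix or conversion.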

EDIT: my theory about "bug avoidance by uppercase" is confirmed. Check the path in this question: Largest number of rows in a csv python can handle?



Source: https://stackoverflow.com/questions/41876020/python-automatically-ignore-unicode-string
