Python Automatically ignore unicode string

Question

I'm trying to import some files automatically, but since I'm on Windows I get the unicode error (because of the "C:\Users\..." path). While looking for a fix I found some hints (using r"MyString" or u"MyString" for raw and unicode strings), and I was directed to this page (https://docs.python.org/3/howto/unicode.html).

But since my problem involves a GUI that imports the files automatically, I haven't figured out how to apply them.

Here is what I have tried so far:

 file = file.replace('\\', '//')

 file = r"MyFilePath" 

 file = u"MyFilePath" 

 file = os.path.abspath("MyFilePath") 

 file = "MyFilePath".decode('latin1')
 # not correct in Python 3: a str has no attribute 'decode', of course

The r or u prefix looks promising, but I don't know how to make Python apply it to a path that I already hold in a variable.

Or is there a way to tell Python something like:

file = StopThinkingWithUnicode("MyFilePath")

I've also seen this question (Deal with unicode usernames in python mkdtemp), but it doesn't work either (I adapted the print calls, since it was written for Python 2.7 and I'm on 3.5).

I forgot to post the traceback, so here it is:

  MyFilePath = "C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
  File "<ipython-input-13-d8c2e72a6d3f>", line 1
  MyFilePath = "C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
            ^
  SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Could someone help me with some hints or a link? Thanks for your help.

PS: I've also tried putting this on the first line of the script:

 # -*- coding: latin-1 -*-

(The files I have are *.xl, *.csv, *.sas7bdat and *.txt.)

Answer 1:

That's a very frequent issue with Windows paths. I suspect that people stumble upon it and work around it by writing in upper case the "annoying" lowercase letters that would otherwise form escape sequences (\n, \t, \b, \a, \v, \x ...). That works, except for \U (which starts a 32-bit Unicode escape) and \N (which starts a named Unicode character escape).

The real solution is to use the raw prefix (r), so that backslashes are treated literally:
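To make the mechanism concrete, here is a minimal sketch (not part of the original answer) of how Python reads backslashes in ordinary and raw literals. The folder names `new` and `table` are made up for illustration, and `compile()` is used only so that the failing literal can be shown without breaking the example itself.

```python
# Backslash escapes are processed only in ordinary (non-raw) string literals.
plain = "C:\new\table"    # \n becomes a newline, \t becomes a tab
raw = r"C:\new\table"     # raw literal: every backslash is kept

print(len(plain))  # 10
print(len(raw))    # 12

# A backslash followed by digits is an octal escape, so upper-casing letters
# would not help with a folder such as 05_Statistiques.
octal = "Projet\05_Statistiques"
print(octal[6] == "\x05")  # True

# \U must be followed by 8 hex digits, so "\Users" is a syntax error.
# The source text is held in a raw literal and compiled to show the failure.
source = r'path = "C:\Users\MyUser"'
try:
    compile(source, "<example>", "exec")
    raised = False
except SyntaxError as exc:
    raised = True
    print(exc.msg)
```

Note that the error is raised while the source is being compiled, which is why it shows up as a SyntaxError before any line of the script runs.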

MyFilePath = r"C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
             ^
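As a rough sketch of the alternatives, the same path can also be written with doubled backslashes or with forward slashes, which Windows accepts. The helper `split_name` below is hypothetical and only illustrates that a path string arriving at run time can be used directly.

```python
from pathlib import PureWindowsPath

# Three spellings of the same path in source code.
raw_path = r"C:\Users\MyUser\Desktop\Projet\05_Statistiques\Data\MyFileName.xlsx"
escaped_path = "C:\\Users\\MyUser\\Desktop\\Projet\\05_Statistiques\\Data\\MyFileName.xlsx"
slash_path = "C:/Users/MyUser/Desktop/Projet/05_Statistiques/Data/MyFileName.xlsx"

print(raw_path == escaped_path)                                   # True
print(PureWindowsPath(slash_path) == PureWindowsPath(raw_path))   # True


def split_name(path_text):
    """Hypothetical helper: return (file name, extension) of a Windows path."""
    path = PureWindowsPath(path_text)
    return path.name, path.suffix


# A string received at run time (from a file dialog, input(), a config file)
# already contains real backslashes; no prefix or conversion is needed.
print(split_name(raw_path))  # ('MyFileName.xlsx', '.xlsx')
```

The r prefix only affects how a literal typed in source code is read, so there is nothing to "apply" to a path already stored in a variable. One caveat: a raw literal cannot end with a single backslash, so r"C:\Data\" is itself a syntax error.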

EDIT: my theory about "bug avoidance by uppercase" is confirmed. Check the path in this question: Largest number of rows in a csv python can handle?

Source: https://stackoverflow.com/questions/41876020/python-automatically-ignore-unicode-string

Tags

python

windows

user-interface

unicode

python-3.5