(unicode error) 'unicodeescape' codec can't decode bytes - string with '\u'

一个人的身影 2020-12-31 00:00

Writing my code for Python 2.6, but with Python 3 in mind, I thought it was a good idea to put

from __future__ import unicode_literals

at t

4 Answers
  •  不思量自难忘°
    2020-12-31 00:24

    AFAIK, all that from __future__ import unicode_literals does is make all string literals of type unicode instead of type str. That is:

    >>> type('')
    <type 'str'>
    >>> from __future__ import unicode_literals
    >>> type('')
    <type 'unicode'>
    

    But str and unicode are still different types, and they behave just like before.

    >>> type(str(''))
    <type 'str'>
    

    The result of str(...) is always of type str.
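    For contrast, here is a minimal sketch that runs under Python 3 (the answer itself is about Python 2.6/2.7), where the behaviour unicode_literals switches on is already the default. The standard __future__ module records the release in which the import was first accepted and the release in which it became mandatory:

```python
import __future__

# Under Python 3, every plain string literal is already a text (str) object,
# which is what unicode_literals turned on for Python 2.6/2.7 modules.
assert type('') is str

feature = __future__.unicode_literals
# First release in which "from __future__ import unicode_literals" was accepted.
print(feature.getOptionalRelease())
# Release in which the behaviour became the default.
print(feature.getMandatoryRelease())
```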

    About your r'\u' issue: it is by design. With unicode_literals in effect, r'\u' is equivalent to ur'\u' without it. From the docs:

    When an 'r' or 'R' prefix is used in conjunction with a 'u' or 'U' prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed while all other backslashes are left in the string.
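    To see the quoted rule in action without a Python 2 interpreter, here is a sketch that runs on Python 3. It assumes (this is not stated in the answer) that Python 2 decoded the body of a ur'...' literal the same way the raw_unicode_escape codec does: only \uXXXX and \UXXXXXXXX are processed and every other backslash is kept. The helper name py2_raw_unicode is hypothetical.

```python
def py2_raw_unicode(body: str) -> str:
    """Hypothetical helper: decode the text between the quotes of a Python 2
    ur'...' literal (or r'...' under unicode_literals) with raw_unicode_escape,
    which processes only \\uXXXX and \\UXXXXXXXX escapes."""
    return body.encode('latin-1').decode('raw_unicode_escape')


# Other backslash sequences, such as a backslash before "t", are left alone.
print(py2_raw_unicode(r'C:\temp'))

# U+005C is the backslash, so this escape is processed into one backslash.
print(py2_raw_unicode(r'H:\u005cunittests'))

# Here "\u" is not followed by four hex digits, so decoding fails with the
# same kind of "can't decode bytes" error as in the question title.
try:
    py2_raw_unicode(r'H:\unittests')
except UnicodeDecodeError as exc:
    print('error:', exc.reason)
```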

    This is probably due to the way the lexical analyzer worked in the Python 2 series. In Python 3, raw strings work as you (and I) would expect.
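    A quick check under Python 3 (a sketch, not part of the original answer): a raw string keeps the backslash and the following u as two ordinary characters, so no decoding error occurs.

```python
# Python 3: the r prefix leaves every backslash in place, including before "u".
path = r'H:\unittests'

print(len(path))                # 12 characters in total
print(path[2] == '\\')          # the third character is a single backslash
print(path == 'H:\\unittests')  # same string as the escaped spelling
```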

    You can type the backslash twice, and then the \u will not be interpreted, but you'll get two backslashes!

    Backslashes can be escaped with a preceding backslash; however, both remain in the string

    >>> ur'\\u'
    u'\\\\u'
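    The same behaviour can be reproduced on Python 3 with the raw_unicode_escape codec, assuming that codec matches Python 2's handling of ur'...' literals (an assumption, not something the answer states). An even run of backslashes is not treated as an escape, so both backslashes stay in the result:

```python
# Raw source text between the quotes of the Python 2 literal ur'\\u'.
body = rb'\\u'

decoded = body.decode('raw_unicode_escape')
print(len(decoded))   # backslash, backslash, 'u'
print(repr(decoded))  # compare with u'\\\\u' from the Python 2 session
```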
    

    So IMHO, you have two simple options:

    • Do not use raw strings; escape your backslashes instead (compatible with Python 3):

      'H:\\unittests'

    • Be too smart and take advantage of Unicode code points (not compatible with Python 3):

      r'H:\u005cunittests'
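    A Python 3 check of the two options (a sketch; the variable names are illustrative). The escaped spelling gives the intended path, while under Python 3 the raw-string trick keeps \u005c as six literal characters instead of turning it into a backslash:

```python
# Option 1: a normal string with an escaped backslash; same characters on 2 and 3.
option1 = 'H:\\unittests'

# Option 2 relies on Python 2 processing \u inside ur'...' literals.
# Python 3 raw strings do not, so the escape text survives unchanged.
option2 = r'H:\u005cunittests'

print(option1)
print(option2)
print(option1 == option2)  # the two spellings differ on Python 3
```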
