Python unicode string literals :: what's the difference between '\u0391' and u'\u0391'

Submitted by 与世无争的帅哥 on 2021-02-20 09:37:53

Question


I am using Python 2.7.3. Can anybody explain the difference between the literals:

'\u0391'

and:

u'\u0391'

and the different ways they are echoed in the REPL below (especially the extra backslash added to a1):

>>> a1='\u0391'
>>> a1
'\\u0391'
>>> type(a1)
<type 'str'>
>>> 
>>> a2=u'\u0391'
>>> a2
u'\u0391'
>>> type(a2)
<type 'unicode'>
>>> 

Answer 1:


You can only use Unicode escapes (\uabcd) in a Unicode string literal. They have no meaning in a byte string literal, so the characters are kept exactly as written. A Python 2 Unicode literal (u'some text') is a different type of Python object from a Python byte string ('some text').

It's like using \t versus \T; the former has meaning in Python literals (it's interpreted as a tab character), while the latter just means a backslash and a capital T (two characters).

To help understand the difference between Unicode and byte strings, please do read the Python Unicode HOWTO; I can also recommend Joel Spolsky's article on Unicode.

Note: in Python 3, the same differences apply, but 'some text' is a Unicode string literal, and b'some text' is the bytestring syntax.
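As a rough illustration of the Python 3 note above, the following sketch compares a str literal with a bytes literal. It assumes Python 3; the variable names are made up for this example, and the backslash in the bytes literal is written doubled so that the six bytes are spelled out explicitly.

```python
# Python 3: an unprefixed literal is a Unicode string (str),
# so the \u escape is evaluated.
text = '\u0391'              # one code point, U+0391

# A bytes literal does not process \u. The backslash is doubled here
# to spell out the six bytes: \ u 0 3 9 1
raw = b'\\u0391'

text_len = len(text)         # number of code points
raw_len = len(raw)           # number of bytes

# The 'unicode_escape' codec interprets the escape at run time.
decoded = raw.decode('unicode_escape')

# Encoding goes the other way: the code point becomes UTF-8 bytes.
utf8 = text.encode('utf-8')

print(text_len, raw_len, decoded == text, utf8)
```

The two objects differ in type and in length, which is the same distinction Python 2 draws between u'...' and '...'.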




Answer 2:


Unlike C, Python lets you enclose a string in single quotes (') as well as double quotes ("), leaving aside triple-quoted strings (""").

In Python 2, '\u0391' without a prefix is a byte string, where \u is not a recognized escape, so it is simply a string containing the six characters \, u, 0, 3, 9 and 1. When the REPL echoes this string, it shows the repr, in which the \ is itself escaped with another \.

By contrast, the u prefix makes the literal a Unicode string, in which \u escapes are evaluated. Thus, u'\u0391' is interpreted as "the Unicode string containing the code point U+0391" (Greek capital letter alpha), a single character, which is different from the above.
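The difference between the six-character string and the one-character string can be checked with a short sketch. It uses a raw string (r'...') as a stand-in for the Python 2 byte string, since a raw literal also leaves \u untouched; that substitution and the variable names are assumptions made for this example.

```python
import unicodedata

# A raw string keeps the backslash and the following characters as-is,
# much like the Python 2 byte string '\u0391'.
six_chars = r'\u0391'

# With the u prefix, the escape is evaluated to a single code point.
one_char = u'\u0391'

chars = list(six_chars)            # the individual characters
shown = repr(six_chars)            # what the REPL would echo
name = unicodedata.name(one_char)  # official Unicode character name

print(chars)
print(shown)   # the backslash appears doubled in the repr
print(name)    # GREEK CAPITAL LETTER ALPHA
```

The doubled backslash only appears in the repr; the string itself still contains a single backslash.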



Source: https://stackoverflow.com/questions/14559444/python-unicode-string-literals-whats-the-difference-between-u0391-and-u
