Escaping unicode strings in python


In Python, these three commands print the same emoji:

print "\xF0\x9F\x8C\x80"
print u"\ud83c\udf00"
print u"\U0001F300"
         


        
4 Answers
  •  青春惊慌失措
    2021-01-03 05:00

    The first one is a byte string holding the UTF-8 encoding of the character (Python 2 session):

    >>> "\xF0\x9F\x8C\x80".decode('utf8')
    u'\U0001f300'
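
In Python 3 the same check needs a bytes literal, because byte strings and text are separate types there. A minimal sketch, assuming Python 3 (the variable names are only illustrative):

```python
# Python 3: the b prefix makes this a bytes object of four UTF-8 bytes.
utf8_bytes = b"\xF0\x9F\x8C\x80"

# Decoding the four bytes yields a single code point.
text = utf8_bytes.decode("utf-8")

print(ascii(text))                 # '\U0001f300'
print(len(utf8_bytes), len(text))  # 4 1
```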
    

    The u"\ud83c\udf00" one is the UTF-16 version: a surrogate pair written as two four-digit \u escapes.

    The u"\U0001F300" one is the actual code point, U+1F300, written with the eight-digit \U escape.
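
The surrogate pair and the code point are related by the standard UTF-16 rule for code points above U+FFFF. The following is a rough sketch of that rule; the function name `to_surrogate_pair` is made up for illustration and is not part of any library:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000    # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F300)
print("\\u%04x\\u%04x" % (high, low))  # \ud83c\udf00
```

For 0x1F300 the offset is 0xF300, whose top ten bits are 0x3C and bottom ten bits are 0x300, which gives 0xD83C and 0xDF00.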


    But how do the numbers relate? That is the harder question. The relationship is defined by each encoding, and it is not obvious from the digits alone. To give you an idea, here is an example of "manually" encoding the code point 0x1F300 into UTF-8:

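    The cyclone character 🌀 is code point U+1F300, which falls in the range U+10000 to U+10FFFF and therefore takes four bytes in UTF-8.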
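
One way to sketch the manual UTF-8 encoding of 0x1F300 is shown here. It assumes the standard four-byte layout `11110xxx 10xxxxxx 10xxxxxx 10xxxxxx` and handles only that range; the function name `utf8_encode_4byte` is hypothetical:

```python
def utf8_encode_4byte(code_point):
    """Encode a code point in U+10000..U+10FFFF as four UTF-8 byte values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("this sketch only handles the four-byte range")
    return [
        0xF0 | (code_point >> 18),           # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: last 6 bits
    ]


encoded = utf8_encode_4byte(0x1F300)
print(" ".join("%02X" % b for b in encoded))  # F0 9F 8C 80
```

The 21 payload bits of the code point are split 3 + 6 + 6 + 6 and dropped into the fixed prefixes, which is why the bytes F0 9F 8C 80 look nothing like 1F300.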
