I am trying to convert an HTML entity to a unichar. The HTML entity is
when I try to do the following:
unichr(int(976918))
You can decode a string that has a Unicode escape (\U
followed by 8 hex digits, zero-padded) using the "unicode-escape"
encoding:
>>> s = "\\U%08x" % 976918
>>> s
'\\U000ee816'
>>> c = s.decode('unicode-escape')
>>> c
u'\U000ee816'
On a narrow build it's stored as a UTF-16 surrogate pair:
>>> list(c)
[u'\udb7a', u'\udc16']
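The two values above follow from the standard UTF-16 surrogate arithmetic. Here is a minimal sketch of that calculation; the helper name `to_surrogate_pair` is made up for illustration and is not part of the standard library:

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(976918)
print("%04x %04x" % (high, low))  # db7a dc16
```

For 976918 (0xEE816) the offset is 0xDE816, whose top ten bits are 0x37A and bottom ten bits are 0x016, which gives the pair shown in the list above.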
This surrogate pair (two UTF-16 code units) is processed correctly as a single code point during encoding:
>>> c.encode('utf-8')
'\xf3\xae\xa0\x96'
>>> '\xf3\xae\xa0\x96'.decode('utf-8')
u'\U000ee816'
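If you would rather not format an escape string, one alternative (not the approach shown above) is to pack the integer as four bytes and decode it as UTF-32. This is a sketch under the assumption that the "utf-32-le" codec is available, which it is on Python 2.6 and later; `code_point_to_text` is a hypothetical name:

```python
import struct


def code_point_to_text(code_point):
    """Build a one-character Unicode string from an integer code point.

    Hypothetical helper: packs the value as 4 little-endian bytes and
    decodes those bytes with the UTF-32 codec instead of calling unichr().
    """
    return struct.pack('<I', code_point).decode('utf-32-le')


c = code_point_to_text(976918)
print(repr(c.encode('utf-8')))
```

On Python 3 none of this is needed, since there are no narrow builds and chr(976918) works directly.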