How to convert some character into five digit unicode one in Python 3.3?

Submitted by 喜夏-厌秋 on 2019-11-28 08:27:21

Question


I'd like to convert a character into a five-digit Unicode one in Python 3.3. For example:

import re
print(re.sub('a', u'\u1D15D', 'abc' ))

but the result is different from what I expected. Do I have to use the character itself rather than its code point? Is there a better way to handle five-digit Unicode characters?


Answer 1:


Python Unicode escapes are either 4 hex digits (\uabcd) or 8 hex digits (\Uabcdabcd). For a code point beyond U+FFFF you need the latter (with a capital U), left-padded with zeros to the full 8 digits:

>>> '\U0001D15D'
'𝅝'
>>> '\U0001D15D'.encode('unicode_escape')
b'\\U0001d15d'

(And yes, the U+1D15D code point (MUSICAL SYMBOL WHOLE NOTE) is in the above example, but your browser font may not be able to render it and may show a placeholder glyph (a box or question mark) instead.)
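As a side note, the same character can also be written without the zero-padded escape. The sketch below is only an illustration, not part of the original answer; the variable names are made up for the example. It compares the escape with chr() and with a \N{...} name escape:

```python
import unicodedata

# Three spellings of the same code point, U+1D15D.
from_escape = '\U0001D15D'                   # 8-digit escape, zero-padded
from_int = chr(0x1D15D)                      # built from the integer code point
from_name = '\N{MUSICAL SYMBOL WHOLE NOTE}'  # looked up by its Unicode name

print(from_escape == from_int == from_name)  # True
print(len(from_escape))                      # 1 (a single character)
print('U+{:04X}'.format(ord(from_escape)))   # U+1D15D
print(unicodedata.name(from_escape))         # MUSICAL SYMBOL WHOLE NOTE
```

chr() is probably the most convenient of the three when the code point arrives as a number at run time rather than as a literal in the source.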

Because you used a \uabcd escape, you replaced a in abc with two characters: the code point U+1D15 (ᴕ, LATIN LETTER SMALL CAPITAL OU) and the ASCII character D. Using a 32-bit (8-digit) Unicode escape works:

>>> import re
>>> print(re.sub('a', '\U0001D15D', 'abc' ))
𝅝bc
>>> print(re.sub('a', u'\U0001D15D', 'abc' ).encode('unicode_escape'))
b'\\U0001d15dbc'

where again the U+1D15D codepoint could be displayed by your font as a placeholder glyph instead.
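To check what a given escape actually produced, one option is to list the length and the Unicode name of each character in the string. This is a minimal sketch using the standard unicodedata module; the variable names wrong and right are chosen here for illustration only:

```python
import unicodedata

wrong = '\u1D15D'     # parsed as '\u1D15' followed by a literal 'D'
right = '\U0001D15D'  # a single code point, U+1D15D

for label, text in (('wrong', wrong), ('right', right)):
    # Name every character so the accidental split becomes visible.
    names = [unicodedata.name(ch) for ch in text]
    print(label, len(text), names)

# wrong 2 ['LATIN LETTER SMALL CAPITAL OU', 'LATIN CAPITAL LETTER D']
# right 1 ['MUSICAL SYMBOL WHOLE NOTE']
```

A length of 2 where one character was expected is a quick hint that a 4-digit \u escape was used by mistake.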




Answer 2:


By the way, you do not need the re module for this. You could use str.translate:

>>> 'abc'.translate({ord('a'):'\U0001D15D'})
'𝅝bc'
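For a single literal substitution, str.replace would also do; str.translate becomes more useful once several characters are mapped at once. The following is a small sketch under that assumption, and the mapping itself (deleting c) is invented purely for the example:

```python
note = '\U0001D15D'  # MUSICAL SYMBOL WHOLE NOTE

# str.replace handles one literal substitution without a mapping.
replaced = 'abc'.replace('a', note)
print(replaced == note + 'bc')    # True

# str.maketrans builds the code-point-keyed table that translate expects;
# a value of None deletes the character.
table = str.maketrans({'a': note, 'c': None})
translated = 'abc'.translate(table)
print(translated == note + 'b')   # True
```

Both approaches avoid the regular-expression engine entirely, which seems reasonable when the pattern is a fixed character.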


Source: https://stackoverflow.com/questions/14624765/how-to-convert-some-character-into-five-digit-unicode-one-in-python-3-3
