I have found an unexpected difference between printing, via print, a string containing a family emoji directly and printing the same string inside a list. The program below
family = '👨👩👧👧'
print(family)
print([family])
outputs
👨👩👧👧
['👨\u200d👩\u200d👧\u200d👧']
when I would expect it to output
👨👩👧👧
['👨👩👧👧']
Another case of a multi-character glyph
man_with_skin_tone_modifier = '👨🏿'
print(man_with_skin_tone_modifier)
print([man_with_skin_tone_modifier])
outputs as I expect:
👨🏿
['👨🏿']
Why is this?
Context: I discovered this while writing the answer at https://stackoverflow.com/a/49930688/1319998, using Python 3.6.5 on OS X.
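The difference, as noted in the comments, is that print(family) calls the str.__str__ method, while print([family]) ends up calling str.__repr__ on the string inside the list, which escapes non-printable Unicode characters such as the zero-width joiner (U+200D). The skin tone modifier in the second example is a printable character, so it is left as is.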
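The difference can be checked without relying on how a terminal renders emoji. The following is a minimal sketch, assuming the family emoji is made of four emoji code points joined by U+200D; the variable names are only illustrative.

```python
# Build the family emoji explicitly so the zero-width joiners are visible.
ZWJ = "\u200d"  # ZERO WIDTH JOINER
family = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F467"])

as_str = str(family)      # what print(family) writes
as_repr = repr(family)    # what the list uses for each of its items
as_list = str([family])   # what print([family]) writes

print(len(family))                      # 7: four emoji plus three joiners
print(as_str == family)                 # True: str() leaves the string unchanged
print("\\u200d" in as_repr)             # True: repr() escapes the joiner
print(as_list == "[" + as_repr + "]")   # True: the list shows repr() of its item
```

The checks print plain ASCII on purpose, so the result does not depend on the console's emoji support.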
The print function converts its (non-keyword) arguments using str.
Calling str on containers (generally) calls repr on their items. Mostly this is because strings inside a container would too easily disturb the presentation of the container itself (e.g. with newlines). A PEP to change this was raised around the release of Python 3 but quickly rejected.
Calling repr on strings escapes any non-printable characters (but, as of Python 3, preserves other non-ASCII Unicode characters): see PEP 3138 and the description of str.isprintable:
Return true if all characters in the string are printable or the string is empty, false otherwise. Nonprintable characters are those characters defined in the Unicode character database as “Other” or “Separator”, excepting the ASCII space (0x20) which is considered printable. (Note that printable characters in this context are those which should not be escaped when repr() is invoked on a string. It has no bearing on the handling of strings written to sys.stdout or sys.stderr.)
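To see which of the code points from the question fall under that definition, one can look at their Unicode general category with the standard unicodedata module. This is a small sketch; the constant names are made up for the example.

```python
import unicodedata

MAN = "\U0001F468"        # the base emoji used in both examples
ZWJ = "\u200d"            # joiner used in the family emoji
SKIN_TONE = "\U0001F3FF"  # modifier used in the second example

for ch in (MAN, ZWJ, SKIN_TONE):
    # Categories starting with "C" (Other) or "Z" (Separator) are non-printable.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isprintable())
```

The joiner is in category Cf (Other, format), so isprintable() is false for it and repr() escapes it. The skin tone modifier is not in an "Other" or "Separator" category, so repr() leaves it alone.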
The CPython implementation can be found here (search for the unicode_repr function).
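The point above about containers calling repr on their items can be illustrated with ordinary ASCII strings. The sketch below compares the real list output with a hypothetical str-based rendering built by hand; with_str is not something Python produces itself.

```python
items = ["a, b", "c\nd"]

# What str(list) actually does: repr() of each item.
with_repr = str(items)

# Hypothetical alternative: str() of each item. The comma inside the first
# item hides the item boundary, and the newline breaks the line.
with_str = "[" + ", ".join(str(x) for x in items) + "]"

print(with_repr)  # ['a, b', 'c\nd']
print(with_str)
```

With repr the output stays on one line and it is clear the list has two items; with str it could just as well be read as three.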
Source: https://stackoverflow.com/questions/49958287/printing-family-emoji-with-u200d-zero-width-joiner-directly-vs-via-list