Why does print produce different output in Python 2 and Python 3 for the same string?

再見小時候 2020-12-06 11:17

In Python 2:

$ python2 -c 'print "\x08\x04\x87\x18"' | hexdump -C
00000000  08 04 87 18 0a                                    |.....|
00000005
         


        
2 Answers
  •  死守一世寂寞
    2020-12-06 11:59

    Python 2's default string type is byte strings. Byte strings are written "abc" while Unicode strings are written u"abc".

    Python 3's default string type is Unicode strings. Byte strings are written b"abc" while Unicode strings are written "abc" (u"abc" still works, too). Since Unicode defines over a million code points, printing them as bytes requires an encoding (UTF-8 in your case), which uses more than one byte for every code point above U+007F. That is why "\x87", which is the code point U+0087 in a Python 3 string, is written as the two bytes c2 87.
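As a rough illustration of the point above (a sketch, assuming UTF-8 is the output encoding, as the answer states for your case), the same escape sequences give four bytes in a byte string but five bytes once the text string is encoded. The variable names are only for illustration:

```python
# Python 3: identical escapes mean different things in str and bytes literals.
text = "\x08\x04\x87\x18"    # four Unicode code points (U+0008, U+0004, U+0087, U+0018)
raw = b"\x08\x04\x87\x18"    # four raw bytes

# Encoding the text string is what print effectively does before writing.
encoded = text.encode("utf-8")

print(len(raw), raw.hex())          # 4 08048718
print(len(encoded), encoded.hex())  # 5 0804c28718
```

Only U+0087 grows, because it is the only code point here above U+007F.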

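    First, use a byte string in Python 3 to get the same type as the Python 2 string. Then, because Python 3's print expects Unicode strings (passing it a byte string prints the repr, such as b'\x08\x04\x87\x18', rather than the raw bytes), use sys.stdout.buffer.write to write to the raw stdout interface, which expects byte strings.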

    python3 -c 'import sys; sys.stdout.buffer.write(b"\x08\x04\x87\x18")'
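One more difference to be aware of: Python 2's print appends a newline (the trailing 0a in the hexdump from the question), while sys.stdout.buffer.write writes only the bytes it is given. A minimal sketch of a helper that adds the newline back; print_bytes is a hypothetical name, not a standard function, and the demo writes to an in-memory buffer so the result can be inspected:

```python
import io
import sys


def print_bytes(data, out=None):
    """Write raw bytes followed by a newline, like Python 2's print statement."""
    # Default to the binary layer underneath sys.stdout.
    if out is None:
        out = sys.stdout.buffer
    out.write(data + b"\n")
    out.flush()


# Demo against an in-memory binary stream instead of the real stdout.
buf = io.BytesIO()
print_bytes(b"\x08\x04\x87\x18", buf)
result = buf.getvalue()
```

Calling print_bytes(b"\x08\x04\x87\x18") with no second argument would send the same five bytes to stdout.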
    

    Note that similar issues apply when writing to a file. To avoid any encoding translation, open the file in binary mode ('wb') and write byte strings.
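A short sketch of the file case, assuming Python 3. The temporary directory and the file name out.bin are placeholders chosen for this example:

```python
import os
import tempfile

payload = b"\x08\x04\x87\x18"

# Placeholder location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "out.bin")

# 'wb' opens the file in binary mode, so the bytes are written unchanged.
with open(path, "wb") as f:
    f.write(payload)

# 'rb' reads the raw bytes back without any decoding.
with open(path, "rb") as f:
    data = f.read()

print(data == payload, os.path.getsize(path))
```

In text mode ('w'), the file object would accept only str and would encode it on the way out, which is the same translation step that print performs.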
