python unicode handling differences between print and sys.stdout.write

Posted by 痞子三分冷 on 2019-12-03 12:22:49

This is due to a long-standing bug that was fixed in Python 2.7, but too late to be back-ported to Python 2.6.

The documentation states that when unicode strings are written to a file, they should be converted to byte strings using file.encoding. But this was not being honoured by sys.stdout, which instead was using the default unicode encoding. This is usually set to "ascii" by the site module, but it can be changed with sys.setdefaultencoding:

Python 2.6.7 (r267:88850, Aug 14 2011, 12:32:40) [GCC 4.6.2] on linux3
>>> import sys
>>> a = u'\xa6\n'
>>> sys.stdout.write(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' ...
>>> reload(sys).setdefaultencoding('utf8')
>>> sys.stdout.write(a)
¦
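
The traceback comes from an implicit conversion with the "ascii" codec. As a rough illustration of what that conversion amounts to, the sketch below does the encoding by hand with two codecs. The helper name try_encode is hypothetical and exists only for this example; it is not part of sys or of the fix itself.

```python
# -*- coding: utf-8 -*-
text = u'\xa6\n'  # BROKEN BAR followed by a newline, as in the session above


def try_encode(text, encoding):
    """Hypothetical helper: return the encoded bytes, or None on failure."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# 'ascii' cannot represent U+00A6, so the conversion fails.
ascii_bytes = try_encode(text, 'ascii')

# 'utf-8' can represent it, so the conversion succeeds.
utf8_bytes = try_encode(text, 'utf-8')
```

Encoding each string by hand like this works for a single call, but it has to be repeated at every write.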

However, a better solution might be to replace sys.stdout with a wrapper:

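import sys

class StdOut(object):
    def write(self, string):
        # Encode unicode explicitly instead of relying on the default encoding.
        # sys.__stdout__.encoding is None when output is piped; the 'utf-8'
        # fallback here is an assumption, adjust it to your environment.
        if isinstance(string, unicode):
            string = string.encode(sys.__stdout__.encoding or 'utf-8')
        sys.__stdout__.write(string)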

>>> sys.stdout = StdOut()
>>> sys.stdout.write(a)
¦
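
A similar effect can probably be had from the standard library's codecs.getwriter, which wraps a byte stream in a writer that encodes unicode on the way out. The sketch below is only an illustration under assumptions: it targets an in-memory io.BytesIO instead of the real stdout so the result can be inspected, and it hard-codes 'utf-8' instead of reading the terminal's encoding.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream (an assumption for this sketch);
# in a real program this would be the byte stream behind stdout.
raw = io.BytesIO()

# 'utf-8' is an assumed target encoding, not one detected from the terminal.
writer = codecs.getwriter('utf-8')(raw)

# The writer encodes the unicode string before passing bytes to the stream.
writer.write(u'\xa6\n')

encoded = raw.getvalue()
```

Unlike the hand-written wrapper, this writer expects unicode input, so passing already-encoded non-ASCII byte strings through it may fail.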