Question
With this code:
test.py
import sys
import codecs
sys.stdout = codecs.getwriter('utf-16')(sys.stdout)
print "test1"
print "test2"
Then I run it as:
test.py > test.txt
In Python 2.6 on Windows 2000, I'm finding that the newline characters are being output as the byte sequence \x0D\x0A\x00, which of course is wrong for UTF-16 (a little-endian UTF-16 CR LF would be \x0D\x00\x0A\x00).
Am I missing something, or is this a bug?
Answer 1:
Try this:
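import sys
import codecs
if sys.platform == "win32":
    # Put stdout in binary mode so the C runtime stops turning
    # every 0x0A byte into 0x0D 0x0A behind the encoder's back.
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
class CRLFWrapper(object):
    # Translate "\n" to "\r\n" *before* encoding, so CR and LF are
    # both emitted as proper UTF-16 code units.
    def __init__(self, output):
        self.output = output
    def write(self, s):
        self.output.write(s.replace("\n", "\r\n"))
    def __getattr__(self, key):
        # Delegate everything else (flush, fileno, ...) to the wrapped writer.
        return getattr(self.output, key)
sys.stdout = CRLFWrapper(codecs.getwriter('utf-16')(sys.stdout))
print "test1"
print "test2"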
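For Python 3 readers, here is a sketch of the same idea using a different tool than the answer above: io.TextIOWrapper with newline="\r\n" translates each "\n" into "\r\n" before encoding, which is what CRLFWrapper does by hand. The io.BytesIO buffer stands in for sys.stdout.buffer, and utf16_crlf_writer is a hypothetical helper name.

```python
import io

def utf16_crlf_writer(binary_stream):
    # Hypothetical helper: newline="\r\n" makes TextIOWrapper turn every "\n"
    # into "\r\n" *before* encoding, so the line break becomes UTF-16 code units.
    return io.TextIOWrapper(binary_stream, encoding="utf-16", newline="\r\n")

raw = io.BytesIO()  # stands in for sys.stdout.buffer
out = utf16_crlf_writer(raw)
print("test1", file=out)
print("test2", file=out)
out.flush()  # push buffered text through the encoder into raw
data = raw.getvalue()
print(data.decode("utf-16") == "test1\r\ntest2\r\n")  # True
```

On a real console, something like sys.stdout = utf16_crlf_writer(sys.stdout.buffer) should behave the same way, assuming (as I understand it) that Python 3 does newline translation in its io layer rather than in the C runtime, so no msvcrt.setmode call is needed.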
Answer 2:
The newline translation is happening inside the stdout file object. You're writing "test1\n" to sys.stdout (a StreamWriter). The StreamWriter encodes this to "t\x00e\x00s\x00t\x001\x00\n\x00" and sends it to the real file, the original sys.stdout.
That file is in text mode and doesn't know that you've converted the data to UTF-16; all it knows is that every \n byte (0x0A) in the output stream must be converted to \x0D\x0A, which produces the \x0D\x0A\x00 sequence you're seeing.
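The two steps can be reproduced with plain bytes as a Python 3 sketch. The bytes.replace call only simulates the Windows text-mode translation (the real work is done by the C runtime), simulate_text_mode_write is a hypothetical name, and UTF-16-LE is used so the output carries no BOM.

```python
import codecs

def simulate_text_mode_write(text):
    # Step 1: the UTF-16 StreamWriter encodes the text (little-endian here).
    encoded = codecs.encode(text, "utf-16-le")
    # Step 2: a text-mode file rewrites every 0x0A byte as 0x0D 0x0A,
    # without knowing that the bytes are UTF-16.
    translated = encoded.replace(b"\n", b"\r\n")
    return encoded, translated

encoded, translated = simulate_text_mode_write("test1\n")
print(encoded)     # b't\x00e\x00s\x00t\x001\x00\n\x00'
print(translated)  # b't\x00e\x00s\x00t\x001\x00\r\n\x00'
```

The translated result is 13 bytes long, an odd length, so it cannot be valid UTF-16 at all.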
Answer 3:
I've found two solutions so far, but neither gives UTF-16 output with Windows-style line endings.
First, to redirect Python print statements to a file with UTF-16 encoding (the output has Unix-style line endings, because codecs.open always opens the underlying file in binary mode):
import sys
import codecs
sys.stdout = codecs.open("outputfile.txt", "w", encoding="utf16")
print "test1"
print "test2"
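On Python 3, a sketch that goes beyond this answer: the built-in open() accepts encoding and newline together, and newline="\r\n" gives UTF-16 output with Windows-style line endings, because the translation happens before encoding. write_utf16_lines is a hypothetical helper, and the temporary directory only keeps the example self-contained.

```python
import os
import tempfile

def write_utf16_lines(path, lines, newline):
    # newline="\n"   -> no translation (Unix endings, like codecs.open above)
    # newline="\r\n" -> each "\n" becomes "\r\n" before encoding (Windows endings)
    with open(path, "w", encoding="utf-16", newline=newline) as f:
        for line in lines:
            print(line, file=f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "outputfile.txt")
    write_utf16_lines(path, ["test1", "test2"], newline="\r\n")
    with open(path, "rb") as f:
        data = f.read()  # raw bytes, including the BOM

print(data.decode("utf-16") == "test1\r\ntest2\r\n")  # True
```

Passing newline="\n" instead reproduces the Unix-style endings of the codecs.open version.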
Second, to redirect stdout with UTF-16 encoding and without the line-ending corruption (the output has Unix-style line endings), switch stdout to binary mode with msvcrt.setmode so the C runtime no longer translates \n (thanks to this ActiveState recipe):
import sys
import codecs
sys.stdout = codecs.getwriter('utf-16')(sys.stdout)
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
print "test1"
print "test2"
Source: https://stackoverflow.com/questions/1169742/bug-with-python-utf-16-output-and-windows-line-endings