Question
Can someone explain to me this odd thing:
When I type the following Cyrillic string in the Python shell, it prints fine:
>>> print 'абвгд'
абвгд
but when I type:
>>> print u'абвгд'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
Since the first string came out correctly, I reckon my OS X terminal can represent unicode, but it turns out it can't in the second case. Why?
Answer 1:
>>> print 'абвгд'
абвгд
When you type in some characters, your terminal decides how these characters are represented to the application. Your terminal might give the characters to the application encoded as utf-8, ISO-8859-5 or even something that only your terminal understands. Python gets these characters as some sequence of bytes. Then python prints out these bytes as they are, and your terminal interprets them in some way to display characters. Since your terminal usually interprets the bytes the same way as it encoded them before, everything is displayed like you typed it in.
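To make the "sequence of bytes" point concrete, here is a minimal sketch of how the same five letters look under two of the encodings mentioned above. It assumes nothing about your actual terminal; the variable names are just for illustration, and the code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: the same five Cyrillic letters as the bytes that two different
# terminal encodings would hand to Python.
text = u'абвгд'

utf8_bytes = text.encode('utf-8')        # two bytes per letter
iso_bytes = text.encode('iso-8859-5')    # one byte per letter

print(len(utf8_bytes))   # 10
print(len(iso_bytes))    # 5

# Decoding with the encoding that produced the bytes restores the text.
assert utf8_bytes.decode('utf-8') == text
assert iso_bytes.decode('iso-8859-5') == text
```

Python only sees the bytes; which letters they stand for depends entirely on the encoding both sides agree on.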
>>> u'абвгд'
Here you type in some characters that arrive at the Python interpreter as a sequence of bytes, encoded in some way by the terminal. With the u prefix, Python tries to convert this data to unicode. To do this correctly, Python has to know what encoding your terminal uses. In your case it looks like Python guesses that your terminal's encoding is ASCII, but the received data doesn't match that, so you get an encoding error.
The straightforward way to create unicode strings in an interactive session would therefore be something like this:
>>> us = 'абвгд'.decode('my-terminal-encoding')
In files you can also specify the encoding of the file with a special mode line:
# -*- encoding: ISO-8859-5 -*-
us = u'абвгд'
For other ways to set the default input encoding you can look at sys.setdefaultencoding(...) or sys.stdin.encoding.
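Instead of hard-coding 'my-terminal-encoding', the encoding can be looked up at run time. The sketch below is one possible way to do that, not the only one: terminal_encoding and decode_terminal_bytes are hypothetical helper names, and falling back to the locale's preferred encoding and then to UTF-8 is my own assumption about sensible defaults.

```python
import locale
import sys


def terminal_encoding():
    """Best guess at the terminal's encoding (hypothetical helper).

    sys.stdin.encoding can be None, for example when input is piped,
    so fall back to the locale setting and finally to UTF-8.
    """
    return (getattr(sys.stdin, 'encoding', None)
            or locale.getpreferredencoding()
            or 'utf-8')


def decode_terminal_bytes(raw, encoding=None):
    """Turn bytes received from the terminal into a unicode string."""
    return raw.decode(encoding or terminal_encoding())


print(terminal_encoding())

# The UTF-8 bytes for the first two letters of the example string.
two_letters = decode_terminal_bytes(b'\xd0\xb0\xd0\xb1', 'utf-8')
assert two_letters == u'\u0430\u0431'
```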
Answer 2:
As of Python 2.6, you can use the environment variable PYTHONIOENCODING to tell Python that your terminal is UTF-8 capable. The easiest way to make this permanent is by adding the following line to your ~/.bash_profile:
export PYTHONIOENCODING=utf-8
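If you want to check that the variable is picked up, one option is to start a child interpreter with it set and ask what encoding it chose for standard output. This is only a sketch: stdout_encoding_with is a hypothetical helper name, and it assumes sys.executable points at the interpreter you care about.

```python
import os
import subprocess
import sys


def stdout_encoding_with(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return
    the encoding it reports for sys.stdout (hypothetical helper)."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    return output.decode('ascii').strip()


print(stdout_encoding_with('utf-8'))
```

Because the child's output is captured through a pipe rather than a terminal, this also shows that the variable applies even when Python cannot detect a terminal encoding on its own.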
Answer 3:
In addition to ensuring your OS X terminal is set to UTF-8, you may wish to set your python sys default encoding to UTF-8 or better. Create a file in /Library/Python/2.5/site-packages called sitecustomize.py. In this file put:
import sys
sys.setdefaultencoding('utf-8')
The setdefaultencoding method is available only to the site module, and is removed from the sys namespace once startup has completed. As such, you'll need to start a new Python interpreter for the change to take effect. You can verify the current default encoding at any time after startup with sys.getdefaultencoding().
If the characters aren't already unicode and you need to convert them, use the decode method on a string in order to decode the text from some other charset into unicode... best to specify which charset:
s = 'абвгд'.decode('some_cyrillic_charset') # makes the string unicode
print s.encode('utf-8') # transform the unicode into utf-8, then print it
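Why it is "best to specify which charset" can be seen in a small sketch. The choice of ISO-8859-5 and KOI8-R here is only an assumption for illustration; the point is that a wrong charset can either fail loudly or silently give you different letters.

```python
# -*- coding: utf-8 -*-
# Bytes as a terminal using ISO-8859-5 would send them.
raw = u'абвгд'.encode('iso-8859-5')

right = raw.decode('iso-8859-5')   # the charset that produced the bytes
wrong = raw.decode('koi8-r')       # no error, but not the letters you typed

assert right == u'абвгд'
assert wrong != right

# These particular bytes are not valid UTF-8, so that guess fails loudly.
try:
    raw.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

assert utf8_failed

# Once the text is unicode, it can be written out in any target encoding.
print(right.encode('utf-8'))
```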
Answer 4:
Also, make sure the terminal encoding is set to Unicode/UTF-8 (and not ascii, which seems to be your setting):
http://www.rift.dk/news.php?item.7.6
Answer 5:
A unicode object needs to be encoded before it can be displayed on some consoles. Try
u'абвгд'.encode('utf-8')
instead to encode the unicode object to a byte string. Calling encode() with no argument uses Python's default encoding, which is ASCII in Python 2 unless your configuration changes it, so it is safer to name the encoding explicitly.
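As a rough illustration of what the encoding step does, the sketch below encodes the same string with the ASCII codec, which reproduces the kind of error shown in the question, and then with UTF-8, which succeeds. The variable names are my own.

```python
# The string from the question, written with escapes so the snippet
# does not depend on the encoding of the source file.
text = u'\u0430\u0431\u0432\u0433\u0434'

# ASCII has no Cyrillic letters, so encoding with it fails.
try:
    text.encode('ascii')
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)

# UTF-8 can represent every unicode character.
encoded = text.encode('utf-8')
print(len(encoded))   # 10
```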
Answer 6:
'абвгд' is not a unicode string
u'абвгд' is a unicode string
You cannot print unicode strings without encoding them. When you are dealing with strings in your application, you want to make sure that any input is decoded and any output is encoded. This way your application will deal only with unicode strings internally and output strings in UTF-8.
For reference:
>>> 'абвгд'.decode('utf8') == u'абвгд'
True
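The "decode input, encode output" rule could look something like the sketch below. The function name shout_bytes and its parameters are hypothetical, and upper-casing simply stands in for whatever your application really does with the text.

```python
# -*- coding: utf-8 -*-
def shout_bytes(raw, input_encoding='utf-8', output_encoding='utf-8'):
    """Decode at the input boundary, work on unicode, encode on the way out."""
    text = raw.decode(input_encoding)        # bytes -> unicode
    result = text.upper()                    # all processing on unicode
    return result.encode(output_encoding)    # unicode -> bytes


# Input arrives as ISO-8859-5 bytes, output leaves as UTF-8 bytes.
incoming = u'абвгд'.encode('iso-8859-5')
outgoing = shout_bytes(incoming, input_encoding='iso-8859-5')

assert outgoing.decode('utf-8') == u'АБВГД'
```

Keeping the encodings as explicit parameters means the only places that need to know about bytes are the two edges of the function.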
Source: https://stackoverflow.com/questions/918294/python-unicode-in-mac-os-x-terminal