Is it possible to raise an exception that includes non-English characters in Python 2?

我怕爱的太早我们不能终老, submitted on 2019-12-05 09:30:40

The behaviour depends on the Python version and the environment. On Python 3, the error handler used when encoding text written to sys.stderr is always 'backslashreplace'. Consider this script, raise_unicode.py:

from __future__ import unicode_literals, print_function
import sys

s = 'unicode "\u2323" smile'
print(s)
print(s, file=sys.stderr)
try:
    raise RuntimeError(s)
except Exception as e:
    print(e.args[0])
    print(e.args[0], file=sys.stderr)
    raise

python3:

$ PYTHONIOENCODING=ascii:ignore python3 raise_unicode.py
unicode "" smile
unicode "\u2323" smile
unicode "" smile
unicode "\u2323" smile
Traceback (most recent call last):
  File "raise_unicode.py", line 8, in <module>
    raise RuntimeError(s)
RuntimeError: unicode "\u2323" smile

python2:

$ PYTHONIOENCODING=ascii:ignore python2 raise_unicode.py
unicode "" smile
unicode "" smile
unicode "" smile
unicode "" smile
Traceback (most recent call last):
  File "raise_unicode.py", line 8, in <module>
    raise RuntimeError(s)
RuntimeError

That is, on my system the error message is swallowed on Python 2.

Note: on Windows you could try:

T:\> set PYTHONIOENCODING=ascii:ignore
T:\> python raise_unicode.py

For comparison, here is the same script run without PYTHONIOENCODING:

$ python3 raise_unicode.py
unicode "⌣" smile
unicode "⌣" smile
unicode "⌣" smile
unicode "⌣" smile
Traceback (most recent call last):
  File "raise_unicode.py", line 8, in <module>
    raise RuntimeError(s)
RuntimeError: unicode "⌣" smile
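The difference between the stdout and stderr lines in the PYTHONIOENCODING=ascii:ignore run comes down to which codec error handler is applied when the text is encoded. A minimal sketch, using an in-memory io.TextIOWrapper over a BytesIO as a stand-in for the real sys.stdout and sys.stderr (an assumption made for illustration; the real streams are set up by the interpreter), with a hypothetical helper encode_like_stream:

```python
import io

MESSAGE = u'unicode "\u2323" smile'


def encode_like_stream(text, errors):
    """Hypothetical helper: write text through an ASCII text stream that uses
    the given error handler and return the bytes reaching the buffer."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding='ascii', errors=errors)
    stream.write(text)
    stream.flush()
    data = buffer.getvalue()
    stream.detach()  # release the wrapper without closing the BytesIO
    return data


# Like stdout under PYTHONIOENCODING=ascii:ignore: the character is dropped.
print(encode_like_stream(MESSAGE, 'ignore').decode('ascii'))
# Like Python 3's stderr, which always uses 'backslashreplace'.
print(encode_like_stream(MESSAGE, 'backslashreplace').decode('ascii'))
```

The decoded results correspond to the "unicode "" smile" and "unicode "\u2323" smile" lines in the Python 3 output.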

This is how Python works. I believe what you are seeing comes from traceback._some_str() in the Python standard library. When that module formats a stack trace, the function first tries to convert the message with str(); if that raises an exception, it converts the message with unicode() and then encodes it to ASCII with encode("ascii", "backslashreplace"). So you are getting valid output and everything is working correctly; my guess is that Python is doing its best to downconvert the error message so that it displays without problems on whatever platform executes it. What you see is just the Unicode code point of your character. It doesn't happen in your try/except block because this conversion is specific to the mechanism that produces stack traces (for example, for uncaught exceptions).
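The fallback chain described above can be modelled roughly as follows. This is a sketch, not the actual standard-library code: some_str_sketch is a hypothetical name, it takes the message text rather than the exception object, and it assumes that Python 2's str() on a unicode message behaves like encoding with the default ASCII codec, so that step is written out explicitly and the sketch runs on both Python 2 and 3:

```python
def some_str_sketch(message):
    """Hypothetical model of the str() -> unicode() -> backslashreplace chain."""
    try:
        # Step 1: models Python 2's str(), which encodes with the ASCII codec
        # and fails on any non-ASCII character.
        return message.encode('ascii').decode('ascii')
    except Exception:
        pass
    try:
        # Step 2: models unicode(...).encode("ascii", "backslashreplace"),
        # which turns non-ASCII characters into \uXXXX escapes.
        return message.encode('ascii', 'backslashreplace').decode('ascii')
    except Exception:
        pass
    # Last resort when neither conversion works.
    return '<unprintable %s object>' % type(message).__name__


print(some_str_sketch(u'plain message'))
print(some_str_sketch(u'unicode "\u2323" smile'))
```

ASCII-only messages pass through unchanged; anything else comes out with the code point escaped, which matches the output being described.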

In my case, your example worked as it should, printing proper Unicode.

But sometimes you run into problems where the exception traceback is printed without the Unicode characters, or with them backslash-escaped. It is possible to work around this and print the message normally.

Example of the problem (Python 2.7, Linux):

# -*- coding: utf-8 -*-
desc = u'something bad with field ¾'
raise SyntaxError(desc)

It prints only a truncated or mangled message:

~/.../sources/C_patch$ python SO.py 
Traceback (most recent call last):
  File "SO.py", line 25, in <module>
    raise SyntaxError(desc)
SyntaxError

To see the unaltered Unicode text, encode it to raw bytes and pass those to the exception object:

# -*- coding: utf-8 -*-
desc = u'something bad with field ¾'
raise SyntaxError(desc.encode('utf-8', 'replace'))

This time you will see the full message:

~/.../sources/C_patch$ python SO.py 
Traceback (most recent call last):
  File "SO.py", line 3, in <module>
    raise SyntaxError(desc.encode('utf-8', 'replace'))
SyntaxError: something bad with field ¾

You can call value.encode('utf-8', 'replace') in the constructor of your own exception class if you like, but with built-in exceptions you have to do it in the raise statement, as in the example.
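For your own exception classes, the encoding can live in the constructor. A minimal sketch with a hypothetical EncodedError class; it assumes the terminal expects UTF-8 (raw UTF-8 bytes only display correctly there), and it only encodes on Python 2, because on Python 3 exceptions accept text directly and passing bytes would make str(e) show a b'...' literal:

```python
import sys

TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


class EncodedError(Exception):
    """Hypothetical exception that encodes its message in the constructor."""

    def __init__(self, message):
        # Only Python 2 needs the raw UTF-8 bytes to show the message in a
        # traceback; Python 3 keeps the text as-is.
        if sys.version_info[0] == 2 and isinstance(message, TEXT_TYPE):
            message = message.encode('utf-8', 'replace')
        super(EncodedError, self).__init__(message)


try:
    raise EncodedError(u'something bad with field \u00be')
except EncodedError as err:
    print(err.args[0])
```

With this in place, raise EncodedError(...) can be used with a unicode message directly, instead of repeating .encode('utf-8', 'replace') at every raise site.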

The hint is taken from Overcoming frustration: Correctly using unicode in python2 (it comes with a big library of helpers, all of which can be boiled down to the example above).
