Question
I have a Python application that encodes some objects to JSON, passes the JSON string to another program, and then reads back a possibly modified version of that string.
I need to check what has changed in the JSON-encoded objects. However, I'm having trouble re-encoding non-ASCII characters. For example:
x = {'\xe2': None} # a dict with non-ascii keys
y = json.dumps(x, ensure_ascii=False)
y
#> '{"\xe2": null}'
works just fine, but when I try to load the json, I get:
json.loads(y)
#> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 0
json.loads(y.decode('utf-8','ignore'))
#> "{u'': None}"
json.loads(y.decode('utf-8','replace'))
#> {u'\ufffd': None}
and unfortunately '\xe2' in {u'\ufffd': None} evaluates to False, so the replaced key no longer matches the original.
I'm willing to bet there is a simple solution, but all my googling and searching on SO has failed to turn one up.
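As a minimal Python 3 sketch of the underlying issue (names are illustrative): the byte 0xe2 is a UTF-8 lead byte of a multi-byte sequence, so a lone 0xe2 is not valid UTF-8 and decoding it fails, while decoding with the codec that actually produced it succeeds.

```python
# A single 0xe2 byte, like the dict key in the question.
raw_key = b'\xe2'

# Decoding as UTF-8 fails: 0xe2 starts a 3-byte UTF-8 sequence,
# so on its own it is invalid.
try:
    raw_key.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)                # False
print(raw_key.decode('cp1252'))  # â  (0xe2 maps to 'â' in cp1252)
```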
Answer 1:
The easiest way to fix this is to go to the code that generates this dict and properly encode the keys there as UTF-8. Currently, your keys are encoded as cp1252:
print('\xe2'.decode('cp1252'))
â
If you can't fix at the source, you'll need to do some post-processing.
d = {'\xe2': None}
fixed_d = {k.decode('cp1252'): v for k, v in d.iteritems()}
json_dict_with_unicode_keys = json.dumps(fixed_d)
json_dict_with_unicode_keys
#> '{"\\u00e2": null}'
print(json.loads(json_dict_with_unicode_keys).keys()[0])
#> â
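The same post-processing can be sketched in Python 3, where the byte/text split is explicit (variable names are illustrative; the cp1252 assumption is the answer's):

```python
import json

# Keys arrive as cp1252-encoded bytes, as in the question.
d = {b'\xe2': None}

# Decode each key with the codec that produced it before serializing.
fixed_d = {k.decode('cp1252'): v for k, v in d.items()}

payload = json.dumps(fixed_d)     # '{"\u00e2": null}'
restored = json.loads(payload)
print(list(restored)[0])          # â
```

After the round trip the key compares equal to the properly decoded original, which is exactly the comparison that failed in the question.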
(Some of this answer assumes you're on Python 2; Unicode handling differs in Python 3.)
Source: https://stackoverflow.com/questions/30444811/encoding-and-then-decoding-json-with-non-ascii-characters-in-python-2-7