Python JSON and Unicode


Question


Update:

I found the answer here: Python UnicodeDecodeError - Am I misunderstanding encode?

I needed to explicitly decode my incoming file into Unicode when I read it, because it contained characters that were neither plain ASCII nor valid Unicode. The encode was failing when it hit those characters.
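In case it helps anyone, here is a minimal sketch of what that fix looks like on Python 2.7. The file name and the latin-1 encoding are assumptions; substitute whatever encoding your incoming data actually uses:

# Minimal sketch (Python 2.7): decode the incoming bytes explicitly when
# reading, so everything downstream is already unicode.
# "incoming.txt" and latin-1 are placeholders for the real file and encoding.
import io
import json

with io.open("incoming.txt", encoding="latin-1") as f:
    text = f.read()                              # a unicode object, not bytes

out = json.dumps([text], ensure_ascii=False)     # unicode in, unicode out
with io.open("out.json", "w", encoding="utf-8") as f:
    f.write(out)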

Original Question

So, I know there's something I'm just not getting here.

I have an array of unicode strings, some of which contain non-ASCII characters.

I want to encode that as JSON with:

json.dumps(myList)

It throws an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 13: ordinal not in range(128)

How am I supposed to do this? I've tried setting the ensure_ascii parameter to both True and False, but neither fixes this problem.

I know I'm passing unicode strings to json.dumps. I understand that a json string is meant to be unicode. Why isn't it just sorting this out for me?

What am I doing wrong?
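For reference, here is a hypothetical minimal reproduction of the same error on Python 2.7. The strings are invented; the trigger is one list element actually being a non-ASCII byte string rather than unicode, whatever ensure_ascii is set to:

# Hypothetical repro: a raw byte string containing 0xb4 hides in a list of
# otherwise proper unicode strings.
import json

ok = [u"caf\u00e9"]
json.dumps(ok, ensure_ascii=False)       # fine: everything is unicode

bad = [u"caf\u00e9", "acute: \xb4"]      # a non-ASCII byte string sneaks in
json.dumps(bad, ensure_ascii=False)      # UnicodeDecodeError: 'ascii' codec ...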

Update: Don Question sensibly suggests I provide a stack-trace. Here it is:

Traceback (most recent call last):
  File "importFiles.py", line 69, in <module>
    x = u"%s" % conv
  File "importFiles.py", line 62, in __str__
    return self.page.__str__()
  File "importFiles.py", line 37, in __str__
    return json.dumps(self.page(),ensure_ascii=False)
  File "/usr/lib/python2.7/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 204, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 17: ordinal not in range(128)

Note it's Python 2.7, and the error still occurs with ensure_ascii=False.

Update 2: Andrew Walker's useful link (in the comments) leads me to think I can coerce my data into a convenient byte format before trying to JSON-encode it, by doing something like:

data.encode("ascii","ignore")

Unfortunately that throws the same error.
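(Presumably because, in Python 2, calling .encode() on a byte string implicitly decodes it with the ascii codec first, which is exactly the failure above. A small sketch, assuming the real source encoding is latin-1:)

# Why .encode() on a *byte* string raises the same error in Python 2.7:
# encode() implicitly decodes the bytes with the ascii codec first.
data = "acute: \xb4"                     # byte string, not unicode

try:
    data.encode("ascii", "ignore")       # implicit ascii decode -> same error
except UnicodeDecodeError as exc:
    print exc

clean = data.decode("latin-1")           # decode with the real encoding first
print clean.encode("ascii", "ignore")    # prints "acute: " -- now the encode works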


Answer 1:


Try adding the argument ensure_ascii=False. Also, especially when asking Unicode-related questions, it is very helpful to include a longer (complete) traceback and to state which Python version you are using.

Citing the Python documentation for version 2.6.7:

"If ensure_ascii is False (default: True), then some chunks written to fp may be unicode instances, subject to normal Python str to unicode coercion rules. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error."

So this proposal may cause new problems, but it fixed a similar problem I had: I fed the resulting unicode string into a StringIO object and wrote that to a file.
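A minimal sketch of that last step on Python 2.7, assuming UTF-8 output is acceptable: codecs.open (or codecs.getwriter, as the documentation hints) returns a file object whose write() accepts unicode, so the ensure_ascii=False result can be written out directly. The file name is a placeholder:

# Write the unicode result of json.dumps(..., ensure_ascii=False) through a
# codecs file object so no implicit ascii encode happens on the way out.
import codecs
import json

payload = [u"caf\u00e9", u"acute: \u00b4"]       # all-unicode data
text = json.dumps(payload, ensure_ascii=False)   # unicode, since input is unicode

with codecs.open("out.json", "w", encoding="utf-8") as f:
    f.write(text)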

Because Python 2.7's sys.getdefaultencoding() is ascii, the implicit conversion in the ''.join(chunks) statement of the json standard library will blow up if chunks is not ASCII-encoded! You must ensure that any contained strings are converted to an ASCII-compatible representation beforehand. You may try UTF-8 encoded strings, but unicode strings won't work, if I'm not mistaken.
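A hedged sketch of that "convert beforehand" advice on Python 2.7: decode any stray byte strings with the encoding they really use (latin-1 is only an assumption here) and re-encode everything as UTF-8 byte strings, so the json module never has to mix unicode with non-ASCII bytes:

# Make every contained string a UTF-8 byte string before dumping.
import json

raw = [u"caf\u00e9", "acute: \xb4"]              # mixed unicode / latin-1 bytes
as_utf8 = [s.encode("utf-8") if isinstance(s, unicode)
           else s.decode("latin-1").encode("utf-8")   # latin-1 is an assumption
           for s in raw]

print json.dumps(as_utf8, ensure_ascii=False)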



Source: https://stackoverflow.com/questions/9693699/python-json-and-unicode
