JSONObject in org.json lib: utf-8 encoding issue

Submitted by 喜你入骨 on 2019-12-07 18:57:05

Question


I'm following the post "Unicode - How to get the characters right?".

The only issue I have is with JSONObject encoding (I'm using org.json lib).

The issue arises when I put a string such as àòùè쀀 into a JSONObject.

System.out.println(entry.getValue());
JSONObject temp = new JSONObject();
temp.put("values", entry.getValue());
System.out.println(temp.toString());

The first println prints àòùè쀀, but the second prints {"values":"àòùèì\u20ac\u20ac"} instead of {"values":"àòùè쀀"}.

EDIT

When a Hashtable is converted to a JSONObject, some characters are written as Unicode escapes. For example, the Hashtable

 {€èòàùì€ù=èòàù€ì, €òàèùì€=èòàù€ìç§$}

becomes the JSONObject

 {"\u20acòàèùì\u20ac":"èòàù\u20acìç§$","\u20acèòàùì\u20acù":"èòàù\u20acì"}

Answer 1:


They are exactly equal; the Unicode escape just takes a bit more space. Writing \u004a in Java is exactly the same as writing J. If correctness is your concern, it doesn't matter.
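To make the equivalence concrete, here is a minimal sketch in plain Java with no org.json dependency. The class name `EscapeEquivalence` and the helper `decodeUnicodeEscapes` are hypothetical and exist only for illustration. The helper handles just the simple `\uXXXX` form and assumes well-formed input, so it is not a JSON parser.

```java
public class EscapeEquivalence {

    // Hypothetical helper: replaces each backslash-u sequence (four hex digits)
    // with the single character it names. Other JSON escapes are not handled,
    // and malformed hex digits would raise NumberFormatException.
    public static String decodeUnicodeEscapes(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length()) {
                int codeUnit = Integer.parseInt(s.substring(i + 2, i + 6), 16);
                out.append((char) codeUnit);
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In Java source, a Unicode escape and the literal character are identical.
        System.out.println("\u004a".equals("J"));

        // The six-character escape text decodes to the single character U+20AC.
        String decoded = decodeUnicodeEscapes("\\u20ac");
        System.out.println(decoded.length() == 1 && decoded.charAt(0) == 0x20AC);
    }
}
```

Both lines print true: any conforming JSON parser that reads the escaped output gets back the same string that was put in.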

And it won't take a considerable amount of extra space either, unless most of your text falls in the range 0x2000 - 0x20FF.

The following code escapes C0 and C1 control characters, but it also escapes 0x2000 - 0x20FF:

     if (c < ' ' || (c >= '\u0080' && c < '\u00a0')
                    || (c >= '\u2000' && c < '\u2100')) {

So control characters and every character in 0x2000 - 0x20FF are written as Unicode escapes. This makes sense for the C0 control characters (below 0x20), because JSON does not allow those unescaped inside strings.
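The effect of that condition can be sketched in a few lines of plain Java. The class `EscapeRangeSketch` and its methods are hypothetical names. Only the boolean condition is taken from the excerpt above; the rest is an assumption about how such a check might be applied. A full JSON writer also has to escape quotes and backslashes, which this sketch skips.

```java
public class EscapeRangeSketch {

    // Same condition as the excerpt: C0 controls, C1 controls, and U+2000..U+20FF.
    public static boolean isEscaped(char c) {
        return c < ' ' || (c >= '\u0080' && c < '\u00a0')
                || (c >= '\u2000' && c < '\u2100');
    }

    // Hypothetical helper: writes every flagged character as a backslash-u
    // escape with four lowercase hex digits and copies the rest unchanged.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (isEscaped(c)) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E0 (a with grave) is outside every escaped range; U+20AC (euro sign) is inside.
        String input = "" + (char) 0x00E0 + (char) 0x20AC;
        System.out.println(escape(input));
    }
}
```

Under these assumptions the euro sign (U+20AC) comes out as the six characters `\u20ac`, while à (U+00E0) passes through unchanged, which matches the output shown in the question.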

As for 0x2000 - 0x20FF, I have no idea, because the code is not commented. Every character in that range is valid JSON when left unescaped. Admittedly, 0x2028 and 0x2029 were not valid inside JavaScript string literals before ES2019 (a small detail that kept JSON syntax from being a strict subset of JavaScript syntax), so it's a good idea to escape those two in case the JSON is served as JSONP, which is really JavaScript. But it is not apparent to me why the code escapes the whole range when only 2 characters in it are illegal.
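If JSONP safety is the only goal, a narrower rule would be enough. The sketch below is a hypothetical alternative, not what org.json does: it escapes only 0x2028 and 0x2029 in already-serialized JSON text. The names `LineSeparatorEscape` and `escapeLineSeparators` are made up for illustration. It relies on the fact that in valid JSON these two characters can only occur inside string values, so a global replacement does not change the meaning.

```java
public class LineSeparatorEscape {

    // The two characters that older JavaScript engines reject inside string literals.
    private static final String LINE_SEPARATOR = String.valueOf((char) 0x2028);
    private static final String PARAGRAPH_SEPARATOR = String.valueOf((char) 0x2029);

    // Hypothetical helper: replaces the raw separators with their six-character
    // escape text and leaves every other character as it is.
    public static String escapeLineSeparators(String json) {
        return json.replace(LINE_SEPARATOR, "\\u2028")
                   .replace(PARAGRAPH_SEPARATOR, "\\u2029");
    }

    public static void main(String[] args) {
        String json = "{\"values\":\"a" + LINE_SEPARATOR + "b\"}";
        System.out.println(escapeLineSeparators(json));
    }
}
```

With this approach a euro sign or any other character in 0x2000 - 0x20FF would stay as written.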



Source: https://stackoverflow.com/questions/15895709/jsonobject-in-org-json-lib-utf-8-encoding-issue
