Gson Unicode characters conversion to Unicode character codes


眼角桃花 2020-12-22 00:17

Check out my code below. I have a JSON string which contains Unicode character codes. I convert it to my Java object and then convert it back to JSON string. However, you ca

2 answers

星月不相逢 (original poster)

2020-12-22 01:02

Unfortunately, Gson does not seem to support this. All JSON input and output is concentrated in JsonReader and JsonWriter respectively (as of Gson 2.8.0). JsonReader decodes Unicode escapes in its private readEscapeCharacter method. JsonWriter, by contrast, simply writes a string to the backing Writer instance and escapes no character above 127 except \u2028 and \u2029. Probably the only thing you can do here is write a custom escaping Writer that emits the Unicode escapes itself.
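To make the reading side concrete, here is a minimal sketch of what decoding a \uXXXX escape involves: four hex digits are folded into one UTF-16 code unit. This is not Gson's code; the class name UnicodeEscapeDecoder and the method decodeHex4 are made up for illustration, and the real readEscapeCharacter handles the other escape kinds as well.

```java
final class UnicodeEscapeDecoder {

    private UnicodeEscapeDecoder() {
    }

    // Hypothetical helper: decodes the four hex digits starting at 'start'
    // (the position right after the backslash and the letter u) into one char.
    static char decodeHex4(final CharSequence s, final int start) {
        if (start < 0 || start + 4 > s.length()) {
            throw new IllegalArgumentException("Truncated Unicode escape");
        }
        int result = 0;
        for (int i = start; i < start + 4; i++) {
            final char c = s.charAt(i);
            final int digit;
            if (c >= '0' && c <= '9') {
                digit = c - '0';
            } else if (c >= 'a' && c <= 'f') {
                digit = c - 'a' + 10;
            } else if (c >= 'A' && c <= 'F') {
                digit = c - 'A' + 10;
            } else {
                throw new IllegalArgumentException("Not a hex digit: " + c);
            }
            // Shift the accumulated value by one hex digit and add the new one
            result = (result << 4) | digit;
        }
        return (char) result;
    }
}
```

For example, decodeHex4("0161", 0) yields the character š (U+0161). JsonWriter has no counterpart that goes the other way for characters above 127, which is the gap the custom writer fills.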

import java.io.IOException;
import java.io.Writer;

final class EscapedWriter
        extends Writer {

    private static final char[] hex = {
            '0', '1', '2', '3',
            '4', '5', '6', '7',
            '8', '9', 'a', 'b',
            'c', 'd', 'e', 'f'
    };

    private final Writer writer;

    // I/O components are usually not thread-safe by design,
    // so a single reusable buffer can hold the UTF-16 escape instead of allocating one per character
    private final char[] escape = { '\\', 'u', 0, 0, 0, 0 };

    EscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    // This implementation is not very efficient and is open for enhancements:
    // * constructing a single "normalized" buffer character array so that it could be passed to the downstream writer
    //   rather than writing characters one by one
    // * etc...
    @Override
    public void write(final char[] buffer, final int offset, final int length)
            throws IOException {
        // The range to write is [offset, offset + length), not [offset, length)
        for ( int i = offset; i < offset + length; i++ ) {
            final int ch = buffer[i];
            if ( ch < 128 ) {
                writer.write(ch);
            } else {
                escape[2] = hex[(ch & 0xF000) >> 12];
                escape[3] = hex[(ch & 0x0F00) >> 8];
                escape[4] = hex[(ch & 0x00F0) >> 4];
                escape[5] = hex[ch & 0x000F];
                writer.write(escape);
            }
        }
    }

    @Override
    public void flush()
            throws IOException {
        writer.flush();
    }

    @Override
    public void close()
            throws IOException {
        writer.close();
    }

    // Some java.io.Writer subclasses may use java.lang.Object.toString() to materialize their accumulated state by design
    // so it has to be overridden and forwarded as well
    @Override
    public String toString() {
        return writer.toString();
    }

}

This writer is NOT well-tested and does not respect \u2028 and \u2029. Then just pass it as the output destination when invoking the toJson method:

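// Book is the class from the question; a default Gson instance is assumed here
final Gson gson = new Gson();
final String input = "{\"description\":\"Tikrovi\\u0161kai para\\u0161ytas k\\u016brinys\"}";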
final Book book = gson.fromJson(input, Book.class);
final Writer output = new EscapedWriter(new StringWriter());
gson.toJson(book, output);
System.out.println(input);
System.out.println(output);

Output:

{"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
{"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
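The comments in EscapedWriter mention one possible enhancement: building a single "normalized" buffer per write call instead of writing characters one by one. A rough sketch of that idea follows. The class name BufferedEscapedWriter and the static escape helper are assumptions introduced here, not part of the original answer, and this variant is no better tested than the original.

```java
import java.io.IOException;
import java.io.Writer;

final class BufferedEscapedWriter extends Writer {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private final Writer writer;

    BufferedEscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    // Hypothetical helper: builds the escaped form of buffer[offset, offset + length) in one pass
    static String escape(final char[] buffer, final int offset, final int length) {
        final StringBuilder normalized = new StringBuilder(length);
        final int end = offset + length;
        for (int i = offset; i < end; i++) {
            final char ch = buffer[i];
            if (ch < 128) {
                normalized.append(ch);
            } else {
                // Emit a backslash, the letter u, and four lowercase hex digits
                normalized.append('\\').append('u')
                        .append(HEX[(ch >> 12) & 0xF])
                        .append(HEX[(ch >> 8) & 0xF])
                        .append(HEX[(ch >> 4) & 0xF])
                        .append(HEX[ch & 0xF]);
            }
        }
        return normalized.toString();
    }

    @Override
    public void write(final char[] buffer, final int offset, final int length) throws IOException {
        // One downstream call per chunk instead of one per character
        writer.write(escape(buffer, offset, length));
    }

    @Override
    public void flush() throws IOException {
        writer.flush();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    // Forwarded so that wrapping a StringWriter still exposes the accumulated text
    @Override
    public String toString() {
        return writer.toString();
    }
}
```

It is used the same way as EscapedWriter, for example by wrapping a StringWriter and passing the result to toJson. Characters outside the Basic Multilingual Plane arrive as surrogate pairs, so each half is escaped separately, which JSON allows.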

It's an interesting problem, and you might also raise an issue on google/gson asking for a string-writing configuration option, or at least to get some comments from the development team. I believe they are well aware of this behavior and made it work like that by design, but they could shed some light on the reasons. The only reason I can think of is performance, since skipping an additional transformation before writing a string saves some work, but that is a weak guess.
