Gson Unicode characters conversion to Unicode character codes

眼角桃花 2020-12-22 00:17

Check out my code below. I have a JSON string which contains Unicode character codes. I convert it to my Java object and then convert it back to JSON string. However, you ca

2 Answers
  •  星月不相逢
    2020-12-22 01:02

    Unfortunately, Gson does not seem to support this. All JSON input and output goes through Gson's (as of 2.8.0) JsonReader and JsonWriter, respectively. JsonReader can read Unicode escapes using its private readEscapeCharacter method. JsonWriter, however, simply writes strings to the backing Writer instance and does not escape characters above 127, except \u2028 and \u2029. Probably the only thing you can do here is write a custom escaping Writer that emits Unicode escapes.

    import java.io.IOException;
    import java.io.Writer;
    
    final class EscapedWriter
            extends Writer {
    
        private static final char[] hex = {
                '0', '1', '2', '3',
                '4', '5', '6', '7',
                '8', '9', 'a', 'b',
                'c', 'd', 'e', 'f'
        };
    
        private final Writer writer;
    
        // I/O components are usually not thread-safe anyway,
        // so a single reusable buffer saves time on constructing each UTF-16 escape
        private final char[] escape = { '\\', 'u', 0, 0, 0, 0 };
    
        EscapedWriter(final Writer writer) {
            this.writer = writer;
        }
    
        // This implementation is not very efficient and is open for enhancements:
        // * constructing a single "normalized" buffer character array so that it could be passed to the downstream writer
        //   rather than writing characters one by one
        // * etc...
        @Override
        public void write(final char[] buffer, final int offset, final int length)
                throws IOException {
            // length is a character count, not an end index, so the loop ends at offset + length
            for ( int i = offset; i < offset + length; i++ ) {
                final int ch = buffer[i];
                if ( ch < 128 ) {
                    writer.write(ch);
                } else {
                    escape[2] = hex[(ch & 0xF000) >> 12];
                    escape[3] = hex[(ch & 0x0F00) >> 8];
                    escape[4] = hex[(ch & 0x00F0) >> 4];
                    escape[5] = hex[ch & 0x000F];
                    writer.write(escape);
                }
            }
        }
    
        @Override
        public void flush()
                throws IOException {
            writer.flush();
        }
    
        @Override
        public void close()
                throws IOException {
            writer.close();
        }
    
        // Some java.io.Writer subclasses may use java.lang.Object.toString() to materialize their accumulated state by design
        // so it has to be overridden and forwarded as well
        @Override
        public String toString() {
            return writer.toString();
        }
    
    }
    

    This writer is NOT well-tested, and does not respect \u2028 and \u2029. Then just pass it as the output destination when invoking the toJson method:

    final String input = "{\"description\":\"Tikrovi\\u0161kai para\\u0161ytas k\\u016brinys\"}";
    final Book book = gson.fromJson(input, Book.class);
    final Writer output = new EscapedWriter(new StringWriter());
    gson.toJson(book, output);
    System.out.println(input);
    System.out.println(output);
    

    Output:

    {"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
    {"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
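The comments in EscapedWriter mention building a single "normalized" buffer as a possible enhancement. A minimal sketch of that idea follows, using only java.io. The class name BufferedEscapedWriter and the static escape helper are hypothetical names introduced here for illustration; they are not part of Gson or of the original answer, and this variant is equally untested against real Gson output.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical buffered variant of the escaping writer idea.
final class BufferedEscapedWriter extends Writer {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private final Writer writer;

    BufferedEscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    @Override
    public void write(final char[] buffer, final int offset, final int length)
            throws IOException {
        // Worst case: every char expands to a six-character escape sequence.
        final StringBuilder out = new StringBuilder(length * 6);
        for (int i = offset; i < offset + length; i++) {
            final char ch = buffer[i];
            if (ch < 128) {
                out.append(ch);
            } else {
                // Backslash, 'u', then four lowercase hex digits of the UTF-16 code unit.
                out.append('\\').append('u')
                   .append(HEX[(ch >> 12) & 0xF])
                   .append(HEX[(ch >> 8) & 0xF])
                   .append(HEX[(ch >> 4) & 0xF])
                   .append(HEX[ch & 0xF]);
            }
        }
        // One downstream call per chunk instead of one call per character.
        writer.write(out.toString());
    }

    @Override
    public void flush() throws IOException {
        writer.flush();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    // Forwarded so that a wrapped StringWriter can expose its accumulated text.
    @Override
    public String toString() {
        return writer.toString();
    }

    // Hypothetical convenience helper: escapes a whole string in memory.
    static String escape(final String s) {
        final Writer w = new BufferedEscapedWriter(new StringWriter());
        try {
            w.write(s);
            w.flush();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return w.toString();
    }
}
```

It should be usable the same way as EscapedWriter, for example gson.toJson(book, new BufferedEscapedWriter(new StringWriter())). Characters outside the Basic Multilingual Plane arrive as two UTF-16 code units, so each half of the surrogate pair is escaped separately, which JSON permits.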

    It's an interesting problem, and you might also raise an issue on google/gson asking for a string-writing configuration option, or at least to get some comments from the development team. I believe they are well aware of this behavior and made it work like that by design, but they could shed some light on it. The only reason I can think of right now is performance, since no additional transformation is made before a string is written, but that's a weak guess.
