Java 8 UTF-8 encoding issue (java bug?)

猫巷女王i 2020-11-29 08:15

There is an inconsistency when creating a String with UTF-8 encoding.

Run this code:

public static void encodingIssue() throws IOException {
    byte         


        
3 answers
  •  星月不相逢
    2020-11-29 08:57

    “Modified UTF-8” encodes each half of a surrogate pair (or even an unpaired char in that range) as an individual character, three bytes apiece, instead of combining the pair into one four-byte sequence. A decoder that claims to implement standard UTF-8 but accepts “Modified UTF-8” input is in error. This seems to have been fixed in Java 8.
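
To make the difference concrete, here is a minimal sketch that encodes one supplementary character both ways. The class and method names (`ModifiedUtf8Demo`, `standardUtf8`, `modifiedUtf8`, `hex`) are made up for illustration, and U+1F600 is just a sample character; the question's own input is not shown in full, so this is not a reproduction of it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question or answer.
class ModifiedUtf8Demo {

    // Standard UTF-8: a valid surrogate pair becomes one 4-byte sequence.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF,
    // with the leading 2-byte length prefix stripped off.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream and a short string.
            throw new UncheckedIOException(e);
        }
        byte[] withLength = bos.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(hex(standardUtf8(s))); // F0 9F 98 80
        System.out.println(hex(modifiedUtf8(s))); // ED A0 BD ED B8 80
    }
}
```

The second output is the two surrogates encoded separately as three bytes each, which is the form a strict UTF-8 decoder is supposed to reject.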

    You can reliably read such data using a method that is specified to use “Modified UTF-8”:

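    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.nio.ByteBuffer;

    // "array" holds the raw bytes; readUTF() expects a 2-byte length prefix, so array.length must be at most 65535.
    ByteBuffer bb = ByteBuffer.allocate(array.length + 2);
    bb.putShort((short) array.length).put(array);
    ByteArrayInputStream bis = new ByteArrayInputStream(bb.array());
    DataInputStream dis = new DataInputStream(bis);
    String str = dis.readUTF();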
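
As a rough, self-contained sketch of the approach above, the following wraps it in a method and compares it with the standard decoder. The names `DecodeComparison`, `decodeModified` and `decodeStandard` are hypothetical, and the sample bytes are the two separately encoded surrogates of U+1F600, chosen here only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class DecodeComparison {

    // Decodes Modified UTF-8 bytes by prepending the length prefix readUTF() expects.
    static String decodeModified(byte[] array) {
        if (array.length > 0xFFFF) {
            throw new IllegalArgumentException("readUTF() handles at most 65535 bytes");
        }
        ByteBuffer bb = ByteBuffer.allocate(array.length + 2);
        bb.putShort((short) array.length).put(array);
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bb.array()))) {
            return dis.readUTF();
        } catch (IOException e) {
            // readUTF() signals malformed input with an IOException subclass.
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the same bytes with the standard UTF-8 charset.
    static String decodeStandard(byte[] array) {
        return new String(array, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xED, (byte) 0xA0, (byte) 0xBD,
                        (byte) 0xED, (byte) 0xB8, (byte) 0x80};
        // readUTF() reassembles the surrogate pair.
        System.out.println(decodeModified(bytes).equals("\uD83D\uDE00"));
        // On Java 8 and later the standard decoder is expected to substitute U+FFFD instead.
        System.out.println(decodeStandard(bytes).contains("\uFFFD"));
    }
}
```

How many replacement characters the standard decoder produces may differ between Java versions, so the sketch only checks that at least one is present.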
    
