Java String encoding (UTF-8)

猫巷女王i 2020-12-05 10:46

I have come across this line of legacy code, which I am trying to figure out:

String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
         


        
2 answers
  •  南方客 (original poster)
    2020-12-05 11:53

    This could be a complicated way of writing

    String newString = new String(oldString);
    

    This shortens the String if the underlying char[] is much longer than the String's contents, as could happen with substring() before Java 7u6.

    More specifically, however, it checks that every character can be encoded as UTF-8.

    There are some "characters" you can have in a String that cannot be encoded, and these are turned into '?'.

    Any unpaired surrogate, that is, a char between \uD800 and \uDFFF that is not part of a valid high/low surrogate pair, cannot be encoded and will be turned into '?'.

    String oldString = "\uD800";
    String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
    System.out.println(newString.equals(oldString));
    

    prints

    false
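
    To make the distinction concrete, here is a minimal sketch that contrasts an unpaired surrogate with a valid surrogate pair. The class name Utf8RoundTrip and the helpers roundTrip and isEncodable are hypothetical names chosen for illustration; the only library calls used are String.getBytes(Charset), new String(byte[], Charset) and CharsetEncoder.canEncode(CharSequence).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8RoundTrip {
    // Encode to UTF-8 bytes and decode again, as the legacy line does.
    // Characters that cannot be encoded are replaced with '?'.
    public static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Check up front whether the whole string can be encoded as UTF-8.
    public static boolean isEncodable(String s) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String lone = "\uD800";        // unpaired high surrogate
        String pair = "\uD83D\uDE00";  // valid surrogate pair (U+1F600)

        System.out.println(roundTrip(lone).equals(lone)); // false
        System.out.println(roundTrip(pair).equals(pair)); // true
        System.out.println(isEncodable(lone));            // false
        System.out.println(isEncodable(pair));            // true
    }
}
```

    Using StandardCharsets.UTF_8 instead of the "UTF-8" string also avoids the checked UnsupportedEncodingException. If the goal is only to detect bad input rather than silently replace it, a check like isEncodable may be the clearer choice.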
    
