Convert Unicode to ASCII without changing the string length (in Java)

庸人自扰 2020-12-01 16:58

What is the best way to convert a string from Unicode to ASCII without changing its length (that is very important in my case)? Also the characters without any conversion

5 answers
  •  没有蜡笔的小新
    2020-12-01 17:25

    As stated in this answer, the following code should work:

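        // Requires: import java.text.Normalizer;
        //           import java.nio.charset.StandardCharsets;
        String s = "口水雞 hello Ä";

        // NFKD splits "Ä" into "A" plus a combining diaeresis (U+0308).
        String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);
        // Matches combining diacritical marks, modifier letters and modifier symbols.
        String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";

        // Encoding to US-ASCII replaces every unmappable character with '?'.
        // Using StandardCharsets avoids the checked UnsupportedEncodingException.
        String s2 = new String(s1.replaceAll(regex, "").getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);

        System.out.println(s2);
        System.out.println(s.length() == s2.length());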
    

    The output is:

    ??? hello A
    true
    

    So you first remove the diacritical marks, then convert to ASCII. Any remaining non-ASCII characters become question marks.
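
    A caveat that is not part of the original answer: NFKD can expand one character into several (the ligature U+FB01 decomposes to "fi", for instance), and a character outside the Basic Multilingual Plane occupies two `char`s but is typically encoded as a single `?`. For arbitrary input, normalizing the whole string may therefore change its length. If the length really must stay fixed, one option is to fold each UTF-16 code unit on its own. The sketch below illustrates that idea; the class name `AsciiFold` and the method `toAsciiSameLength` are hypothetical names chosen for this example, and keeping only the first decomposed character is an assumption about what "closest ASCII character" should mean.

```java
import java.text.Normalizer;

public class AsciiFold {

    // Hypothetical helper: maps every UTF-16 code unit of the input to
    // exactly one ASCII character, so the result always has the same length.
    public static String toAsciiSameLength(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 128) {
                out.append(c);            // already ASCII, keep as-is
            } else if (Character.isSurrogate(c)) {
                out.append('?');          // one '?' per half of a surrogate pair
            } else {
                // Decompose this single character, e.g. U+00C4 -> 'A' + U+0308,
                // and keep only the first resulting character if it is ASCII.
                String decomposed = Normalizer.normalize(
                        String.valueOf(c), Normalizer.Form.NFKD);
                char first = decomposed.charAt(0);
                out.append(first < 128 ? first : '?');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same sample string as in the answer, written with Unicode escapes
        // so the source file stays pure ASCII.
        String s = "\u53E3\u6C34\u96DE hello \u00C4";
        String folded = toAsciiSameLength(s);
        System.out.println(folded);                        // ??? hello A
        System.out.println(s.length() == folded.length()); // true
    }
}
```

    The trade-off is that this folding is lossy: a ligature keeps only its first letter, and a combining mark that arrives as a separate character (input that is already decomposed) turns into `?` instead of being dropped, because dropping it would shorten the string.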
