What is the best way to convert a string from Unicode to ASCII without changing its length (that is very important in my case)? Also the characters without any conversion
As stated in this answer, the following code should work:
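import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
String s = "口水雞 hello Ä";
// Decompose compatibility forms, e.g. "Ä" becomes "A" plus a combining diaeresis.
String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);
// Match combining diacritical marks, modifier letters, and modifier symbols.
String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";
// Encoding to ASCII replaces each unmappable character with '?'.
// Using StandardCharsets avoids the checked UnsupportedEncodingException of getBytes("ascii").
String s2 = new String(s1.replaceAll(regex, "").getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
System.out.println(s2);
System.out.println(s.length() == s2.length());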
Output is
??? hello A
true
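So you first remove diacritical marks, then convert to ASCII. Non-ASCII characters will become question marks. Note that the length is only preserved when every character maps to exactly one char: NFKD expands some characters (for example, the ligature "ﬁ" becomes "fi"), and a character outside the BMP occupies two chars in a Java String but is encoded as a single question mark.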
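If the length really must never change, one option is to apply the same idea to each char separately and fall back to a question mark whenever the result is not exactly one ASCII character. The following is only a minimal sketch under that assumption, not the code from the linked answer; the class name `AsciiFold` and the method `toAsciiSameLength` are hypothetical names introduced for illustration.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
class AsciiFold {
    // Same character classes as the regex used in the answer.
    private static final Pattern MARKS =
            Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    // Folds each UTF-16 code unit on its own, so the result always has
    // exactly input.length() chars.
    static String toAsciiSameLength(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 128) {
                out.append(c); // already ASCII, keep unchanged
                continue;
            }
            String folded = "";
            if (!Character.isSurrogate(c)) {
                // Decompose this single char and strip the marks.
                String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKD);
                folded = MARKS.matcher(decomposed).replaceAll("");
            }
            // Accept the result only if it is exactly one ASCII char.
            boolean usable = folded.length() == 1 && folded.charAt(0) < 128;
            out.append(usable ? folded.charAt(0) : '?');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u53E3\u6C34\u96DE hello \u00C4" is the example string from the answer.
        String s = "\u53E3\u6C34\u96DE hello \u00C4";
        String s2 = toAsciiSameLength(s);
        System.out.println(s2);                         // ??? hello A
        System.out.println(s.length() == s2.length());  // true
    }
}
```

With this approach a ligature such as "ﬁ" becomes a single question mark rather than "fi", each half of a surrogate pair becomes its own question mark, and a combining mark that is already a separate char in the input is replaced by a question mark instead of being dropped. Whether that trade-off is acceptable depends on how the result is used.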