Method to substitute foreign for English characters in Java?

Submitted by 核能气质少年 on 2019-12-06 05:14:12

A really nice way to do it is to use the replaceEach() method of the StringUtils class, available since Apache Commons Lang 2.4.

String text = "Je prends une thé chaud, s'il vous plaît";
String[] search = new String[] {"é", "î", "è"};
String[] replace = new String[] {"e", "i", "e"};
String newText = StringUtils.replaceEach(text, 
                search, 
                replace);

Results in

Je prends une the chaud, s'il vous plait

There's no method that works identically to the PHP one in the standard API, though there may be something in Apache Commons. You could do it by replacing the characters individually:

s = s.replace('é','e').replace('î', 'i').replace('è', 'e');

A more sophisticated method uses java.text.Normalizer to separate letters from their diacritics and then strips every character whose type is Character.NON_SPACING_MARK (the Unicode category of combining accents). It does not require you to enumerate the characters to substitute, and is thus less likely to miss anything. It does require a loop, but a loop happens internally anyway, whatever method you use.

I'm not a Java guy, but I'd recommend a generic solution using the Normalizer class to decompose accented characters and then remove the Unicode "COMBINING" characters.
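The Normalizer approach described above could be sketched roughly as follows. This is a minimal illustration, not code from the original answers: the class name AccentStripper and the method name stripAccents are made up for the example. It assumes that decomposing to NFD and dropping characters of type Character.NON_SPACING_MARK is sufficient for the input at hand.

```java
import java.text.Normalizer;

public class AccentStripper {
    // Hypothetical helper name, chosen for this example only.
    public static String stripAccents(String input) {
        // NFD splits an accented letter such as U+00E9 into a base
        // letter 'e' followed by the combining acute accent U+0301.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            // Combining accents have the type NON_SPACING_MARK
            // (Unicode general category Mn); skip them, keep the rest.
            if (Character.getType(c) != Character.NON_SPACING_MARK) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, so an explicit mapping is still needed if those should be replaced as well.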

cletus

You're going to have to do a loop:

String text = "Je prends une thé chaud, s'il vous plaît";
Map<Character, String> replace = new HashMap<Character, String>();
replace.put('é', "e");
replace.put('î', "i");
replace.put('è', "e");
StringBuilder s = new StringBuilder();
for (int i=0; i<text.length(); i++) {
  char c = text.charAt(i);
  String rep = replace.get(c);
  if (rep == null) {
    s.append(c);
  } else {
    s.append(rep);
  }
}
text = s.toString();

Note: Some characters are replaced with multiple characters. In German, for example, u-umlaut is converted to "ue".

Edit: Made it much more efficient.
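As a rough illustration of the note about multi-character replacements, the same loop can be used with a German mapping. The class name GermanTransliterator and the particular set of mappings are assumptions made for this sketch, and the table is not complete (capital letters are omitted).

```java
import java.util.HashMap;
import java.util.Map;

public class GermanTransliterator {
    // Illustrative mapping only; extend it as needed.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<Character, String>();
    static {
        REPLACEMENTS.put('\u00fc', "ue"); // u-umlaut
        REPLACEMENTS.put('\u00f6', "oe"); // o-umlaut
        REPLACEMENTS.put('\u00e4', "ae"); // a-umlaut
        REPLACEMENTS.put('\u00df', "ss"); // sharp s
    }

    public static String transliterate(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String rep = REPLACEMENTS.get(c);
            if (rep == null) {
                sb.append(c);   // no mapping: keep the character
            } else {
                sb.append(rep); // mapping may be longer than one character
            }
        }
        return sb.toString();
    }
}
```

Because the map values are strings, one input character can expand to two output characters, which a Map<Character, Character> cannot express.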

There's no standard method as far as I know, but here's a class that does what you want:

http://www.javalobby.org/java/forums/t19704.html

You'll need a loop.

An efficient solution would be something like the following:

    Map<Character, Character> map = new HashMap<Character, Character>();
    map.put('é', 'e');
    map.put('î', 'i');
    map.put('è', 'e');

    StringBuilder b = new StringBuilder();
    for (char c : text.toCharArray())
    {
        if (map.containsKey(c))
        {
            b.append(map.get(c));
        }
        else
        {
            b.append(c);
        }
    }
    String result = b.toString();

Of course in a real program you would encapsulate both the construction of the map and the replacement in their respective methods.
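One possible way to split this into two methods is sketched below. The names CharSubstitution, buildMap and substitute are invented for the example; it also does a single map lookup per character instead of containsKey() followed by get().

```java
import java.util.HashMap;
import java.util.Map;

public class CharSubstitution {
    // Builds the lookup table; in a real program this would be done once.
    static Map<Character, Character> buildMap() {
        Map<Character, Character> map = new HashMap<Character, Character>();
        map.put('\u00e9', 'e'); // e with acute accent
        map.put('\u00ee', 'i'); // i with circumflex
        map.put('\u00e8', 'e'); // e with grave accent
        return map;
    }

    // Replaces every character that has an entry in the map.
    static String substitute(String text, Map<Character, Character> map) {
        StringBuilder b = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            Character rep = map.get(c);
            if (rep != null) {
                b.append(rep.charValue());
            } else {
                b.append(c);
            }
        }
        return b.toString();
    }
}
```

A caller would then write something like String result = CharSubstitution.substitute(text, CharSubstitution.buildMap()); and could cache the map if the method is called often.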
