I'm generating an XML file to make payments and I have a constraint on users' full names. That parameter only accepts alphabetic characters (a-z, A-Z) plus whitespace to separate names.
You can use this removeAccents method, followed by a replaceAll with [^A-Za-z ]:
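import java.text.Normalizer;
import java.text.Normalizer.Form;

// Strips diacritics: decompose to NFD, then drop the combining marks.
public static String removeAccents(String text) {
    return text == null ? null :
        Normalizer.normalize(text, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}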
The Normalizer decomposes the original characters into a combination of a base character and a diacritic sign (this could be multiple signs in different languages). á, é and í have the same sign: 0301, for marking the ' accent.
The \p{InCombiningDiacriticalMarks}+ regular expression will match all such diacritic codes and we will replace them with an empty string.
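To make the decomposition step concrete, here is a minimal sketch that prints the code points produced by NFD normalization. The class name DecompositionDemo and the helper nfdCodePoints are made up for illustration; only Normalizer.normalize comes from the answer above. The accented letters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.text.Normalizer;

public class DecompositionDemo {

    // Hypothetical helper: returns the code points of the NFD form of the
    // input, formatted as "U+XXXX" and separated by single spaces.
    static String nfdCodePoints(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        decomposed.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // a, e and i with an acute accent, in their precomposed forms
        String[] samples = {"\u00E1", "\u00E9", "\u00ED"};
        for (String s : samples) {
            System.out.println(nfdCodePoints(s));
        }
    }
}
```

Each sample should come out as its plain base letter followed by U+0301, which is the code the regular expression then removes.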
And in the caller:
String original = "Carmen López-Delina Santos";
String res = removeAccents(original).replaceAll("[^A-Za-z ]", "");
System.out.println(res);
See IDEONE demo
You can first use a Normalizer and then remove the undesired characters:
String input = "Carmen López-Delina Santos";
String withoutAccent = Normalizer.normalize(input, Normalizer.Form.NFD);
String output = withoutAccent.replaceAll("[^a-zA-Z ]", "");
System.out.println(output); //prints Carmen LopezDelina Santos
Note that this may not work for every non-ASCII letter in every language: a letter that does not decompose into an ASCII base letter plus combining marks is simply deleted. One such example is the Turkish dotless ı.
The alternative in that situation is probably to list all the possible letters and their replacements...
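A rough sketch of that alternative, assuming a hand-maintained replacement table applied before normalization. The class NameSanitizer, the method toAsciiName and the contents of the table are all hypothetical and deliberately incomplete; a real table would have to be built from the names you actually expect to see.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class NameSanitizer {

    // Illustrative table only (an assumption, not an exhaustive list):
    // letters that NFD does not split into an ASCII letter plus marks.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u0131', "i");   // Turkish dotless i
        REPLACEMENTS.put('\u00F8', "o");   // o with stroke
        REPLACEMENTS.put('\u00D8', "O");   // O with stroke
        REPLACEMENTS.put('\u0142', "l");   // l with stroke
        REPLACEMENTS.put('\u0141', "L");   // L with stroke
        REPLACEMENTS.put('\u00DF', "ss");  // sharp s
        REPLACEMENTS.put('\u00E6', "ae");  // ae ligature
    }

    static String toAsciiName(String text) {
        if (text == null) {
            return null;
        }
        // Step 1: apply the explicit replacements.
        StringBuilder mapped = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                mapped.append(replacement);
            } else {
                mapped.append(c);
            }
        }
        // Step 2: decompose what is left and keep only letters and spaces.
        return Normalizer.normalize(mapped, Normalizer.Form.NFD)
                .replaceAll("[^A-Za-z ]", "");
    }

    public static void main(String[] args) {
        // "Isik Sorensen" written with s-cedilla, dotless i and o-stroke
        System.out.println(toAsciiName("I\u015F\u0131k S\u00F8rensen"));
    }
}
```

Doing the table lookup first means the mapped letters survive the final replaceAll, while anything still outside A-Z, a-z and space is dropped as before.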