Question
I am trying to write a filter function for my application that takes an input string and filters out all objects that don't match it in some way. The easiest approach would be String's contains method, i.e. checking whether the String field in the object contains the string specified in the filter, but that doesn't account for accents.
The objects in question are basically Persons, and the strings I am trying to match are names. So, for example, if someone searches for Joao I would expect Joáo to be included in the result set. I have already used the Collator class in my application to sort by name, and it works well because it can compare, i.e. using the UK Locale á comes before b but after a. But obviously it doesn't return 0 if you compare a and á, because they are not equal.
So does anyone have any idea how I might be able to do this?
Answer 1:
Make use of java.text.Normalizer and a shot of regex to get rid of the diacritics.
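import java.text.Normalizer;
import java.text.Normalizer.Form;

public static String removeDiacriticalMarks(String string) {
    // NFD splits each accented letter into base letter + combining mark(s)
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}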
Which you can use as follows:
String value = "Joáo";
String comparisonMaterial = removeDiacriticalMarks(value); // Joao
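To connect this back to the filter described in the question, here is a minimal sketch that folds both the query and each name before calling contains. The class name AccentInsensitiveFilter, the helper fold, and the use of plain name strings instead of Person objects are assumptions made for illustration, and the lower-casing step is an addition that the answer itself does not include. Accented characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original answer.
class AccentInsensitiveFilter {

    // Decompose to NFD, strip combining diacritical marks, then lower-case
    // so that the comparison ignores both accents and case.
    public static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
                .toLowerCase(Locale.ROOT);
    }

    // Keep only the names whose folded form contains the folded query.
    public static List<String> filter(List<String> names, String query) {
        String needle = fold(query);
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (fold(name).contains(needle)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Jo\u00e1o Silva",    // a with acute accent
                "John Smith",
                "JO\u00c3O Pereira"); // upper-case A with tilde
        System.out.println(filter(names, "joao"));
    }
}
```

Folding every name on each search is fine for small lists; for larger ones it may be worth storing the folded name alongside each Person once and searching that instead.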
Answer 2:
Collator does return 0 for a and á, if you configure it to ignore diacritics:
import java.text.Collator;

public boolean isSame(String a, String b) {
    Collator insensitiveStringComparator = Collator.getInstance();
    // PRIMARY strength compares base letters only, so accent and case
    // differences are ignored (the exact mapping is locale-dependent)
    insensitiveStringComparator.setStrength(Collator.PRIMARY);
    return insensitiveStringComparator.compare(a, b) == 0;
}
isSame("a", "á") now yields true.
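As a rough illustration of how the strength setting changes the result, the sketch below runs the same comparison at different strengths. The class name CollatorStrengthDemo and the extra strength parameter are assumptions added for the example, and Locale.ENGLISH is pinned only to make the behaviour independent of the machine's default locale. Note that Collator compares whole strings, so on its own it gives an equality test rather than a contains-style search.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of the original answer.
class CollatorStrengthDemo {

    // True when the two strings compare as equal at the given strength.
    public static boolean isSame(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "a";
        String accented = "\u00e1"; // a with acute accent

        // Base letters only: the accent is not a significant difference.
        System.out.println(isSame(plain, accented, Collator.PRIMARY));
        // Accent differences are significant again at TERTIARY strength.
        System.out.println(isSame(plain, accented, Collator.TERTIARY));
        // Different base letters differ at every strength.
        System.out.println(isSame(plain, "b", Collator.PRIMARY));
    }
}
```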
Answer 3:
I have written a class for searching through Arabic texts while ignoring diacritics (NOT removing them). Maybe you can get the idea from it or use it in some way.
DiacriticInsensitiveSearch.java
Source: https://stackoverflow.com/questions/2397804/java-string-searching-ignoring-accents