Question
Basically, I need to sort a list of Strings based on a very specific criterion; however, it's not so specific that I believe it needs its own comparator.
Collections.sort gets me about 95% of the way there, since most of what I need is natural ordering. However, for strings like "-&4" and "%B", it puts "%B" before "-&4".
What I'd like is for the sort to use the first alphanumeric character, so it would compare "4" and "B", putting "-&4" first and then "%B".
Doing a replaceAll on special characters can't really work because I have to retain the integrity of the string. I went down a rabbit hole of replacing everything, sorting to generate a sort position, and then trying to re-sort the non-replaced list, to no avail (it also seems like overkill).
I've spent the past 4 hours googling this and am surprised it's such a novel situation. Most solutions rely on a replaceAll on non-alphanumeric characters, but I need to retain the integrity of the original string.
Apologies if this is confusing verbiage as well.
Answer 1:
"it's not so specific that I believe it needs its own comparator"
If you don't supply a Comparator, the strings are sorted by their natural order. Since that's not what you want, you definitely need to supply a comparator, and since there is no built-in comparator doing exactly what you want, you do need to supply a custom comparator.
The code below creates a custom comparator using a helper method (Comparator.comparing) and a lambda expression or a method reference. Just because you don't create your own class implementing Comparator doesn't mean you're not creating your own comparator.
To sort by only alphanumeric characters, ignoring spaces and special characters, you can do it like this:
List<String> list = ...
Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
list.sort(Comparator.comparing(s -> p.matcher(s).replaceAll("")));
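For reference, here is a self-contained sketch of the snippet above, run against the two strings from the question plus two extra entries added for illustration. The class name AlnumSortDemo and the helpers sortKey and sortedByAlnum are hypothetical names, not part of any library, and the sketch sorts a copy of the list rather than sorting in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

class AlnumSortDemo {
    // Matches runs of characters that are neither Unicode letters nor Unicode numbers.
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Hypothetical helper: the sort key is the string with non-alphanumerics removed.
    static String sortKey(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    // Returns a sorted copy; the strings themselves are never modified.
    static List<String> sortedByAlnum(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing(AlnumSortDemo::sortKey));
        return copy;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("%B", "-&4", "a1", "#C");
        // Keys are "B", "4", "a1", "C"; natural String order puts digits
        // before uppercase letters and uppercase before lowercase.
        System.out.println(sortedByAlnum(list)); // prints [-&4, %B, #C, a1]
    }
}
```

Only the keys are stripped; the list still contains the original strings, which is what keeps their integrity intact.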
If the list is large, you'd likely want to improve performance by caching the normalized string that the sort is using.
List<String> list = ...
Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
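// Compute each normalized key once; the merge function keeps the first value when a string occurs more than once.
Map<String, String> normalized = list.stream()
        .collect(Collectors.toMap(s -> s, s -> p.matcher(s).replaceAll(""), (a, b) -> a));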
list.sort(Comparator.comparing(normalized::get));
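To make the effect of the cache visible, the following sketch wraps the cached version in a runnable class and counts how often the regex replacement runs. The class name CachedAlnumSortDemo, the normalize helper, and the calls counter are hypothetical and exist only for this illustration; the sample list deliberately contains a duplicate to exercise the merge function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CachedAlnumSortDemo {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Counts normalization calls (illustration only).
    static final AtomicInteger calls = new AtomicInteger();

    static String normalize(String s) {
        calls.incrementAndGet();
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    static List<String> sortedWithCache(List<String> input) {
        // Build the original -> normalized map up front, one normalize call per element.
        Map<String, String> normalized = input.stream()
                .collect(Collectors.toMap(s -> s, CachedAlnumSortDemo::normalize, (a, b) -> a));
        List<String> copy = new ArrayList<>(input);
        // During the sort, keys are looked up in the map instead of being recomputed.
        copy.sort(Comparator.comparing(normalized::get));
        return copy;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("%B", "-&4", "a1", "#C", "-&4");
        System.out.println(sortedWithCache(list));
        System.out.println("normalize calls: " + calls.get());
    }
}
```

With the uncached comparator, the key extractor runs twice per comparison, so the number of regex replacements grows with the number of comparisons rather than with the number of elements.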
Regex explained
\p{L} matches all characters in Unicode category "Letter".
\p{N} matches all characters in Unicode category "Number".
[^\p{L}\p{N}] matches all characters that are not "Letter" or "Number".
"[^\\p{L}\\p{N}]+" is the Java encoded literal matching one or more of those characters.
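As a quick check of what the pattern removes, this small sketch applies it to a few sample strings. The class name RegexDemo and the strip helper are hypothetical; the inputs other than "-&4" and "%B" are made up for illustration.

```java
import java.util.regex.Pattern;

class RegexDemo {
    static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Removes every run of characters that are not Unicode letters or numbers.
    static String strip(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("-&4"));      // 4
        System.out.println(strip("%B"));       // B
        // Underscore, space and hyphen are neither letters nor numbers, so all are removed.
        System.out.println(strip("a_b c-1"));  // abc1
        // Non-ASCII letters such as U+00E9 and U+00DF are in category "Letter" and are kept.
        System.out.println(strip("\u00e9!\u00df").equals("\u00e9\u00df")); // true
    }
}
```

Note that the underscore is removed here, whereas a \w-based pattern would treat it as a word character and keep it.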
Source: https://stackoverflow.com/questions/63077353/sorting-a-list-of-strings-by-ignoring-not-replacing-non-alphanumeric-character