Question
Basically, I need to sort a list of Strings based on a very specific criterion. However, it's not so specific that I believe it needs its own comparator.
Collections.sort gets me about 95% of the way there with its natural ordering. However, for strings like "-&4" and "%B", it puts "%B" before "-&4".
What I'd like is for the strings to be sorted on their first alphanumeric character, so it would compare "4" and "B", putting "-&4" first, then "%B".
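For context, a minimal sketch of why the natural order puts "%B" first: String.compareTo compares UTF-16 code units from the left, and '%' (U+0025) is smaller than '-' (U+002D), so the very first character already decides the result. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NaturalOrderDemo {
    // Returns a sorted copy using String's natural (lexicographic, code-unit) order.
    static List<String> naturalSorted(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // '%' is 37 and '-' is 45, so "%B" < "-&4" before any letter or digit is reached.
        System.out.println((int) '%' + " " + (int) '-');          // 37 45
        System.out.println(naturalSorted(List.of("-&4", "%B")));  // [%B, -&4]
    }
}
```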
Doing a replaceAll on special characters doesn't really work, because I have to keep the original strings intact. I went down a rabbit hole of replacing the characters, sorting to get each string's position, and then trying to re-sort the unmodified list, without success (it also seems like overkill).
I've spent the past 4 hours googling this and I'm surprised it's such a novel situation. Most solutions use replaceAll to remove non-alphanumeric characters, but I need to keep the original strings intact.
Apologies if the wording is confusing as well.
Answer 1:
"it's not so specific that I believe it needs its own comparator"
If you don't supply a Comparator, the strings are sorted by their natural order. Since that's not what you want, you definitely need to supply a comparator, and since there is no built-in comparator doing exactly what you want, you do need to supply a custom comparator.
The code below creates a custom comparator using the helper method Comparator.comparing together with a lambda expression or a method reference. Just because you don't write your own class implementing Comparator doesn't mean you're not creating your own comparator.
To sort by only alphanumeric characters, ignoring spaces and special characters, you can do it like this:
List<String> list = ...
Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
list.sort(Comparator.comparing(s -> p.matcher(s).replaceAll("")));
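A self-contained sketch of the same idea, with the key extraction moved into a named static method and passed to Comparator.comparing as a method reference. The question asks for sorting "on the first alphanumeric character", while the answer's key strips every non-alphanumeric character; a second, hypothetical key that strips only the leading run is included for comparison. The names `AlnumSort`, `stripAll`, `stripLeading` and `sortBy` are illustrative, not from the answer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

class AlnumSort {
    // Same pattern as the answer: one or more chars that are not a Unicode letter or number.
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");
    // Hypothetical variant: only the non-alphanumeric run at the start of the string.
    private static final Pattern LEADING_NON_ALNUM = Pattern.compile("^[^\\p{L}\\p{N}]+");

    // The answer's key: remove every non-alphanumeric character.
    static String stripAll(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    // Variant key: start at the first letter or digit, keep later punctuation.
    static String stripLeading(String s) {
        return LEADING_NON_ALNUM.matcher(s).replaceFirst("");
    }

    // Sorts a copy by the given key; the original strings are never modified.
    static List<String> sortBy(List<String> input, Function<String, String> key) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing(key));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortBy(List.of("%B", "-&4"), AlnumSort::stripAll)); // [-&4, %B]

        // The keys disagree when punctuation follows the first alphanumeric char:
        // stripAll compares "A1" vs "A2", stripLeading compares "A1" vs "A-2" ('-' < '1').
        List<String> tricky = List.of("#A-2", "A1");
        System.out.println(sortBy(tricky, AlnumSort::stripAll));     // [A1, #A-2]
        System.out.println(sortBy(tricky, AlnumSort::stripLeading)); // [#A-2, A1]
    }
}
```

Which key is right depends on whether punctuation after the first letter or digit should still influence the order; for the two strings in the question both keys give the same result.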
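If the list is large, you'd likely want to improve performance by caching the normalized strings, because the comparator above re-runs the regex replacement on both arguments of every comparison, i.e. O(n log n) times instead of once per element.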
List<String> list = ...
Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
Map<String, String> normalized = list.stream()
        .collect(Collectors.toMap(s -> s, s -> p.matcher(s).replaceAll(""), (a, b) -> a));
list.sort(Comparator.comparing(normalized::get));
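A self-contained sketch of the cached variant, under one added assumption: strings whose normalized keys are equal (for example "%A" and "-A") should be ordered by the original string rather than by input order. Java's object sort is stable, so without the `thenComparing` tie-break such strings would simply keep their input order. `CachedKeySort` and `sortCached` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CachedKeySort {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    static List<String> sortCached(List<String> input) {
        // Compute each normalized key once. Duplicate strings produce the same key,
        // so the merge function just keeps the first value.
        Map<String, String> normalized = input.stream()
                .collect(Collectors.toMap(Function.identity(),
                        s -> NON_ALNUM.matcher(s).replaceAll(""),
                        (a, b) -> a));
        List<String> copy = new ArrayList<>(input);
        // Primary key: the cached normalized string.
        // Tie-break (assumption): the original string's natural order.
        copy.sort(Comparator.comparing((String s) -> normalized.get(s))
                .thenComparing(Comparator.naturalOrder()));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortCached(List.of("-A", "%B", "-&4", "%A"))); // [-&4, %A, -A, %B]
    }
}
```

The lambda parameter is typed explicitly as `String` because a bare `normalized::get` (which accepts `Object`) would leave the chained `thenComparing` without enough type information to compile.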
Regex explained
\p{L} matches all characters in Unicode category "Letter".
\p{N} matches all characters in Unicode category "Number".
[^\p{L}\p{N}] matches all characters that are not "Letter" or "Number".
"[^\\p{L}\\p{N}]+" is the Java encoded literal matching one or more of those characters.
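A small check of what the pattern removes, using the same pattern as in the answer. One point worth noting: unlike `\W`, which in Java by default treats only ASCII letters, digits and underscore as word characters, this pattern keeps non-ASCII letters such as "ß" and removes the underscore. `RegexKeyDemo` and `key` are hypothetical names.

```java
import java.util.regex.Pattern;

class RegexKeyDemo {
    // Same pattern as the answer: one or more chars that are not Letter (L) or Number (N).
    static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Returns the string with every non-letter, non-number character removed.
    static String key(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(key("-&4")); // 4
        System.out.println(key("%B"));  // B
        System.out.println(key("a_b")); // ab  (underscore is punctuation, so it is removed)
        // U+00DF (sharp s) is a Unicode letter, so it survives; space and '!' do not.
        System.out.println(key("Stra\u00dfe 42!").equals("Stra\u00dfe42")); // true
    }
}
```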
Source: https://stackoverflow.com/questions/63077353/sorting-a-list-of-strings-by-ignoring-not-replacing-non-alphanumeric-character