Sorting a list of strings by ignoring (not replacing) non-alphanumeric characters, or by looking at the first alphanumeric character

Submitted by 北慕城南 on 2020-12-26 12:44:10

Question


Basically, I need to sort a list of Strings based on a very specific criterion; however, it's not so specific that I believe it needs its own comparator.

Collections.sort gets me about 95% of the way there with its natural ordering; however, for strings like

"-&4" and "%B", it sorts "%B" before "-&4".

What I'd like is for the sort to be based on the first alphanumeric character, so it would be comparing:

"4" and "B", putting:

"-&4" first then "%B".
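For reference, here is why natural ordering puts "%B" first. String.compareTo compares UTF-16 code units from the left, so the leading symbols decide: '%' is U+0025 (37) and '-' is U+002D (45). This is a minimal sketch; the class and method names are illustrative, and List.of assumes Java 9+.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NaturalOrderDemo {
    // Sorts a copy of the input by String's natural order (compareTo),
    // which compares UTF-16 code units left to right.
    public static List<String> naturalSort(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(naturalSort(List.of("-&4", "%B"))); // [%B, -&4]
        // The first characters already differ, so they alone decide the order
        System.out.println((int) '%' + " < " + (int) '-');    // 37 < 45
    }
}
```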

Doing a replaceAll on special characters can't really work because I have to retain the integrity of the strings. I went down a rabbit hole of replacing everything, sorting to generate a sort position, and then trying to re-sort the unmodified list, to no avail (it also seems like overkill).

I've spent the past 4 hours googling this and I'm surprised it's such a novel situation. Most solutions use replaceAll on non-alphanumeric characters, but I need to retain the integrity of the original strings.

Apologies if the wording is confusing.


Answer 1:


it's not so specific that I believe it needs its own comparator

If you don't supply a Comparator, the strings are sorted by their natural order. Since that's not what you want, you definitely need to supply a comparator, and since no built-in comparator does exactly what you want, it has to be a custom one.
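Taking the question's criterion literally (order by the first alphanumeric character only), a custom comparator could look like the sketch below. The class name and the helper firstAlnum are hypothetical. It uses Character.isLetterOrDigit, which is close to but narrower than the \p{L}/\p{N} regex used further down: it counts only decimal digits, so a character like '²' is skipped. Ties on the first character fall back to natural order of the whole string.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FirstAlnumSort {

    // Hypothetical helper: the first code point that is a letter or decimal digit, or -1 if none
    public static int firstAlnum(String s) {
        return s.codePoints()
                .filter(Character::isLetterOrDigit)
                .findFirst()
                .orElse(-1);
    }

    // Primary key: first letter/digit; tie-breaker: natural order of the whole string
    public static final Comparator<String> BY_FIRST_ALNUM =
            Comparator.comparingInt(FirstAlnumSort::firstAlnum)
                    .thenComparing(Comparator.naturalOrder());

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("%B", "-&4"));
        list.sort(BY_FIRST_ALNUM);
        System.out.println(list); // [-&4, %B]
    }
}
```

Because the tie-breaker is plain natural order, leading symbols count again once two strings share their first letter or digit; swap in a different tie-breaker if that matters.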

The code below creates a custom comparator using a helper method (Comparator.comparing) together with a lambda expression or a method reference. Just because you don't create your own class implementing Comparator doesn't mean you're not creating your own comparator.
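To illustrate that point, roughly the same ordering as the regex-based lambda below can be written as an explicit class implementing Comparator. This sketch (the class name is hypothetical) also takes the title's "ignoring, not replacing" idea literally: it walks both strings and skips characters that are not letters (\p{L}) or numbers (\p{N}) without building new strings. One assumption to note: it compares code points, which can order differently from String.compareTo on the stripped strings only when characters outside the Basic Multilingual Plane are involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical class: orders strings by their letters and digits only,
// skipping every other character in place instead of building new strings.
public class IgnoringNonAlnumComparator implements Comparator<String> {

    // true for Unicode letters (\p{L}) and numbers (\p{N} = Nd, Nl, No)
    private static boolean keep(int cp) {
        if (Character.isLetter(cp)) return true;
        int type = Character.getType(cp);
        return type == Character.DECIMAL_DIGIT_NUMBER
                || type == Character.LETTER_NUMBER
                || type == Character.OTHER_NUMBER;
    }

    // Index of the next kept code point at or after i, or s.length() if there is none
    private static int skip(String s, int i) {
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (keep(cp)) break;
            i += Character.charCount(cp);
        }
        return i;
    }

    @Override
    public int compare(String a, String b) {
        int i = skip(a, 0), j = skip(b, 0);
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i), cb = b.codePointAt(j);
            if (ca != cb) return Integer.compare(ca, cb);
            i = skip(a, i + Character.charCount(ca));
            j = skip(b, j + Character.charCount(cb));
        }
        // The string whose letters/digits run out first sorts first; 0 if both run out together
        return Boolean.compare(i < a.length(), j < b.length());
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("%B", "-&4"));
        list.sort(new IgnoringNonAlnumComparator());
        System.out.println(list); // [-&4, %B]
    }
}
```

It trades the one-line lambda for zero allocations per comparison, which may matter for very large lists.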


To sort by only alphanumeric characters, ignoring spaces and special characters, you can do it like this:

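import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

List<String> list = ...
// Sort key: the string with every run of non-letter, non-digit characters removed
Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
list.sort(Comparator.comparing(s -> p.matcher(s).replaceAll("")));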
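Here is the same approach wrapped into a runnable sketch with sample data (the class and method names are mine, not from the answer). The strings themselves are never modified; only the sort key is stripped. Note that the list has to be mutable: calling sort on a List.of(...) list throws UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class StripAndSort {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Sorts in place; the key is each string with non-letter/non-digit runs removed
    public static void sortIgnoringSymbols(List<String> list) {
        list.sort(Comparator.comparing(s -> NON_ALNUM.matcher(s).replaceAll("")));
    }

    public static void main(String[] args) {
        // Keys: "B", "4", "ac", "ab"
        List<String> list = new ArrayList<>(List.of("%B", "-&4", "a!c", "#ab"));
        sortIgnoringSymbols(list);
        System.out.println(list); // [-&4, %B, #ab, a!c]
    }
}
```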

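If the list is large, you'd likely want to improve performance by caching the normalized strings the sort uses: otherwise the comparator reruns the regex replacement on both strings for every comparison, on the order of n log n times instead of once per element.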

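import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

List<String> list = ...
Pattern p = Pattern.compile("[^\\p{L}\\p{N}]+");
// Compute each sort key once; (a, b) -> a keeps one entry when the list has duplicate strings
Map<String, String> normalized = list.stream()
        .collect(Collectors.toMap(s -> s, s -> p.matcher(s).replaceAll(""), (a, b) -> a));
list.sort(Comparator.comparing(normalized::get));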
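A variant that avoids the map lookup is the classic decorate-sort-undecorate pattern: pair each string with its key once, sort the pairs by key, then unwrap. This is a swapped-in technique, not part of the original answer; the class and method names are hypothetical, and Map.entry needs Java 9+ and rejects null elements.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DecoratedSort {
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Decorate: pair each string with its key, computed exactly once.
    // Sort: order the pairs by key (Stream.sorted is stable for ordered streams).
    // Undecorate: keep only the original, unchanged strings.
    public static List<String> sortedIgnoringSymbols(List<String> input) {
        return input.stream()
                .map(s -> Map.entry(NON_ALNUM.matcher(s).replaceAll(""), s))
                .sorted(Map.Entry.comparingByKey())
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Keys: "B", "4", "B", "ab"; the two "B" keys keep their input order
        System.out.println(sortedIgnoringSymbols(List.of("%B", "-&4", "B!", "#ab")));
        // [-&4, %B, B!, #ab]
    }
}
```

Unlike list.sort, this returns a new list, so the input may be immutable.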

Regex explained

  • \p{L} matches all characters in Unicode category "Letter".
  • \p{N} matches all characters in Unicode category "Number".
  • [^\p{L}\p{N}] matches all characters that are not "Letter" or "Number".
  • "[^\\p{L}\\p{N}]+" is the Java string literal (each backslash escaped as \\) for a pattern matching one or more of those characters.
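A quick way to check what the pattern strips (class and method names are illustrative). Unlike removing \W, the negated class also removes the underscore, and \p{N} keeps numeric characters such as '²' that are not decimal digits. Non-ASCII characters are written as \u escapes to sidestep source-encoding issues.

```java
import java.util.regex.Pattern;

public class RegexDemo {
    // Same pattern as the answer: runs of characters outside \p{L} (letters) and \p{N} (numbers)
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    public static String normalize(String s) {
        return NON_ALNUM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("-&4")); // 4
        System.out.println(normalize("%B"));  // B
        System.out.println(normalize("a_b")); // ab (a \W-based pattern would keep the underscore)
        // C-cedilla (a letter) and superscript two (category No) are both kept
        System.out.println(normalize("\u00C7a va? 2 x\u00B2").equals("\u00C7ava2x\u00B2")); // true
    }
}
```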


Source: https://stackoverflow.com/questions/63077353/sorting-a-list-of-strings-by-ignoring-not-replacing-non-alphanumeric-character
