Curious about the implementation of CaseInsensitiveComparator [duplicate]

Submitted by 我只是一个虾纸丫 on 2019-11-30 21:47:31

Question


While checking the implementation of CaseInsensitiveComparator, which is a private static nested class of String, I found something strange.

private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    ...
    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }
    ...
}

What I'm curious about is this: in the for loop, once the upper-cased characters have been compared, why compare the lower-cased characters again? When Character.toUpperCase(c1) and Character.toUpperCase(c2) are different, is it possible for Character.toLowerCase(c1) and Character.toLowerCase(c2) to be equal?

Couldn't it be simplified like this?

public int compare(String s1, String s2) {
    int n1 = s1.length();
    int n2 = s2.length();
    int min = Math.min(n1, n2);
    for (int i = 0; i < min; i++) {
        char c1 = s1.charAt(i);
        char c2 = s2.charAt(i);
        if (c1 != c2) {
            c1 = Character.toUpperCase(c1);
            c2 = Character.toUpperCase(c2);
            if (c1 != c2) {
                // No overflow because of numeric promotion
                return c1 - c2;
            }
        }
    }
    return n1 - n2;
}

Did I miss something?


Answer 1:


There are Unicode characters that differ in lowercase but share the same uppercase form. For example, the Greek letter sigma has two lowercase forms (σ, and ς, which is used only at the end of a word) but only one uppercase form (Σ).
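The sigma case can be checked directly with the Character methods that the comparator uses. The following is a minimal sketch; the class name SigmaCaseDemo and the helpers sameUpper and sameLower are hypothetical names chosen for illustration and are not part of the JDK.

```java
class SigmaCaseDemo {
    // Hypothetical helper: true if both chars map to the same upper-case char.
    static boolean sameUpper(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Hypothetical helper: true if both chars map to the same lower-case char.
    static boolean sameLower(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char sigma = '\u03C3';      // GREEK SMALL LETTER SIGMA
        char finalSigma = '\u03C2'; // GREEK SMALL LETTER FINAL SIGMA

        // Both forms upper-case to GREEK CAPITAL LETTER SIGMA (U+03A3).
        System.out.println(sameUpper(sigma, finalSigma)); // prints "true"
        // Each form is already lower-case, so the two stay distinct.
        System.out.println(sameLower(sigma, finalSigma)); // prints "false"
    }
}
```

Because the two forms already agree after upper-casing, the first inner check in compare treats them as equal and the lower-case check is never reached for this pair.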

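I could not find any examples of the reverse (characters whose uppercase forms differ but whose lowercase forms are equal), but if such a situation arose in the future, the current Java implementation is already prepared for it. Your version of the comparator would still handle the sigma case correctly, because both lowercase forms map to the same uppercase letter.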
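One way to probe the reverse case is to run the simplified comparator from the question side by side with String.CASE_INSENSITIVE_ORDER. The sketch below is not taken from the original answer. It assumes that U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) is such a pair with the plain letter I: as far as I can tell, Character.toUpperCase leaves both unchanged, while Character.toLowerCase maps both to i. The names ReverseCaseDemo and UPPER_ONLY are hypothetical.

```java
import java.util.Comparator;

class ReverseCaseDemo {
    // The simplified comparator proposed in the question: upper-case check only.
    static final Comparator<String> UPPER_ONLY = (s1, s2) -> {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    // No overflow because of numeric promotion
                    return c1 - c2;
                }
            }
        }
        return n1 - n2;
    };

    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        String plainI = "I";

        // The JDK comparator falls through to the lower-case check and reports equality.
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare(dottedI, plainI) == 0);
        // The upper-case-only version stops early and reports a difference.
        System.out.println(UPPER_ONLY.compare(dottedI, plainI) == 0);
    }
}
```

If the assumption about U+0130 holds, the first line prints true and the second prints false, which suggests the extra lower-case check is not redundant and that the simplified version would change the ordering for at least this pair.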

You can find more information in the Case Mapping FAQ on the Unicode website.



Source: https://stackoverflow.com/questions/31696168/curious-about-the-implementation-of-caseinsensitivecomparator
