Why is the same character compared twice by changing its case to UPPER and then to lower?

Question

The code below is from the String class in Java. I don't understand why the characters from the two strings are compared twice: first after converting both to upper case and, if that comparison fails, again after converting both to lower case.

My question is: is this required? If so, why?

        public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                             = new CaseInsensitiveComparator();
        private static class CaseInsensitiveComparator
                implements Comparator<String>, java.io.Serializable {
            // use serialVersionUID from JDK 1.2.2 for interoperability
            private static final long serialVersionUID = 8575799808933029326L;

            public int compare(String s1, String s2) {
                int n1 = s1.length();
                int n2 = s2.length();
                int min = Math.min(n1, n2);
                for (int i = 0; i < min; i++) {
                    char c1 = s1.charAt(i);
                    char c2 = s2.charAt(i);
                    if (c1 != c2) {
                        c1 = Character.toUpperCase(c1);
                        c2 = Character.toUpperCase(c2);
                        if (c1 != c2) {
                            c1 = Character.toLowerCase(c1);
                            c2 = Character.toLowerCase(c2);
                            if (c1 != c2) {
                                // No overflow because of numeric promotion
                                return c1 - c2;
                            }
                        }
                    }
                }
                return n1 - n2;
            }
        }

Answer 1:

The issue is more complex than it looks, because case mappings are not one-to-one.

For some characters, several lower-case code points map to the same upper-case code point, or vice versa. So to check for a case-insensitive match, you need to compare both the upper-case and the lower-case versions and see whether either pair matches.

One example:

The Greek upper-case letter "Σ" has two different lower-case forms: "ς" in word-final position and "σ" elsewhere.

Source: Wikipedia
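The sigma case can be checked directly. The following is a minimal sketch, not part of the JDK source; the class name SigmaCaseDemo and the helper sameUpperCase are made up for illustration. It shows that the two lower-case sigmas differ as characters and also differ after toLowerCase, but become equal after toUpperCase, which is the first step of the comparator in the question.

```java
class SigmaCaseDemo {
    // Hypothetical helper: true when both chars map to the same upper-case char.
    static boolean sameUpperCase(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    public static void main(String[] args) {
        char finalSigma = '\u03C2'; // GREEK SMALL LETTER FINAL SIGMA (U+03C2)
        char sigma = '\u03C3';      // GREEK SMALL LETTER SIGMA (U+03C3)

        // The raw characters differ.
        System.out.println(finalSigma == sigma); // false

        // A lower-case-only comparison would still see two different chars.
        System.out.println(
            Character.toLowerCase(finalSigma) == Character.toLowerCase(sigma)); // false

        // Both map to GREEK CAPITAL LETTER SIGMA (U+03A3), so the upper-case step matches.
        System.out.println(sameUpperCase(finalSigma, sigma)); // true

        // The JDK comparator therefore treats the two one-char strings as equal.
        System.out.println(
            String.CASE_INSENSITIVE_ORDER.compare("\u03C2", "\u03C3")); // 0
    }
}
```

So a comparator that only lower-cased its input would report these two letters as different.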

For the opposite case, where the upper-case forms differ but the lower-case forms are equal, VGR supplied this excellent example:

A better example would be '\u0130' (İ) and 'I'. Passing them through toUpperCase leaves them unchanged (and therefore different), but passing them through toLowerCase results in identical character values.
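VGR's example can be reproduced with a short sketch. The class name DottedIDemo and the helper equalIgnoringCase are hypothetical; the helper only imitates the two-step check from the comparator in the question for a single pair of characters and is not the JDK implementation.

```java
class DottedIDemo {
    // Hypothetical helper imitating the comparator's per-character logic:
    // compare as-is, then upper-cased, then the upper-cased values lower-cased.
    static boolean equalIgnoringCase(char c1, char c2) {
        if (c1 == c2) {
            return true;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return true;
        }
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    public static void main(String[] args) {
        char dottedI = '\u0130'; // LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130)
        char plainI = 'I';

        // Both are already upper case, so the upper-case step leaves them different.
        System.out.println(
            Character.toUpperCase(dottedI) == Character.toUpperCase(plainI)); // false

        // Both lower-case to 'i', so the lower-case step finds the match.
        System.out.println(
            Character.toLowerCase(dottedI) == Character.toLowerCase(plainI)); // true

        System.out.println(equalIgnoringCase(dottedI, plainI)); // true

        // The JDK comparator agrees.
        System.out.println(
            String.CASE_INSENSITIVE_ORDER.compare("\u0130", "I")); // 0
    }
}
```

Together with the sigma example, this shows why neither conversion alone is enough: each one catches pairs that the other misses.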

Source: https://stackoverflow.com/questions/34613630/why-is-the-same-character-compared-twice-by-changing-its-case-to-upper-and-then

Tags

java

string

unicode

comparator