Why can some ASCII characters not be expressed in the form '\uXXXX' in Java source code?

被撕碎了的回忆 2020-12-13 12:19

I stumbled over this (again) today:

class Test {
    char ok = '\n';
    char okAsWell = '\u000B';
    char error = '\u000A'; // does not compile
}

It fails to compile on the line declaring error, while the other two declarations are accepted.

5 answers
  •  清歌不尽
    2020-12-13 13:04

    Unicode escapes are replaced by the characters they denote before the source is tokenized, so the compiler sees your last line as:

    char error = '
    ';
    

    which is not valid Java: a character literal cannot contain a line terminator.
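    A minimal sketch of the usual workaround: write the line feed with the escape sequence \n, or as a cast from an int literal. Escape sequences are interpreted by the lexer inside the literal, after Unicode escapes have already been translated, so the literal never contains a raw line terminator. The carriage return (\u000D) has the same problem and is written as \r. The class name LineTerminatorChars is only illustrative.

```java
public class LineTerminatorChars {
    // Escape sequences are handled by the lexer, after Unicode-escape translation,
    // so they can safely denote line terminators inside a char literal.
    static final char LINE_FEED = '\n';        // U+000A
    static final char CARRIAGE_RETURN = '\r';  // U+000D

    // A cast from an int literal also works; no escape of any kind is involved.
    static final char LINE_FEED_FROM_INT = (char) 0x0A;

    public static void main(String[] args) {
        System.out.println((int) LINE_FEED);                 // prints 10
        System.out.println((int) CARRIAGE_RETURN);           // prints 13
        System.out.println(LINE_FEED == LINE_FEED_FROM_INT); // prints true
    }
}
```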

    This is dictated by the Java Language Specification (§3.3, Unicode Escapes):

    A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
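    To make the quoted rule concrete, here is a simplified sketch of that translation step as an ordinary Java method (UnicodeEscapeTranslator and translate are illustrative names, not part of javac or the JDK). It assumes the eligibility rule from the same JLS section: a backslash can begin an escape only if an even number of raw backslashes precede it, the u may be repeated, and exactly four ASCII hex digits must follow. Running it on the asker's last line shows the character literal being split across two lines before the lexer ever sees it.

```java
public class UnicodeEscapeTranslator {

    private static final String HEX_DIGITS = "0123456789abcdefABCDEF";

    /**
     * Simplified sketch of the pre-lexing translation: every eligible escape
     * (a backslash, one or more 'u' characters, then exactly four hex digits)
     * becomes the UTF-16 code unit it names; all other characters pass through.
     * Throws IllegalArgumentException where javac would report an illegal escape.
     */
    static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int backslashRun = 0; // raw backslashes immediately before position i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean eligible = c == '\\' && backslashRun % 2 == 0;
            if (eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++; // the marker may repeat, e.g. backslash + "uuu0041"
                }
                if (j + 4 > raw.length() || !isHex(raw.substring(j, j + 4))) {
                    throw new IllegalArgumentException("illegal unicode escape at index " + i);
                }
                out.append((char) Integer.parseInt(raw.substring(j, j + 4), 16));
                i = j + 4;
                backslashRun = 0; // the produced character never begins another escape
            } else {
                out.append(c);
                backslashRun = (c == '\\') ? backslashRun + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s) {
        for (int k = 0; k < s.length(); k++) {
            if (HEX_DIGITS.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Doubled backslashes keep javac from translating these escapes in this file.
        String translated = translate("char error = '\\u000A';");
        System.out.println(translated.equals("char error = '\n';")); // prints true

        // An escape inside a line comment is translated too, so "x = 2;" ends up as code.
        System.out.println(translate("int x = 1; // \\u000a x = 2;"));

        // Two raw backslashes: not an escape, printed unchanged.
        System.out.println(translate("\\\\u000A"));
    }
}
```

    The comment case in main is the same mechanism that breaks the char literal: the translation runs before the compiler knows whether it is inside a comment, a string, or a character literal.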

    This can lead to surprising results. For example, the following main method is valid Java (it contains hidden Unicode characters), courtesy of Peter Lawrey:

    public static void main(String[] args) {
        for (char c‮h = 0; c‮h < Character.MAX_VALUE; c‮h++) {
            if (Character.isJavaIdentifierPart(c‮h) && !Character.isJavaIdentifierStart(c‮h)) {
                System.out.printf("%04x <%s>%n", (int) c‮h, "" + c‮h);
            }
        }
    }
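    Assuming the copy above preserved it, the loop variable renders like ch but contains an invisible U+202E RIGHT-TO-LEFT OVERRIDE between c and h. That character is in the FORMAT (Cf) general category, so Character.isIdentifierIgnorable, and therefore Character.isJavaIdentifierPart, return true for it, which is what makes the name legal. A minimal sketch for surfacing such characters in a string (HiddenCharScanner and findFormatChars are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenCharScanner {

    /**
     * Reports every code point in the text whose general category is FORMAT (Cf),
     * such as U+202E. Such characters are identifier-ignorable in Java, yet most
     * editors do not display them.
     */
    static List<String> findFormatChars(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.getType(cp) == Character.FORMAT) {
                hits.add(String.format("index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return hits;
    }

    public static void main(String[] args) {
        // Written with an escape so the hidden character is visible in this source file.
        String identifier = "c\u202Eh";
        System.out.println(findFormatChars(identifier));               // prints [index 1: U+202E]
        System.out.println(Character.isIdentifierIgnorable('\u202E')); // prints true
        System.out.println(Character.isJavaIdentifierPart('\u202E'));  // prints true
        System.out.println(Character.isJavaIdentifierStart('\u202E')); // prints false
    }
}
```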
    
