Why do Java octal escapes only go up to 255?

太阳男子 2020-12-20 17:54

The Java language specification states that the escapes inside strings are the "normal" C ones like \n and \t, but they also specify octal esca

4 Answers
  •  别那么骄傲
    2020-12-20 18:33

    If I understand the rules correctly (please correct me if I am wrong):

    \ OctalDigit
    Examples:
        \0, \1, \2, \3, \4, \5, \6, \7
    
    \ OctalDigit OctalDigit
    Examples:
        \00, \07, \17, \27, \37, \47, \57, \67, \77
    
    \ ZeroToThree OctalDigit OctalDigit
    Examples:
        \000, \177, \277, \367, \377
    

    \t, \n and \\ do not fall under the OctalEscape rule; they are covered by separate escape-sequence rules.

    Decimal 255 is equal to octal 377, since 3×64 + 7×8 + 7 = 255 (use Windows Calculator in scientific mode to confirm).

    Hence a three-digit octal escape falls in the range \000 (0) to \377 (255).
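The three escape forms and the 255 ceiling can be checked with a small program. This is only a sketch: the class name OctalEscapeDemo and its helper methods are made up for illustration and are not part of the original answer.

```java
public class OctalEscapeDemo {
    // Largest value that a three-digit octal escape can encode.
    static int maxOctalEscape() {
        return '\377';
    }

    // A leading digit above 3 ends the escape after two digits:
    // the literal below is the escape \47 (an apostrophe) followed by the digit 7.
    static String twoDigitThenLiteral() {
        return "\477";
    }

    public static void main(String[] args) {
        System.out.println((int) '\7');                  // one-digit form
        System.out.println((int) '\77');                 // two-digit form
        System.out.println(maxOctalEscape());            // three-digit form
        System.out.println(Integer.toOctalString(255));  // decimal 255 written in octal
        System.out.println(twoDigitThenLiteral().length());
    }
}
```

Run as written, this should print 7, 63, 255, 377 and 2, one per line.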

    Therefore, \4715 is not a single valid octal escape, because it exceeds the three-octal-digit rule; the compiler reads it as \47 followed by the literal digits 1 and 5. If you want the character whose code point is decimal 4715, use the Unicode escape \u126B (hexadecimal 126B is 4715 in decimal), since every Java char is a UTF-16 code unit.
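To see what the compiler actually does with four digits after the backslash, a short experiment helps. The class and method names below (LongOctalDemo, fourDigits, unicodeForm) are hypothetical; the sketch only assumes the escape rules listed earlier.

```java
public class LongOctalDemo {
    // Written with four digits after the backslash; because the first digit
    // is above 3, the escape ends after "47" and "15" stays as plain text.
    static String fourDigits() {
        return "\4715";
    }

    // The character with decimal value 4715, written as a Unicode escape (hex 126B).
    static char unicodeForm() {
        return '\u126B';
    }

    public static void main(String[] args) {
        String s = fourDigits();
        System.out.println(s.length());           // number of chars in the literal
        System.out.println((int) s.charAt(0));    // value of the escaped char
        System.out.println(s.substring(1));       // the digits left over as text
        System.out.println((int) unicodeForm());  // decimal value of the Unicode escape
    }
}
```

The literal ends up as three chars (an apostrophe, then 1 and 5), while the Unicode escape yields the single char with value 4715.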

    from http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html:

    The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode standard.)

    The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java 2 platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
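The surrogate-pair representation described in the quoted Javadoc can be observed directly. A minimal sketch, assuming U+1F600 as an arbitrary supplementary code point; the class name SurrogatePairDemo and the helper encode are illustrative only.

```java
public class SurrogatePairDemo {
    // Convert a code point into its UTF-16 code units
    // (one char for the BMP, two for supplementary characters).
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // a supplementary character, above U+FFFF
        char[] units = encode(codePoint);
        String s = new String(units);

        System.out.println(units.length);                        // UTF-16 code units used
        System.out.println(Integer.toHexString(units[0]));       // high surrogate
        System.out.println(Integer.toHexString(units[1]));       // low surrogate
        System.out.println(Character.isHighSurrogate(units[0]));
        System.out.println(Character.isLowSurrogate(units[1]));
        System.out.println(s.length());                          // counts chars, not code points
        System.out.println(s.codePointCount(0, s.length()));     // counts code points
    }
}
```

Note that String.length() reports chars, so a single supplementary character has length 2 but a code point count of 1.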

    Edited:

    Anything beyond the 8-bit range of a valid octal escape (that is, larger than one byte) is language-specific. Some programming languages carry on to match their Unicode implementation; others do not and limit it to one byte. Java definitely does not allow it, even though it has Unicode support.

    A few programming languages (vendor-dependent) that limit octal escapes in string literals to one byte:

    1. Java (all vendors) - octal escapes in character and string literals run from \0 to \7, \00 to \77 and \000 to \377. Octal integer literals begin with 0 and are not restricted to one byte (0377 is simply 255).
    2. C/C++ (Microsoft) - An octal integer constant begins with 0; the octal string literal format is \nnn (up to \377)
    3. Ruby - An octal integer constant begins with 0; the octal string literal format is \nnn (up to \377)

    A few programming languages (vendor-dependent) that support octal escapes larger than one byte:

    1. Perl - An octal integer constant begins with 0; the octal string literal format is \nnn. See http://search.cpan.org/~jesse/perl-5.12.1/pod/perlrebackslash.pod#Octal_escapes

    A few programming languages do not support octal literals:

    1. C# - use Convert.ToInt32(integer, 8) for base-8; see "How can we convert binary number into its octal number using c#?"
