Why do Java octal escapes only go up to 255?


太阳男子 2020-12-20 17:54

The Java language specification states that the escapes inside strings are the "normal" C ones like \n and \t, but they also specify octal esca

4 Answers

别那么骄傲 (original poster)

2020-12-20 18:33
As I understand the rules (please correct me if I am wrong):
```
\ OctalDigit
Examples:
    \0, \1, \2, \3, \4, \5, \6, \7

\ OctalDigit OctalDigit
Examples:
    \00, \07, \17, \27, \37, \47, \57, \67, \77

\ ZeroToThree OctalDigit OctalDigit
Examples:
    \000, \177, \277, \367, \377
```
\t, \n, \\ do not fall under the OctalEscape rules; they are covered by separate escape-sequence rules.

Decimal 255 is equal to octal 377, since 3×64 + 7×8 + 7 = 255 (use Windows Calculator in scientific mode to confirm).

Hence a three-digit octal escape falls in the range \000 (0) to \377 (255).
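To check that range by hand, here is a minimal sketch in Java. The class and method names are made up for illustration; only the escapes themselves come from the rules above.

```java
public class OctalEscapeRange {
    // Smallest and largest three-digit octal escapes, widened to int.
    static int lowest() { return '\000'; }
    static int highest() { return '\377'; }

    public static void main(String[] args) {
        System.out.println((int) '\7');                  // one digit: prints 7
        System.out.println((int) '\77');                 // two digits: prints 63
        System.out.println(lowest());                    // prints 0
        System.out.println(highest());                   // prints 255
        System.out.println(Integer.toOctalString(255));  // prints 377
    }
}
```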

Therefore, \4715 is not a single octal escape: it exceeds the three-digit limit, and its first digit is greater than 3, so Java reads it as the two-digit escape \47 followed by the literal characters 15. If you want the character whose code point is decimal 4715, use the Unicode escape \u126B (hexadecimal 126B is decimal 4715), since every Java char is a UTF-16 code unit.
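A small sketch of what the compiler does with a four-digit sequence, assuming a standard Java compiler (the class and method names are hypothetical):

```java
public class LongOctalDemo {
    // Lexed as the two-digit escape for an apostrophe (octal 47 = decimal 39)
    // followed by the ordinary characters '1' and '5'.
    static String fourDigits() { return "\4715"; }

    // A Unicode escape names the UTF-16 code unit by its hexadecimal value.
    static char unicodeEscape() { return '\u126B'; }

    public static void main(String[] args) {
        System.out.println(fourDigits().length());       // prints 3
        System.out.println(fourDigits().equals("'15"));  // prints true
        System.out.println((int) unicodeEscape());       // prints 4715
    }
}
```

So the longer sequence still compiles, but it yields three characters rather than one.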

from http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html:

The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode standard.)

The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java 2 platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
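The surrogate-pair representation described in that excerpt can be seen with a short sketch. The code point U+1D11E is only an example I picked from outside the BMP, and the class name is hypothetical.

```java
public class SurrogatePairDemo {
    // U+1D11E (MUSICAL SYMBOL G CLEF) is a supplementary character; example choice only.
    static final int CODE_POINT = 0x1D11E;

    // Encode the code point as UTF-16 code units.
    static char[] units() { return Character.toChars(CODE_POINT); }

    public static void main(String[] args) {
        char[] u = units();
        System.out.println(u.length);                         // prints 2
        System.out.println(Character.isHighSurrogate(u[0]));  // prints true
        System.out.println(Character.isLowSurrogate(u[1]));   // prints true

        String s = new String(u);
        System.out.println(s.length());                       // prints 2 (char values)
        System.out.println(s.codePointCount(0, s.length()));  // prints 1 (code point)
    }
}
```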

Edited:

Anything beyond the 8-bit range of a valid octal escape (larger than one byte) is language-specific. Some programming languages extend octal escapes to cover Unicode; others limit them to one byte. Java does not allow larger octal escapes even though it supports Unicode.

A few programming languages (vendor-dependent) that limit octal escapes to one byte:
1. Java (all vendors) - octal integer constants begin with 0; octal escapes in string and character literals run from \0 to \7, \00 to \77, and \000 to \377
2. C/C++ (Microsoft) - octal integer constants begin with 0; octal string escapes take the form \nnn (up to \377)
3. Ruby - octal integer constants begin with 0; octal string escapes take the form \nnn (up to \377)

A few programming languages (vendor-dependent) that support octal escapes larger than one byte:
1. Perl - octal integer constants begin with 0; octal string escapes take the form \nnn. See http://search.cpan.org/~jesse/perl-5.12.1/pod/perlrebackslash.pod#Octal_escapes

A few programming languages that do not support octal literals:
1. C# - use Convert.ToInt32(value, 8), where value is a string of octal digits, for base-8. See the question "How can we convert binary number into its octal number using c#?"
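For the Java entry, note that the one-byte ceiling applies to the escapes, not to octal integer literals or to base-8 parsing. A minimal sketch (hypothetical class and method names):

```java
public class OctalLiteralDemo {
    // An octal int literal may exceed one byte.
    static int literal() { return 0777; }

    // Parsing a string of octal digits at run time.
    static int parsed() { return Integer.parseInt("4715", 8); }

    public static void main(String[] args) {
        System.out.println(literal());  // prints 511
        System.out.println(parsed());   // prints 2509
    }
}
```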