Why does JSON encode UTF-16 surrogate pairs instead of Unicode code points directly?

前端 未结 1 663
攒了一身酷
攒了一身酷 2020-12-19 08:34

To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogat

1条回答
  •  离开以前
    2020-12-19 08:40

    One of the key architectural features of JSON is that JSON-encoded objects are valid Javascript literals that can be evaluated using the eval function, for example. Unfortunately, older Javascript implementations only support 16-bit Unicode escape sequences with four hex characters in string literals, so there's no other way than to use UTF-16 surrogates in escape sequences for code points above 0xFFFF in a portable way. (The \u{...} syntax that allows arbitrary code points was only introduced in ECMAScript 6.)

    But as you mentioned, there's no need to use escape sequences if your application supports Unicode JSON text. Simply encode the characters directly in the respective Unicode format.

    0 讨论(0)
提交回复
热议问题