What is the internal representation of strings in Python 3.x?

逝去的感伤 2020-12-03 13:43

In Python 3.x, a string consists of items that are Unicode ordinals. (See the quotation from the language reference below.) What is the internal representation of a Unicode string? I

8 answers
  •  Happy的楠姐
    2020-12-03 14:03

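    There was NO CHANGE in the internal Unicode representation between Python 2.x and 3.x, at least through Python 3.2. (Python 3.3 later replaced it with the flexible string representation of PEP 393.)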

    It's definitely NOT UTF-16. UTF-anything is a byte-oriented EXTERNAL representation.
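
    To illustrate the split between a `str` and its byte-oriented encodings, here is a small sketch assuming Python 3.3 or later (the sample string and the `encoded_sizes` helper are hypothetical, chosen for illustration). The same four-character string yields `bytes` objects of different lengths under UTF-8, UTF-16 and UTF-32, while `len()` of the string itself counts code points.

```python
def encoded_sizes(s: str) -> dict:
    """Hypothetical helper: byte length of s under several UTF encodings (no BOM)."""
    return {enc: len(s.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

s = "a\u00e9\u20ac\U0001F600"  # 'a', 'é', '€', and an emoji outside the BMP
print(len(s))                  # 4 code points (Python 3.3+)
print(encoded_sizes(s))        # {'utf-8': 10, 'utf-16-le': 10, 'utf-32-le': 16}

# Encoding produces bytes; decoding with the same codec restores the str.
assert isinstance(s.encode("utf-8"), bytes)
assert s.encode("utf-8").decode("utf-8") == s
```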

    Each code point (character, surrogate, etc.) has been assigned a number from range(0, 0x110000), i.e. 0 to 0x10FFFF, which fits in 21 bits. This number is called its "ordinal".
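
    A quick way to look at ordinals from Python itself, as a sketch assuming Python 3.3 or later (on older narrow builds `sys.maxunicode` was 0xFFFF instead). It uses only the built-ins `ord`, `chr` and `sys.maxunicode`; the sample characters are arbitrary.

```python
import sys

# On Python 3.3+ every build reports the full Unicode range here.
print(hex(sys.maxunicode))          # 0x10ffff

# Each character maps to its ordinal: 65, 233 and 128512 respectively.
for ch in ("A", "\u00e9", "\U0001F600"):
    print(repr(ch), ord(ch))

# The largest code point needs exactly 21 bits...
print((0x10FFFF).bit_length())      # 21

# ...but not every 21-bit number is a valid code point.
try:
    chr(0x110000)
except ValueError as exc:
    print("rejected:", exc)
```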

    Really, the documentation you quoted says it all. Many Python binaries (including the official Windows builds) are "narrow builds" that store 16-bit code units, which restricts you to the Basic Multilingual Plane ("BMP") unless you want to muck about with surrogates (handy if you can't find your hair shirt and your bed of nails is off being de-rusted). To work with the full Unicode repertoire, you'd want a "wide build" (32 bits per unit).
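
    On a narrow build, a character above U+FFFF is stored as a surrogate pair of two 16-bit units. The sketch below implements the standard UTF-16 surrogate arithmetic; the function names `to_surrogates` and `from_surrogates` are hypothetical, and the result is cross-checked against Python's own `utf-16-be` codec.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary-plane code point into a UTF-16 (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary-plane code point")
    offset = cp - 0x10000              # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))   # 0xd83d 0xde00
assert from_surrogates(high, low) == 0x1F600
# Cross-check: the UTF-16-BE bytes are exactly these two 16-bit units.
assert "\U0001F600".encode("utf-16-be") == bytes([0xD8, 0x3D, 0xDE, 0x00])
```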

    Briefly, the internal representation in a unicode object is either an array of 16-bit unsigned integers (narrow build) or an array of 32-bit unsigned integers of which only 21 bits are used (wide build).
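
    The two layouts can be emulated with the standard `struct` module. This is a sketch that does not read CPython's actual buffers; it only reproduces the same unit sequences through Python's codecs, and the helper names `narrow_units` and `wide_units` are hypothetical. The `<` prefix fixes little-endian standard sizes, so `H` is 2 bytes and `I` is 4 bytes on every platform.

```python
import struct

def narrow_units(s: str) -> list:
    """Emulate a narrow build: 16-bit units, surrogate pairs for non-BMP characters."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

def wide_units(s: str) -> list:
    """Emulate a wide build: one 32-bit unit per code point (values fit in 21 bits)."""
    data = s.encode("utf-32-le")
    return list(struct.unpack(f"<{len(data) // 4}I", data))

s = "A\U0001F600"
print([hex(u) for u in narrow_units(s)])  # ['0x41', '0xd83d', '0xde00']
print([hex(u) for u in wide_units(s)])    # ['0x41', '0x1f600']
```

    On a narrow build, `len(s)` for this string was 3, matching the length of the narrow list; on a wide build (and on Python 3.3+) it is 2.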
