Unicode subscripts and superscripts in identifiers, why does Python consider XU == Xᵘ == Xᵤ?

前端未结

关注

 2  1287

刺人心 2020-12-31 02:06

Python allows unicode identifiers. I defined Xᵘ = 42, expecting XU and Xᵤ to result in a NameError. But in reality, whe

2条回答

旧巷少年郎 (楼主)

2020-12-31 02:49
Python converts all identifiers to their NFKC normal form; from the Identifiers section of the reference documentation:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

The NFKC form of both the super and subscript characters is the lowercase u:
```
>>> import unicodedata
>>> unicodedata.normalize('NFKC', 'Xᵘ Xᵤ')
'Xu Xu'
```
So in the end, all you have is a single identifier, Xu:
```
>>> import dis
>>> dis.dis(compile('Xᵘ = 42\nprint((Xu, Xᵘ, Xᵤ))', '', 'exec'))
  1           0 LOAD_CONST               0 (42)
              2 STORE_NAME               0 (Xu)

  2           4 LOAD_NAME                1 (print)
              6 LOAD_NAME                0 (Xu)
              8 LOAD_NAME                0 (Xu)
             10 LOAD_NAME                0 (Xu)
             12 BUILD_TUPLE              3
             14 CALL_FUNCTION            1
             16 POP_TOP
             18 LOAD_CONST               1 (None)
             20 RETURN_VALUE
```
The above disassembly of the compiled bytecode shows that the identifiers have been normalised during compilation; this happens during parsing, any identifiers are normalised when creating the AST (Abstract Parse Tree) which the compiler uses to produce bytecode.

Identifiers are normalized to avoid many potential 'look-alike' bugs, where you'd otherwise could end up using both ﬁnd() (using the U+FB01 LATIN SMALL LIGATURE FI character followed by the ASCII nd characters) and find() and wonder why your code has a bug.
0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...