How to reveal Unicode's numeric value property

南旧 2020-12-12 05:54
'\u00BD' # ½
'\u00B2' # ²

I am trying to understand isdecimal() and isdigit() better; for this it's necessary to understand Unicode numeric va

2 answers
  •  既然无缘
    2020-12-12 06:40

    To get the numeric value of a character, use the unicodedata.numeric() function:

    >>> import unicodedata
    >>> unicodedata.numeric('\u00BD')
    0.5
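
    A small sketch of how the three lookups relate, which may help with the isdecimal() / isdigit() distinction in the question. The helper name describe() is made up for illustration; passing None as the default avoids the ValueError that each lookup otherwise raises for a character without that property.

```python
import unicodedata

def describe(ch):
    """Return (decimal, digit, numeric) for one character, None where undefined."""
    return (
        unicodedata.decimal(ch, None),   # defined only for decimal digits
        unicodedata.digit(ch, None),     # also covers e.g. superscript digits
        unicodedata.numeric(ch, None),   # also covers fractions and the like
    )

for ch in ('7', '\u00B2', '\u00BD', 'x'):
    # The str methods mirror which of the three lookups succeed.
    print(repr(ch), describe(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())
```

    Here '7' has all three values, '²' has a digit and a numeric value but no decimal value, '½' has only a numeric value, and 'x' has none.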
    

    Use the ord() function to get the integer codepoint, optionally in combination with format() to produce a hexadecimal value:

    >>> ord('\u00BD')
    189
    >>> format(ord('\u00BD'), '04x')
    '00bd'
    

    You can get the character's general category with unicodedata.category(), which you'd then need to check against the documented categories:

    >>> unicodedata.category('\u00BD')
    'No'
    

    where 'No' stands for Number, Other.
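
    As a rough illustration of checking a category code against the documented list, the sketch below maps the three number categories to their long names. The dictionary NUMBER_CATEGORIES and the helper number_category() are hypothetical names introduced here, not part of unicodedata.

```python
import unicodedata

# Long names of the three general categories that start with 'N'.
NUMBER_CATEGORIES = {
    'Nd': 'Number, Decimal Digit',
    'Nl': 'Number, Letter',
    'No': 'Number, Other',
}

def number_category(ch):
    """Return the long name of ch's number category, or None if it has none."""
    return NUMBER_CATEGORIES.get(unicodedata.category(ch))

for ch in ('7', '\u2167', '\u00BD', 'x'):   # 7, Ⅷ, ½, x
    print(repr(ch), unicodedata.category(ch), number_category(ch))
```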

    However, a number of characters for which .isnumeric() returns True are in the category Lo (Letter, Other). The Python unicodedata module only gives you access to the general category; for the additional numeric properties you have to rely on str.isdigit(), str.isnumeric(), unicodedata.digit(), unicodedata.numeric(), and similar methods.
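
    One way to see this for yourself is to scan every code point and tally the general category of each character that str.isnumeric() accepts. This is only a sketch: numeric_categories() is a made-up helper, and the exact counts depend on the Unicode version bundled with your Python build, so none are quoted here.

```python
import sys
import unicodedata
from collections import Counter

def numeric_categories():
    """Count isnumeric() characters per general category over all code points."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if ch.isnumeric():
            counts[unicodedata.category(ch)] += 1
    return counts

counts = numeric_categories()
print(counts)

# U+4E09 (三) is an example of a numeric character in category Lo.
print(unicodedata.category('\u4e09'), unicodedata.numeric('\u4e09'))
```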

    If you want a precise list of all numeric Unicode characters, the canonical source is the Unicode Character Database, a series of text files that define the whole of the standard. The DerivedNumericType.txt file (v. 6.3.0) gives you a 'view' on that database specific to the numeric properties; it tells you at the top how the file is derived from other data files in the standard. The same goes for the DerivedNumericValues.txt file, which lists the exact numeric value per codepoint.
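
    If you download DerivedNumericValues.txt, a parser along the following lines may be enough to turn it into a lookup table. It is a sketch under assumptions: it presumes each data line has the form "codepoint-or-range ; decimal value ; ; rational value # comment", with ranges written as XXXX..YYYY, so check that against the header of the file you actually have. The function name parse_numeric_values() and the SAMPLE excerpt are illustrative only.

```python
from fractions import Fraction

# Illustrative excerpt in the ASSUMED layout; not a copy of the real file.
SAMPLE = """\
# comment lines start with '#'
0F33          ; -0.5 ; ; -1/2 # No       TIBETAN DIGIT HALF ZERO
0030          ; 0.0 ; ; 0 # Nd       DIGIT ZERO
00BD          ; 0.5 ; ; 1/2 # No       VULGAR FRACTION ONE HALF
"""

def parse_numeric_values(lines):
    """Map code point -> Fraction for lines in the assumed layout."""
    values = {}
    for line in lines:
        data = line.split('#', 1)[0].strip()    # drop the trailing comment
        if not data:
            continue                            # blank or comment-only line
        fields = [field.strip() for field in data.split(';')]
        if len(fields) < 4:
            continue                            # not in the assumed layout
        first, _, last = fields[0].partition('..')
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        for cp in range(start, end + 1):
            values[cp] = Fraction(fields[3])    # exact rational value
    return values

values = parse_numeric_values(SAMPLE.splitlines())
for cp, value in sorted(values.items()):
    print(f'U+{cp:04X}', value)
```

    Using Fraction keeps values such as 1/2 exact; call float() on a result if you want to compare it with unicodedata.numeric().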
