How to reveal Unicode's numeric value property

南旧 2020-12-12 05:54
'\u00BD' # ½
'\u00B2' # ²

I am trying to understand isdecimal() and isdigit() better; for this it is necessary to understand Unicode numeric va

2 Answers
  • 2020-12-12 06:37

    The docs explicitly specify the relationship between these methods and the Numeric_Type property.

    def is_decimal(c):
        """Whether input character is Numeric_Type=decimal."""
        return c.isdecimal()  # in Python this tests General_Category=Decimal_Number (Nd)
    
    def is_digit(c):
        """Whether input character is Numeric_Type=digit."""
        return c.isdigit() and not c.isdecimal()
    
    
    def is_numeric(c):
        """Whether input character is Numeric_Type=numeric."""
        return c.isnumeric() and not c.isdigit() and not c.isdecimal()
    

    Example:

    >>> for c in '\u00BD\u00B2':
    ...     print("{}: Numeric: {}, Digit: {}, Decimal: {}".format(
    ...         c, is_numeric(c), is_digit(c), is_decimal(c)))
    ... 
    ½: Numeric: True, Digit: False, Decimal: False
    ²: Numeric: False, Digit: True, Decimal: False
    

    I'm not sure whether General_Category=Decimal_Number and Numeric_Type=Decimal will always be identical.

    Note: '\u00B2' is not decimal because superscripts are explicitly excluded by the standard, see 4.6 Numerical Value (Unicode 6.2).
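
    To see the exclusion of superscripts directly, a minimal sketch using the three unicodedata lookups may help. The helper name numeric_views is hypothetical; it relies on the optional default argument that unicodedata.decimal(), unicodedata.digit() and unicodedata.numeric() accept, which is returned instead of raising ValueError.

```python
import unicodedata

def numeric_views(ch):
    """Return (decimal, digit, numeric) values for ch; None where a lookup is undefined."""
    # Hypothetical helper: passing None as the default avoids the ValueError
    # that each lookup raises when the character lacks that kind of value.
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

for ch in "2\u00B2\u00BD":
    print(repr(ch), numeric_views(ch))
# '2' (2, 2, 2.0)
# '²' (None, 2, 2.0)
# '½' (None, None, 0.5)
```

    The superscript two has a digit value and a numeric value but no decimal value, which matches the isdigit()/isdecimal() results above.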

  • 2020-12-12 06:40

    To get the 'numeric value' of a character, you can use the unicodedata.numeric() function:

    >>> import unicodedata
    >>> unicodedata.numeric('\u00BD')
    0.5
    

    Use the ord() function to get the integer codepoint, optionally in combination with format() to produce a hexadecimal value:

    >>> ord('\u00BD')
    189
    >>> format(ord('\u00BD'), '04x')
    '00bd'
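
    Combining ord() and format() with unicodedata.name() gives the conventional U+XXXX label for a character. This is only a sketch; the describe helper is a hypothetical name, not part of any library.

```python
import unicodedata

def describe(ch):
    """Return a label such as 'U+00BD VULGAR FRACTION ONE HALF' for one character."""
    # Hypothetical helper: the format spec 04X pads to at least four uppercase hex digits.
    # The second argument to unicodedata.name() is a fallback for unnamed code points.
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>"))

print(describe("\u00BD"))  # U+00BD VULGAR FRACTION ONE HALF
print(describe("\u00B2"))  # U+00B2 SUPERSCRIPT TWO
```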
    

    You can get the character's general category with unicodedata.category(), which you'd then need to check against the documented categories:

    >>> unicodedata.category('\u00BD')
    'No'
    

    where 'No' stands for Number, Other.

    However, some characters for which .isnumeric() returns True are in the category Lo (Letter, other). Python's unicodedata module only gives you access to the general category, and relies on str.isdigit(), str.isnumeric(), unicodedata.digit(), unicodedata.numeric(), and similar methods to cover the additional numeric properties.
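
    As an illustration of a numeric character outside the number categories, consider the CJK ideograph for "three" (U+4E09). The variable names below are only for this sketch.

```python
import unicodedata

ch = "\u4E09"  # 三, the CJK ideograph for "three"

# The general category says "letter", yet the character carries a numeric value.
info = (
    unicodedata.category(ch),
    ch.isnumeric(),
    ch.isdigit(),
    unicodedata.numeric(ch),
)
print(info)  # ('Lo', True, False, 3.0)
```

    So checking the general category alone would miss this character; str.isnumeric() and unicodedata.numeric() pick it up.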

    If you want a precise list of all numeric Unicode characters, the canonical source is the Unicode Character Database, a series of text files that define the whole of the standard. The DerivedNumericType.txt file (v. 6.3.0) gives you a 'view' on that database specific to the numeric properties; it tells you at the top how the file is derived from other data files in the standard. The same goes for the DerivedNumericValues.txt file, which lists the exact numeric value per codepoint.
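
    If downloading the data files is not convenient, an approximation can be computed from the Unicode data bundled with the running interpreter. The sketch below assumes the str methods track Numeric_Type as described in the first answer; numeric_type is a hypothetical helper, and the counts depend on unicodedata.unidata_version, so none are quoted here.

```python
import sys
import unicodedata
from collections import Counter

def numeric_type(ch):
    """Approximate the Numeric_Type of ch using only str methods (hypothetical helper)."""
    if ch.isdecimal():
        return "Decimal"
    if ch.isdigit():
        return "Digit"
    if ch.isnumeric():
        return "Numeric"
    return None

# Walk every code point the interpreter knows about and tally the types.
counts = Counter()
for cp in range(sys.maxunicode + 1):
    kind = numeric_type(chr(cp))
    if kind is not None:
        counts[kind] += 1

print("Unicode version:", unicodedata.unidata_version)
print(dict(counts))
```

    The result reflects the Unicode version Python was built with, which may differ from the version of the data files linked above.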
