'\u00BD' # ½
'\u00B2' # ²
I am trying to understand isdecimal() and isdigit() better; for this it's necessary to understand Unicode numeric va
To get the 'numeric value' contained in the character, you can use the unicodedata.numeric() function:
>>> import unicodedata
>>> unicodedata.numeric('\u00BD')
0.5
Use the ord() function to get the integer codepoint, optionally in combination with format()
to produce a hexadecimal value:
>>> ord('\u00BD')
189
>>> format(ord('\u00BD'), '04x')
'00bd'
You can get access to the character property with unicodedata.category(), which you'd then need to check against the documented categories:
>>> unicodedata.category('\u00BD')
'No'
where 'No' stands for Number, Other.
However, there are a number of characters in the category Lo for which .isnumeric() == True; the Python unicodedata database only gives you access to the general category and relies on the str.isdigit(), str.isnumeric(), unicodedata.digit(), unicodedata.numeric(), etc. methods to handle the additional categories.
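To make the difference concrete, here is a minimal sketch that puts the general category next to the three str predicates for a few sample characters. The helper name describe() and the choice of samples (including U+4E00, a CJK ideograph in category Lo) are my own, picked for illustration:

```python
import unicodedata

def describe(ch):
    """Return (general category, isdecimal, isdigit, isnumeric) for one character."""
    return (unicodedata.category(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())

# '5' is a decimal digit, U+00B2 is superscript two, U+00BD is one half,
# and U+4E00 is the CJK ideograph for "one" (category Lo).
for ch in ('5', '\u00B2', '\u00BD', '\u4E00'):
    print(f'U+{ord(ch):04X}', describe(ch), unicodedata.numeric(ch))
```

The category alone does not predict the result: U+00B2 and U+00BD are both 'No', yet only the former passes isdigit(), and U+4E00 is 'Lo' but still passes isnumeric().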
If you want a precise list of all numeric Unicode characters, the canonical source is the Unicode database, a series of text files that define the whole of the standard. The DerivedNumericTypes.txt file (v. 6.3.0) gives you a 'view' on that database specific to the numeric properties; it tells you at the top how the file is derived from other data files in the standard. Ditto for the DerivedNumericValues.txt file, which lists the exact numeric value per codepoint.
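If you'd rather not parse those files, a rough alternative is to ask Python's own copy of the database about every codepoint. This is only a sketch: the function name numeric_characters() is made up here, and the result reflects whichever Unicode version your Python build bundles (see unicodedata.unidata_version), which may not match the files linked above:

```python
import sys
import unicodedata

def numeric_characters():
    """Map every character that has a numeric value to that value."""
    found = {}
    for codepoint in range(sys.maxunicode + 1):
        ch = chr(codepoint)
        # Passing a default avoids the ValueError raised for non-numeric characters.
        value = unicodedata.numeric(ch, None)
        if value is not None:
            found[ch] = value
    return found

numeric_chars = numeric_characters()
print(unicodedata.unidata_version, len(numeric_chars))
```

The scan walks the full codepoint range, so it takes a moment, but it needs nothing outside the standard library.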