Question
"Arabic and Chinese have their own glyphs for digits. int works correctly with all the different ways to write numbers."
I was not able to reproduce this behaviour (Python 3.5.0):
>>> from unicodedata import name
>>> name('𐹤')
'RUMI DIGIT FIVE'
>>> int('𐹤')
ValueError: invalid literal for int() with base 10: '𐹤'
>>> int('五') # chinese/japanese number five
ValueError: invalid literal for int() with base 10: '五'
Am I doing something wrong, or is the claim simply incorrect (source)?
Answer 1:
int does not accept all ways to write numbers. It understands digit characters used in positional numeral systems, but neither Rumi nor Chinese numerals are positional. Neither '五五' nor two copies of the Rumi numeral 5 would represent 55, so int doesn't accept them.
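The distinction can be checked directly. The following is a minimal sketch, assuming a recent Python 3; the helper names describe and int_accepts are hypothetical and exist only for this illustration. It reports a character's Unicode general category, whether str.isdecimal() treats it as a decimal digit, and its numeric value, and then tests whether int() accepts a string.

```python
import unicodedata


def describe(ch):
    """Summarise the numeric properties of a single character (illustrative helper)."""
    return {
        "category": unicodedata.category(ch),       # 'Nd' means decimal digit
        "isdecimal": ch.isdecimal(),                # True only for decimal digits
        "numeric": unicodedata.numeric(ch, None),   # None if no numeric value
    }


def int_accepts(s):
    """Return True if int(s) succeeds, False if it raises ValueError (illustrative helper)."""
    try:
        int(s)
    except ValueError:
        return False
    return True


for sample in ("5", "۵", "五"):
    print(repr(sample), describe(sample), int_accepts(sample))

# A string made only of decimal digits parses positionally.
print(int("۵۵"))
```

Characters that are decimal digits, such as '۵', go through int(); '五' has a numeric value of 5.0 but is not a decimal digit, so int() rejects it.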
Answer 2:
Here's a way to convert to numerical values (casting to int does not work in all cases, unless there's a secret setting somewhere):
from unicodedata import numeric
print(numeric('五'))
result: 5.0
Someone noted (and was right) that some Arabic and other characters work fine with int, so a routine with a fallback mechanism could be written:
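from unicodedata import numeric

def to_integer(s):
    try:
        # int() handles strings of Unicode decimal digits, e.g. '۵۵'
        r = int(s)
    except ValueError:
        # fall back to the numeric value of a single character, e.g. '五'
        r = int(numeric(s))
    return r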
EDIT: as zvone noted, some fraction characters have non-integer numeric values, e.g. numeric('\u00be') is 0.75 (the ¾ character), so converting the result to int is not always safe.
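One way to avoid silently truncating fractions is to reject them. This is only a sketch under that assumption; the name to_integer_strict is hypothetical and not part of the original answer. It tries int() first, then falls back to numeric() for a single character and raises ValueError if the value is not a whole number.

```python
from unicodedata import numeric


def to_integer_strict(s):
    """Convert s to int, refusing characters whose numeric value is fractional."""
    try:
        # Works for strings of Unicode decimal digits, e.g. '۵۵'.
        return int(s)
    except ValueError:
        pass

    # numeric() only accepts a single character.
    if len(s) != 1:
        raise ValueError("cannot convert {!r} to an integer".format(s))

    value = numeric(s)  # raises ValueError if the character has no numeric value
    if not value.is_integer():
        raise ValueError("{!r} has the non-integer value {}".format(s, value))
    return int(value)


print(to_integer_strict("۵۵"))
print(to_integer_strict("五"))
```

With this variant, '¾' raises ValueError instead of being converted to 0.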
EDIT2: the numeric function only accepts a single character. So a "conversion to numeric" that handles most cases without the risk of losing a fractional part would be:
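from unicodedata import numeric

def to_float(s):
    try:
        # float() handles strings of Unicode decimal digits, e.g. '۵۵'
        r = float(s)
    except ValueError:
        # fall back to the numeric value of a single character, e.g. '五' or '¾'
        r = numeric(s)
    return r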
print(to_float('۵۵'))
print(to_float('五'))
print(to_float('¾'))
result:
55.0
5.0
0.75
(I don't want to steal user2357112's excellent explanation, but I still wanted to provide a solution that tries to cover all cases.)
Answer 3:
The source is incorrect.
From the Python docs:
class int(x, base=10)
Return an integer object constructed from a number or string x, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating point numbers, this truncates towards zero.
If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in radix base.
And an integer literal is just a string of numbers.
Edit: I was wrong. I dug into the source code and found this function, which is called when Python wants to convert a string to int. There is a Py_CHARMASK which I guess contains the information we need, but I could not find it :/
Source: https://stackoverflow.com/questions/39710365/how-to-convert-unicode-numbers-to-ints