How to convert unicode numbers to ints?

[亡魂溺海] submitted on 2020-12-30 07:30:55

Question


I read the following claim: "Arabic and Chinese have their own glyphs for digits. int works correctly with all the different ways to write numbers."

I was not able to reproduce this behaviour (Python 3.5.0):

>>> from unicodedata import name
>>> name('𐹤')
'RUMI DIGIT FIVE'
>>> int('𐹤')
ValueError: invalid literal for int() with base 10: '𐹤'
>>> int('五')  # Chinese/Japanese numeral five
ValueError: invalid literal for int() with base 10: '五'

Am I doing something wrong, or is the claim (source) simply incorrect?


Answer 1:


int does not accept every way of writing numbers. It understands the digit characters used in positional (decimal) numeral systems, which in Unicode terms are the characters in general category Nd ("decimal digit"), but neither Rumi nor Chinese numerals are positional. Neither '五五' nor two copies of Rumi digit five would represent 55, so int doesn't accept them.
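A quick way to check this, as a sketch: the Python documentation for the numeric constructors says they accept the digits 0 to 9 or any Unicode equivalent with the Nd property, so int should succeed exactly for Nd characters. The helper name `int_accepts` and the sample characters are illustrative choices, not part of the original answer.

```python
import unicodedata

def int_accepts(text: str) -> bool:
    """Return True if int() can parse text (hypothetical helper)."""
    try:
        int(text)
    except ValueError:
        return False
    return True

# Digit five in several scripts: ASCII, Arabic-Indic, Extended Arabic-Indic,
# Thai, Rumi, and the CJK ideograph for five.
samples = ["5", "\u0665", "\u06f5", "\u0e55", "\U00010e64", "\u4e94"]

for ch in samples:
    # Show code point, Unicode name, general category and whether int() parses it
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), int_accepts(ch))

# Positional use: two Arabic-Indic fives really do mean fifty-five
print(int("\u0665\u0665"))
```

Only the Nd rows are accepted; the Rumi and CJK characters carry a numeric value in Unicode but are not decimal digits.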




Answer 2:


Here's a way to convert such characters to numeric values (calling int does not work in all cases, unless there's a secret setting somewhere):

from unicodedata import numeric
print(numeric('五'))

result: 5.0
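For context, unicodedata has three related lookups, decimal, digit and numeric, each accepting a broader set of characters than the previous one. A small comparison sketch; the helper `lookups` and the sample characters are my own choices, and the optional second argument is used as a default instead of raising ValueError:

```python
import unicodedata

def lookups(ch: str):
    """Return (decimal, digit, numeric) for one character; None where undefined."""
    return (
        unicodedata.decimal(ch, None),  # only decimal digits (category Nd)
        unicodedata.digit(ch, None),    # also digits like superscript two
        unicodedata.numeric(ch, None),  # any character with a numeric value
    )

# Arabic-Indic five, superscript two, CJK five, vulgar fraction three quarters
for ch in ["\u0665", "\u00b2", "\u4e94", "\u00be"]:
    print(repr(ch), lookups(ch))
```

Only numeric covers '五' and '¾', which is why it is the fallback used here.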

Someone noted (correctly) that some Arabic and other digit characters work fine with int, so a routine with a fallback mechanism could be written:

from unicodedata import numeric

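def to_integer(s):
    try:
        # int() accepts Unicode decimal digits (e.g. Arabic-Indic)
        r = int(s)
    except ValueError:
        # Fallback: numeric value of a single character, converted to int
        r = int(numeric(s))
    return r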

EDIT: as zvone noted, there are fraction characters for which numeric returns a non-integral float: for example, numeric('\u00be') is 0.75 (the ¾ character). So converting the result to int, which truncates it, is not always safe.
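If silently turning ¾ into 0 is a concern, one option is to check the float before converting it. This is a hypothetical variant (`to_integer_strict` is my name, not something from the answer) that raises instead of truncating:

```python
from unicodedata import numeric

def to_integer_strict(s: str) -> int:
    """Like to_integer, but refuses non-integral numeric values (hypothetical)."""
    try:
        # Handles strings of Unicode decimal digits, e.g. "55" or "\u0665\u0665"
        return int(s)
    except ValueError:
        # numeric() takes exactly one character; a longer string raises TypeError
        value = numeric(s)
        if not value.is_integer():
            raise ValueError(f"{s!r} has non-integral value {value}")
        return int(value)

print(to_integer_strict("\u4e94"))        # CJK five
print(to_integer_strict("\u0665\u0665"))  # Arabic-Indic fifty-five
```

Calling it with '\u00be' raises ValueError rather than returning 0.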

EDIT2: the numeric function only accepts a single character. So a conversion that handles most cases without the risk of truncation would be:

from unicodedata import numeric

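def to_float(s):
    try:
        # float() accepts Unicode decimal digits, so '۵۵' works here
        r = float(s)
    except ValueError:
        # Fallback for a single non-decimal character such as '五' or '¾'
        r = numeric(s)
    return r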

print(to_float('۵۵'))
print(to_float('五'))
print(to_float('¾'))

result:

55.0
5.0
0.75
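One caveat worth knowing about to_float: for a multi-character string that float rejects (for example '五五'), the fallback passes the whole string to numeric, which raises TypeError rather than ValueError. A sketch that repeats to_float so it runs on its own and wraps it in a hypothetical helper `try_to_float` that catches both:

```python
from unicodedata import numeric

def to_float(s):
    try:
        r = float(s)
    except ValueError:
        r = numeric(s)
    return r

def try_to_float(s):
    """Return to_float(s), or None if s has no numeric interpretation (hypothetical)."""
    try:
        return to_float(s)
    except (TypeError, ValueError):
        # TypeError: a multi-character string reached numeric()
        # ValueError: a single character with no numeric value
        return None

for text in ["\u06f5\u06f5", "\u4e94\u4e94", "x"]:
    print(repr(text), try_to_float(text))
```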

(I don't want to steal user2357112's excellent explanation, but I still wanted to provide a solution that tries to cover all cases.)




Answer 3:


The source is incorrect.

From the Python documentation:

class int(x, base=10)

Return an integer object constructed from a number or string x, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating point numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in radix base.

And an integer literal is just a string of digits.

Edit: I was wrong. I dug into the source code and found that this function is called when Python converts a string to an int. There is a Py_CHARMASK which I guess contains the information we need, but I could not find it :/
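For what it's worth, the observable behaviour can be modelled in pure Python: map every character that has a Unicode decimal value to its ASCII digit, then parse. This is only a rough model under that assumption (the helper `to_ascii_digits` is hypothetical and ignores details such as underscores or exotic whitespace), not CPython's actual implementation:

```python
import unicodedata

def to_ascii_digits(s: str) -> str:
    """Replace each Unicode decimal digit (category Nd) with its ASCII equivalent."""
    out = []
    for ch in s:
        value = unicodedata.decimal(ch, None)
        # Keep non-digit characters (such as a sign) unchanged
        out.append(str(value) if value is not None else ch)
    return "".join(out)

# Arabic-Indic 55, Thai 123, and a negative Extended Arabic-Indic 5
for text in ["\u0665\u0665", "\u0e51\u0e52\u0e53", "-\u06f5"]:
    print(repr(text), int(to_ascii_digits(text)), int(text))
```

In each case both columns agree, which is consistent with int treating any Nd character as the corresponding ASCII digit.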



Source: https://stackoverflow.com/questions/39710365/how-to-convert-unicode-numbers-to-ints
