Flexible numeric string parsing in Python

╄→尐↘猪︶ㄣ submitted on 2019-12-22 04:48:15

Question


Are there any Python libraries that help parse and validate numeric strings beyond what is supported by the built-in float() function? For example, in addition to simple numbers (1234.56) and scientific notation (3.2e15), I would like to be able to parse formats like:

  • Numbers with commas: 2,147,483,647
  • Named large numbers: 5.5 billion
  • Fractions: 1/4

I did a bit of searching and could not find anything, though I would be surprised if such a library did not already exist.


Answer 1:


If you want to convert "localized" numbers such as the American "2,147,483,647" form, you can use the atof() function from the locale module. Example:

import locale
locale.setlocale(locale.LC_NUMERIC, 'en_US')  # Locale names are platform-dependent, e.g. 'en_US.UTF-8' on many Linux systems
print(locale.atof('1,234,456.23'))  # Prints 1234456.23
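
Note that locale.atof() is lenient: as far as I can tell it simply removes every thousands separator before converting, so a malformed string such as '1,2,3' would come back as 123.0 under an en_US locale. If validation matters, a stricter check can be done by hand. The following is only a minimal sketch, assuming US-style grouping; parse_grouped and _GROUPED are hypothetical names, not part of the locale module:

```python
import re

# Hypothetical pattern (an assumption, not a standard-library feature):
# optional sign, 1-3 leading digits, one or more comma-separated groups
# of exactly three digits, then an optional fractional part.
_GROUPED = re.compile(r'[+-]?\d{1,3}(,\d{3})+(\.\d+)?')

def parse_grouped(num_str):
    """Return the float value of a strictly comma-grouped number."""
    text = num_str.strip()
    if not _GROUPED.fullmatch(text):
        raise ValueError("Not a well-formed grouped number: %r" % num_str)
    # Only drop the commas once the grouping has been validated.
    return float(text.replace(',', ''))
```

With this, parse_grouped('2,147,483,647') is accepted while parse_grouped('1,2,3') raises ValueError. It avoids depending on which locales are installed, at the cost of hard-coding one number format.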

As for fractions, Python now handles them directly (since version 2.6); they can even be built from a string:

from fractions import Fraction
x = Fraction('1/4')
print(float(x))  # 0.25

Thus, using only the two standard modules above, you can parse plain and scientific-notation numbers, comma-separated numbers, and fractions:

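# Assumes 'locale' and 'Fraction' were imported and the locale set as shown above.
try:
    num = float(num_str)  # Plain and scientific notation
except ValueError:
    try:
        num = locale.atof(num_str)  # Localized numbers such as '2,147,483,647'
    except ValueError:
        try:
            num = float(Fraction(num_str))  # Fractions such as '1/4'
        except ValueError:
            raise ValueError("Cannot parse '%s'" % num_str)  # Or handle '42 billion' here
# 'num' has the numerical value of 'num_str', here.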
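
The remaining case, named large numbers such as '42 billion', could be handled in that last branch with a small lookup table. This is only a rough sketch: the name parse_named, the _MULTIPLIERS table, and the choice of short-scale (American) values are all assumptions for illustration, and it only accepts the form "<number> <name>":

```python
from fractions import Fraction

# Assumed short-scale multipliers (hypothetical table; extend as needed).
_MULTIPLIERS = {
    'thousand': 10**3,
    'million': 10**6,
    'billion': 10**9,
    'trillion': 10**12,
}

def parse_named(num_str):
    """Parse strings like '5.5 billion' into a float."""
    parts = num_str.strip().lower().split()
    if len(parts) != 2 or parts[1] not in _MULTIPLIERS:
        raise ValueError("Cannot parse %r" % num_str)
    # Fraction accepts decimal strings such as '5.5' exactly, so the
    # multiplication happens before any float rounding. It raises
    # ValueError itself if the first token is not a number.
    value = Fraction(parts[0]) * _MULTIPLIERS[parts[1]]
    return float(value)
```

For example, parse_named('5.5 billion') gives 5500000000.0. Inputs with commas in the numeric part ('1,500 million') or compound names ('two hundred thousand') are not covered and would need the other techniques combined.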



Answer 2:


It should be pretty straightforward to build one in pyparsing. In fact, one of the tutorial pyparsing projects (wordsToNum.py on this page) does some of this already. You're talking about things that don't really have standard representations (standard in the sense of ISO 8602, not standard in the sense of "what everybody knows"), so it could easily be that nobody's done just what you're looking for.




Answer 3:


I haven't heard of one. Do you know of any such library for any other languages? That way you could leverage their documentation and tests.

If you can't find one, write a bunch of test cases, then we can help you fill out the parsing code.

Google must have one (try searching for 5.5billion * 10), but I don't think they have open-sourced anything like that. Depending on how you need to use it, you might be able to use Google to do some of the work ;)




Answer 4:


Babel has support for the first case (i18n numbers with commas). Docs: http://babel.edgewall.org/wiki/ApiDocs/babel.numbers.

Supporting simple named numbers should not be too hard to code up yourself; the same goes for fractions.



Source: https://stackoverflow.com/questions/1858117/flexible-numeric-string-parsing-in-python
