I would like to extract all the numbers contained in a string. Which is better suited for the purpose: regular expressions or the isdigit() method?
Example:
Since none of these answers dealt with the real-world financial numbers in Excel and Word documents that I needed to find, here is my variation. It handles ints, floats, negative numbers, and currency numbers (because it doesn't rely on split), and it has the option to drop the decimal part and return only ints, or to return everything.
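To make the point about split concrete, here is a small comparison on a made-up sentence (the sample text is hypothetical). The common str.split() plus isdigit() approach only keeps whitespace-separated tokens made entirely of digits, so it misses the currency amount and the negative number, while a regular expression can find numbers embedded in punctuation.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "Revenue was $1,250.75, down -3 from 42 last year."

# split() + isdigit(): only tokens that are purely digits survive,
# so "$1,250.75," and "-3" are silently dropped.
split_based = [int(tok) for tok in text.split() if tok.isdigit()]

# Regex: optional minus, digits/commas, optional decimal part.
regex_based = re.findall(r'-?\d[\d,]*(?:\.\d+)?', text)

print(split_based)  # [42]
print(regex_based)  # ['1,250.75', '-3', '42']
```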
It also handles the Indian lakh number system, where commas appear irregularly rather than every three digits.
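The lakh support falls out of the pattern design: after the first digit, commas are accepted anywhere, so the grouping does not matter, and stripping the commas afterwards gives the same value for both styles. A minimal sketch on a hypothetical string, using a tidied version of the pattern from find_numbers:

```python
import re

# Hypothetical sample: the same amount in lakh grouping and in Western grouping.
text = "Budget: Rs 12,34,567 (Western style: 1,234,567)"

# \d[\d,]* accepts a comma after any digit, so groups of two (lakh)
# and groups of three (Western) both match as one number.
matches = re.findall(r'-?\d[\d,]*(?:\.\d+)?', text)
values = [int(m.replace(',', '')) for m in matches]

print(matches)  # ['12,34,567', '1,234,567']
print(values)   # [1234567, 1234567]
```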
It does not handle scientific notation, nor negative numbers written inside parentheses as in budgets; those will come out as positive.
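If parenthesised negatives matter, one possible extension (a hypothetical variant named find_numbers_accounting here, not part of the answer's function) is to try a "(number)" alternative first and flip the sign when it matches. Like the ints=False path, it returns cleaned strings. Scientific notation is still not handled.

```python
import re

# Hypothetical variant: treat "(1,500.00)" as -1500.00, as budgets do.
# Group 1 captures a parenthesised amount; group 2 captures an ordinary number.
ACCOUNTING = re.compile(r'\((\d[\d,]*(?:\.\d+)?)\)|(-?\d[\d,]*(?:\.\d+)?)')

def find_numbers_accounting(string):
    results = []
    for m in ACCOUNTING.finditer(string):
        if m.group(1) is not None:
            # Parenthesised amount: drop separators and make it negative.
            results.append('-' + m.group(1).replace(',', ''))
        else:
            # Ordinary number: just drop the separators.
            results.append(m.group(2).replace(',', ''))
    return results

print(find_numbers_accounting("Net: (1,500.00), gross: 2,000"))
# ['-1500.00', '2000']
```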
It also does not extract dates. There are better ways for finding dates in strings.
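```python
import re

def find_numbers(string, ints=True):
    # Optional "-" in front, a digit, then digits/commas (Western or lakh
    # grouping), then an optional decimal part such as ".25".
    numexp = re.compile(r'-?\d[\d,]*(?:\.\d+)?')
    numbers = numexp.findall(string)
    # Strip the thousands/lakh separators.
    numbers = [x.replace(',', '') for x in numbers]
    if ints:
        # Keep only the integer part (truncates, does not round).
        return [int(x.split('.')[0]) for x in numbers]
    # Otherwise return the cleaned number strings, decimals included.
    return numbers
```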
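With ints=False the function returns strings. For currency amounts, one option (a suggestion, not something the answer does) is to convert those strings to decimal.Decimal rather than float, so cents are kept exactly. A short sketch on a hypothetical invoice line:

```python
import re
from decimal import Decimal

# Hypothetical invoice text.
text = "Invoice total 1,234.10 less discount 0.10"

# Same extraction as the ints=False path, then exact decimal conversion.
amounts = [Decimal(x.replace(',', ''))
           for x in re.findall(r'-?\d[\d,]*(?:\.\d+)?', text)]

print(amounts)                  # [Decimal('1234.10'), Decimal('0.10')]
print(amounts[0] - amounts[1])  # 1234.00
```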