Question
I wrote a function to extract integers from strings. The strings, shown below, are a column in my dataframe. The output is printed in square brackets with many numbers inside. I want to use those numbers in further computations, but when I check the type of the result, it is NoneType instead of integers. Why is that, and how can I convert it to integers so that I can call .sum() or .mean() on the numbers I extracted? Ideally, I want the extracted integers as another column, as with str.extract(regex, inplace=True).
Here is part of my data, which is a column in my dataframe df2017:
Bo medium lapis 20 cash pr gr
Porte monnaie dogon vert olive 430 euros carte
Bo noires 2015 fleurs clips moins brillant 30 ...
Necklace No 20 2016 80€ carte Grecs 20h00 salo...
Bo mini rouges 30 carte 13h it
Necklace No 17 2016 100€ cash pr US/NYC crois ...
Chocker No 1 2016 + BO No 32 2016 70€ cash pr …
Here is my code:
def extract_int_price():
    text = df2017['Items'].astype(str)
    text = text.to_string()
    amount = [int(x) for x in re.findall('(?<!No\s)(?<!new)(?!2016)(\d{2,4})+€?', text)]
    print(amount)
Thank you!
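Answer 1:
Your function returns None because you forgot the return statement. Every function in Python has a return value, so a function with no return statement behaves as if it ended with return None. Calling print inside the function only displays the list; it does not hand it back to the caller.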
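As a minimal sketch of that fix, the function below returns the list instead of printing it. The pattern is the one from the question, written as a raw string. Two things here are my own assumptions and not part of the original code: the Series is passed in as a parameter (items), and the matches are collected row by row instead of through to_string(), because to_string() also prints the index labels and those could be picked up as numbers. The two-row sample is only a stand-in for df2017['Items'].

```python
import re

import pandas as pd

# Pattern copied from the question, written as a raw string.
PRICE_RE = re.compile(r'(?<!No\s)(?<!new)(?!2016)(\d{2,4})+€?')


def extract_int_price(items):
    """Return every number the pattern finds in a Series of strings."""
    amounts = []
    for text in items.astype(str):
        amounts.extend(int(x) for x in PRICE_RE.findall(text))
    return amounts  # without this line the caller receives None


# Hypothetical two-row sample standing in for df2017['Items'].
sample = pd.Series([
    'Bo medium lapis 20 cash pr gr',
    'Porte monnaie dogon vert olive 430 euros carte',
])
amounts = extract_int_price(sample)
print(amounts, sum(amounts))
```

Because the result is now an ordinary list of int, sum(amounts) works directly, and pd.Series(amounts).mean() would give the average.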
Answer 2:
You want to use either str.findall or str.extractall:
In [11]: REGEX = '(?<!No\s)(?<!new)(?!2016)(\d{2,4})+€?'
In [12]: s = df2017['Items']
In [13]: s.str.findall(REGEX)
Out[13]:
0                 [20]
1                [430]
2           [2015, 30]
3    [016, 80, 20, 00]
4             [30, 13]
5           [016, 100]
6       [016, 016, 70]
dtype: object
In [14]: s.str.extractall(REGEX)
Out[14]:
            0
  match
0 0        20
1 0       430
2 0      2015
  1        30
3 0       016
  1        80
  2        20
  3        00
4 0        30
  1        13
5 0       016
  1       100
6 0       016
  1       016
  2        70
Generally, extractall is preferred, since it keeps you in numpy rather than using a Series of Python lists.
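To get from the extractall result to numbers you can aggregate, something like the following sketch may help. It assumes a small stand-in frame with three of the rows from the question. The column name first_number is hypothetical, and taking the first match of each row is only an illustration: as the output above shows, the first match is not always the price (row 2 starts with 2015).

```python
import pandas as pd

REGEX = r'(?<!No\s)(?<!new)(?!2016)(\d{2,4})+€?'

# Hypothetical frame with three of the rows shown in the question.
df2017 = pd.DataFrame({'Items': [
    'Bo medium lapis 20 cash pr gr',
    'Porte monnaie dogon vert olive 430 euros carte',
    'Bo mini rouges 30 carte 13h it',
]})

# One row per match, indexed by (original row, match number).
# Column 0 holds the single capture group; cast the strings to int.
matches = df2017['Items'].str.extractall(REGEX)[0].astype(int)

# Illustrative choice: keep the first match of each row as a new column.
# The assignment aligns on the original row index.
df2017['first_number'] = matches.groupby(level=0).first()

total = matches.sum()      # sum over every match in every row
average = matches.mean()   # mean over every match in every row
print(df2017)
print(total, average)
```

Grouping on level 0 of the index brings the matches back to one value per original row, which is what makes the result usable as a column.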
Answer 3:
If your problem is getting the sum of the integers, then you can simply write:
sum(int(x) for x in ...)
However, if your problem is with the regex, then you should consider tightening your filter, that is, deciding exactly which numbers should be matched. You could also filter manually, word by word, deciding which words are irrelevant, though that is not ideal.
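A rough sketch of the word-by-word idea is below. The function name candidate_prices and all three rules are my own assumptions about what counts as irrelevant: a number right after "No" is treated as an item number, anything from 2000 to 2099 is treated as a year, and words that are not plain digits (apart from a trailing €) are skipped. A real price of 2015 euros would be dropped by the year rule, so treat this as a starting point only.

```python
def candidate_prices(text):
    """Collect numeric words, skipping item numbers and years."""
    words = text.split()
    prices = []
    for i, word in enumerate(words):
        digits = word.rstrip('€')
        if not digits.isdecimal():
            continue  # not a plain number (e.g. '13h', 'carte')
        if i > 0 and words[i - 1] == 'No':
            continue  # item number such as 'No 17'
        if 2000 <= int(digits) <= 2099:
            continue  # assumed to be a year, not a price
        prices.append(int(digits))
    return prices


# Three rows copied from the question's sample data.
rows = [
    'Bo medium lapis 20 cash pr gr',
    'Necklace No 17 2016 100€ cash pr US/NYC crois ...',
    'Bo mini rouges 30 carte 13h it',
]
total = sum(sum(candidate_prices(row)) for row in rows)
print(total)
```

Compared with the lookbehind regex, each rule here is a separate line, so it is easier to see which rule dropped which number.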
Source: https://stackoverflow.com/questions/52545774/function-to-extract-integer-with-regex-returns-nonetype