Function to extract integers with regex returns NoneType

Submitted by 十年热恋 on 2019-12-13 02:17:41

Question


I wrote a function to extract integers from strings. The strings are a column in my dataframe; a sample is below. The output is printed in square brackets with a lot of numbers inside. I want to use those numbers for further computation, but when I check the type of the result, it is NoneType instead of integers. Why is that, and how can I convert it into integers so that I can call .sum() or .mean() on the numbers I got? Ideally, I want the extracted integers as another column, as with str.extract(regex).

Here is part of my data, which is a column in my dataframe df2017

Bo medium lapis 20 cash pr gr
Porte monnaie dogon vert olive 430 euros carte
Bo noires 2015 fleurs clips moins brillant 30 ...
Necklace No 20 2016 80€ carte Grecs 20h00 salo...
Bo mini rouges 30 carte 13h it
Necklace No 17 2016 100€ cash pr US/NYC crois ...
Chocker No 1 2016 + BO No 32 2016 70€ cash pr …

Here is my code

def extract_int_price():
    text=df2017['Items'].astype(str)
    text=text.to_string()
    amount=[int(x) for x in re.findall('(?<!No\s)(?<!new)(?!2016)(\d{2,4})+€?', text)]
    print (amount)

Thank you!


Answer 1:


Your function returns None because it has no return statement; it only prints the list. Every function in Python returns a value, and a function that finishes without executing a return statement returns None.
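
A minimal sketch of that fix, assuming the function is changed to take the strings as an argument (the parameter name items and the constant PRICE_RE are illustrative, not from the question). It applies the question's regex to each string separately instead of calling to_string(), because to_string() also renders the index labels, which the regex could pick up as numbers:

```python
import re

# The regex from the question, written as a raw string and compiled once.
PRICE_RE = re.compile(r'(?<!No\s)(?<!new)(?!2016)(\d{2,4})+€?')


def extract_int_price(items):
    """Return every integer the regex finds in an iterable of strings."""
    amounts = []
    for text in items:
        # findall returns the captured group of each match as a string.
        amounts.extend(int(x) for x in PRICE_RE.findall(str(text)))
    return amounts  # the statement the original function was missing


amounts = extract_int_price([
    'Bo medium lapis 20 cash pr gr',
    'Porte monnaie dogon vert olive 430 euros carte',
])
print(amounts)       # [20, 430]
print(sum(amounts))  # 450
```

With the dataframe from the question, the call would presumably be extract_int_price(df2017['Items']), since iterating over a Series yields its values.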




Answer 2:


You want to use either str.findall or str.extractall:

In [11]: REGEX = r'(?<!No\s)(?<!new)(?!2016)(\d{2,4})+€?'

In [12]: s = df2017['Items']

In [13]: s.str.findall(REGEX)
Out[13]:
0                 [20]
1                [430]
2           [2015, 30]
3    [016, 80, 20, 00]
4             [30, 13]
5           [016, 100]
6       [016, 016, 70]
dtype: object

In [14]: s.str.extractall(REGEX)
Out[14]:
            0
  match
0 0        20
1 0       430
2 0      2015
  1        30
3 0       016
  1        80
  2        20
  3        00
4 0        30
  1        13
5 0       016
  1       100
6 0       016
  1       016
  2        70

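Generally extractall is preferred, since it returns a DataFrame with one row per match (indexed by the original row and the match number) rather than a Series of Python lists, so you can keep using vectorized operations. Note that the matched values are still strings, so convert them (for example with astype(int)) before calling .sum() or .mean().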
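
As a rough sketch of how the extractall result could feed the computations the question asks about: the three sample rows are taken from the question, while the column name first_amount and the choice to keep the first match per row are assumptions made only for illustration.

```python
import pandas as pd

REGEX = r'(?<!No\s)(?<!new)(?!2016)(\d{2,4})+€?'

# Small stand-in for the question's dataframe.
df2017 = pd.DataFrame({'Items': [
    'Bo medium lapis 20 cash pr gr',
    'Porte monnaie dogon vert olive 430 euros carte',
    'Bo mini rouges 30 carte 13h it',
]})

# One row per match; column 0 holds the captured group as a string.
matches = df2017['Items'].str.extractall(REGEX)[0].astype(int)

# Aggregate over every extracted number.
total = int(matches.sum())
average = float(matches.mean())

# Assumption: the first number found in each row is the one to keep.
# Level 0 of the MultiIndex is the original row label, so the result
# aligns with df2017 when assigned as a new column.
df2017['first_amount'] = matches.groupby(level=0).first()

print(df2017)
print(total, average)
```

Rows in which the regex finds nothing would get NaN in the new column, which also turns the column into floats.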




Answer 3:


If the problem is only getting the sum of the integers, you can simply write:

sum(int(x) for x in ...)


However, if the problem is with the regex, consider tightening the filter, that is, deciding exactly which numbers should be let through. You could also filter manually, word by word, deciding for each word whether it is irrelevant, though that is not ideal.
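
A sketch of the word-by-word approach, under assumed rules that would need checking against the real data: a number is skipped when it follows the word "No" (an item number) or when it is one of the listed years, and a trailing € is ignored. The function name extract_prices and the years tuple are hypothetical.

```python
def extract_prices(text, years=('2015', '2016')):
    """Collect the integers in text that look like prices."""
    prices = []
    previous = ''
    for word in text.split():
        token = word.rstrip('€')  # '80€' becomes '80'
        is_number = token.isdecimal()
        # Skip item numbers ("No 20") and bare years ("2016").
        if is_number and previous != 'No' and token not in years:
            prices.append(int(token))
        previous = word
    return prices


print(extract_prices('Necklace No 20 2016 80€ carte Grecs 20h00'))     # [80]
print(extract_prices('Bo noires 2015 fleurs clips moins brillant 30'))  # [30]
```

Tokens such as 20h00 or 13h are dropped because they are not purely numeric, which the question's regex does not manage for these rows.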



Source: https://stackoverflow.com/questions/52545774/function-to-extract-integer-with-regex-returns-nonetype
