How to match a word in a DataFrame column against a list of values, ignoring case, in pandas (Python)

╄→尐↘猪︶ㄣ posted on 2019-12-18 09:40:31

Question


I have a DataFrame, df:

Name
Ram is one of the key ram
Kumar is playing cricket
Ravi is playing and ravi is a good player

and a list:

my_list = ["Ram", "ravi"]

My desired DataFrame, desired_df, is:

Name                                        Match    Count 
Ram is one of the key ram                   Ram      1
Kumar is playing cricket                 
Ravi is playing and ravi is a good player   ravi     1   

I tried

 import re

 extracted = df['Name'].str.findall('(' + '|'.join(my_list) + ')',
                                    flags=re.IGNORECASE).apply(set)

but I get every spelling of each match instead of the spelling from my list:

 Match
 Ram,ram
 Ravi,ravi

I cannot get my desired output. Please help.


Answer 1:


Is this what you are looking for?

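# Lowercase both the search terms and the column so the match ignores case
new_l = [i.lower() for i in my_list]
extracted = df['Name'].str.lower().str.findall('(' + '|'.join(new_l) + ')').apply(set)

# Join the distinct matches in each row and count them
df['Match'] = extracted.apply(','.join)
df['count'] = extracted.apply(len)

Output (the third row in this run also contains "Ram"):
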
                                          Name     Match  count
0                      Ram is one of the key ram       ram      1
1                       Kumar is playing cricket                0
2  Ravi Ram is playing and ravi is a good player  ram,ravi      2
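
If the goal is the exact table from the question (the list's own spelling in Match, and the number of distinct list items found in Count), one possible approach is to match case-insensitively and then map each hit back to its spelling in my_list. This is only a sketch under that reading of the question. The names canonical, pattern and distinct_matches are made up for illustration, and the \b word boundaries and re.escape are added assumptions, so that "ram" does not match inside a longer word and special characters in the list are treated literally.

```python
import re

import pandas as pd

df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# Hypothetical helper mapping: lowercased term -> spelling used in my_list.
canonical = {term.lower(): term for term in my_list}

# Assumption: whole-word matches only (\b); re.escape treats terms literally.
pattern = r"\b(?:" + "|".join(re.escape(term) for term in my_list) + r")\b"


def distinct_matches(text):
    """Return the my_list terms found in text, ignoring case, in my_list order."""
    found = {hit.lower() for hit in re.findall(pattern, text, flags=re.IGNORECASE)}
    return [canonical[key] for key in canonical if key in found]


matches = df["Name"].apply(distinct_matches)
df["Match"] = matches.apply(",".join)  # list spelling, comma-separated
df["Count"] = matches.apply(len)       # number of distinct list items found
print(df)
```

With the three rows from the question this gives Match values "Ram", "" and "ravi", and Count values 1, 0 and 1. Iterating over canonical rather than over the set keeps the order of Match stable, which joining a set directly does not guarantee.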



Answer 2:


In [187]: pat = '({})'.format('|'.join(my_list))

In [188]: df['Match'] = df['Name'].str.extract(pat, expand=False)

In [190]: df['Count'] = df.Name.str.count(pat)

In [191]: df
Out[191]:
                                                Name Match  Count
0                          Ram is one of the key ram   Ram      1
1                           Kumar is playing cricket   NaN      0
2  Ravi is playing and ravi (ravi ravi) is a good...  ravi      3  # I've intentionally added `(ravi ravi)`
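
Note that the pattern in this answer is case-sensitive, so "ram" in the first row and "Ravi" in the third are not counted. If every occurrence should count regardless of case, both str.extract and str.count accept a flags argument. A minimal sketch, assuming the three original rows from the question; re.escape is an addition that is not in the answer, there to guard against regex metacharacters in the list.

```python
import re

import pandas as pd

df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# Same alternation as in the answer, with each term escaped (an added precaution).
pat = "({})".format("|".join(re.escape(term) for term in my_list))

# First case-insensitive match per row, spelled as it appears in the text.
df["Match"] = df["Name"].str.extract(pat, flags=re.IGNORECASE, expand=False)

# Total number of case-insensitive occurrences per row.
df["Count"] = df["Name"].str.count(pat, flags=re.IGNORECASE)
print(df)
```

Here Match holds the first hit as written in the row ("Ram", NaN, "Ravi") and Count holds all occurrences (2, 0, 2), which is a different result from the table in the question, where each list item is counted once.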


Source: https://stackoverflow.com/questions/47096797/how-to-match-a-word-in-a-datacolumn-with-a-list-of-values-and-applying-ignorecas
