How to match words in a DataFrame column against a list of values, ignoring case, in pandas

Question

I have a df,

Name
Ram is one of the key ram
Kumar is playing cricket
Ravi is playing and ravi is a good player

and a list

my_list=["Ram","ravi"]

and my desired dataframe is,

desired_df,
Name                                        Match    Count 
Ram is one of the key ram                   Ram      1
Kumar is playing cricket                 
Ravi is playing and ravi is a good player   ravi     1

I tried

 extracted = df['Name'].str.findall('(' + '|'.join(my_list) + ')',
                                    flags=re.IGNORECASE).apply(set)

but I am getting this:

 Match
 Ram,ram
 Ravi,ravi

This is not my desired output. Please help.

Answer 1:

Is this what you are looking for?

new_l = [i.lower() for i in my_list]
extracted = df['Name'].str.lower().str.findall('(' + '|'.join(new_l) + ')').apply(set)


df['Match'] = extracted.apply(','.join)
df['count'] = extracted.apply(len)

                                          Name     Match  count
0                      Ram is one of the key ram       ram      1
1                       Kumar is playing cricket                0
2  Ravi Ram is playing and ravi is a good player  ram,ravi      2
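
The lowercase approach above matches substrings, so "ram" would also be found inside a word such as "program", and the order of a joined set is not guaranteed. A minimal sketch of one way to tighten it, assuming the sample frame from the question: the word boundaries (`\b`), `re.escape`, and `sorted` are additions for illustration and are not part of the original answer.

```python
import re
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# \b restricts hits to whole words; re.escape protects against
# regex metacharacters that might appear in the list entries.
pattern = r"\b(?:" + "|".join(re.escape(w.lower()) for w in my_list) + r")\b"

# Lowercase the column, collect every hit per row, then deduplicate.
# sorted() makes the joined string deterministic.
found = (
    df["Name"].str.lower()
    .str.findall(pattern)
    .apply(lambda hits: sorted(set(hits)))
)

df["Match"] = found.apply(",".join)
df["count"] = found.apply(len)
print(df)
```

Here `count` is the number of distinct list words found in each row, which gives 1, 0, 1 for the three sample rows.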

Answer 2:

In [187]: pat = '({})'.format('|'.join(my_list))

In [188]: df['Match'] = df['Name'].str.extract(pat, expand=False)

In [190]: df['Count'] = df.Name.str.count(pat)

In [191]: df
Out[191]:
                                                Name Match  Count
0                          Ram is one of the key ram   Ram      1
1                           Kumar is playing cricket   NaN      0
2  Ravi is playing and ravi (ravi ravi) is a good...  ravi      3  # I've intentionally added `(ravi ravi)`
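
Note that the pattern above is case-sensitive: "Ravi" at the start of the last row is neither extracted nor counted. If every case variant should be matched instead, one option is to pass `flags=re.IGNORECASE` to both `str.extract` and `str.count`. This is a sketch under that assumption, using the original sample rows from the question; `re.escape` is an added precaution and not part of the answer.

```python
import re
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# One capture group containing all alternatives.
pat = "({})".format("|".join(re.escape(w) for w in my_list))

# extract() returns the first match in each row, in its original casing.
df["Match"] = df["Name"].str.extract(pat, flags=re.IGNORECASE, expand=False)

# count() returns the number of matches in each row, regardless of case.
df["Count"] = df["Name"].str.count(pat, flags=re.IGNORECASE)
print(df)
```

With the flag set, the first and last rows each count 2, because both the capitalized and lowercase spellings match. That differs from the count of 1 in the desired output, so which variant fits depends on whether case variants should count separately.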

Source: https://stackoverflow.com/questions/47096797/how-to-match-a-word-in-a-datacolumn-with-a-list-of-values-and-applying-ignorecas

Tags: python, pandas, dataframe, data-analysis
