Question
I have a DataFrame df:
Name
Ram is one of the key ram
Kumar is playing cricket
Ravi is playing and ravi is a good player
and a list:
my_list=["Ram","ravi"]
My desired DataFrame, desired_df, is:
Name                                       Match  Count
Ram is one of the key ram                  Ram    1
Kumar is playing cricket
Ravi is playing and ravi is a good player  ravi   1
I tried:
import re
extracted = df['Name'].str.findall('(' + '|'.join(my_list) + ')',
                                   flags=re.IGNORECASE).apply(set)
but I get:
Match
Ram,ram
Ravi,ravi
This is not my desired output. Please help.
Answer 1:
Is this what you are looking for?
new_l = [i.lower() for i in my_list]
extracted = df['Name'].str.lower().str.findall('(' + '|'.join(new_l) + ')').apply(set)
df['Match'] = extracted.apply(','.join)
df['count'] = extracted.apply(len)
Output:
   Name                                           Match     count
0  Ram is one of the key ram                      ram       1
1  Kumar is playing cricket                                 0
2  Ravi Ram is playing and ravi is a good player  ram,ravi  2
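To illustrate why lowercasing first helps, here is a small sketch using only the standard `re` module. The sample string comes from the question; the names `raw` and `lowered` are illustrative. With `re.IGNORECASE`, `findall` returns each hit in the case it has in the text, so a set keeps `Ram` and `ram` as two entries. Lowercasing both the text and the pattern collapses them into one.

```python
import re

text = "Ram is one of the key ram"
my_list = ["Ram", "ravi"]
pattern = "(" + "|".join(my_list) + ")"

# Case-insensitive search: matches come back as they are spelled in the text,
# so the set holds both spellings.
raw = set(re.findall(pattern, text, flags=re.IGNORECASE))

# Lowercasing the pattern and the text makes differently-cased hits identical.
lowered = set(re.findall(pattern.lower(), text.lower()))

print(len(raw), len(lowered))
```

The trade-off is that the reported match is always lowercase (`ram`), not the spelling used in `my_list`.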
Answer 2:
In [187]: pat = '({})'.format('|'.join(my_list))
In [188]: df['Match'] = df['Name'].str.extract(pat, expand=False)
In [190]: df['Count'] = df.Name.str.count(pat)
In [191]: df
Out[191]:
   Name                                               Match  Count
0  Ram is one of the key ram                          Ram    1
1  Kumar is playing cricket                           NaN    0
2  Ravi is playing and ravi (ravi ravi) is a good...  ravi   3  # I've intentionally added `(ravi ravi)`
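Note that the pattern above is case-sensitive as written, and `str.count` counts every occurrence. If the goal is the exact frame from the question, one possible sketch follows. It rests on an assumption about the desired output: `Match` lists each `my_list` entry found regardless of case, spelled as in `my_list`, and `Count` is the number of distinct entries found. The helper name `distinct_matches` is hypothetical and not part of either answer. `re.escape` is added so that list entries containing regex metacharacters are matched literally.

```python
import re
import pandas as pd

df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# One case-insensitive alternation over all (escaped) list entries.
pattern = re.compile("|".join(re.escape(t) for t in my_list),
                     flags=re.IGNORECASE)

def distinct_matches(text):
    """Return the my_list entries found in text, ignoring case."""
    found = {m.lower() for m in pattern.findall(text)}
    # Report each hit in the spelling used in my_list, in list order.
    return [term for term in my_list if term.lower() in found]

matches = df["Name"].apply(distinct_matches)
df["Match"] = matches.apply(",".join)   # empty string when nothing matched
df["Count"] = matches.apply(len)        # number of distinct entries matched
print(df)
```

On the sample data this gives `Match` values `Ram`, an empty string, and `ravi`, with counts 1, 0 and 1. Matching is by substring, so `Ram` would also be found inside a longer word. Wrapping the pattern in `\b` word boundaries is one way to restrict it to whole words.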
Source: https://stackoverflow.com/questions/47096797/how-to-match-a-word-in-a-datacolumn-with-a-list-of-values-and-applying-ignorecas