Question
I have a DataFrame df:
Name Description
Ram Ram is one of the good cricketer
Sri Sri is one of the member
Kumar Kumar is a keeper
and a list, my_list=["one","good","ravi","ball"].
I am trying to get the rows that contain at least one keyword from my_list.
I tried:
mask=df["Description"].str.contains("|".join(my_list),na=False)
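For reference, a minimal self-contained sketch of this filtering step, assuming the sample rows shown above; the names pattern and output_df are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# Build one alternation pattern: "one|good|ravi|ball".
pattern = "|".join(my_list)

# True where Description contains at least one keyword;
# na=False treats missing descriptions as "no match".
mask = df["Description"].str.contains(pattern, na=False)

# Boolean indexing keeps only the matching rows.
output_df = df[mask]
print(output_df)
```

Note that str.contains treats the pattern as a regular expression and is case-sensitive by default, so "BALL" would not match "ball" here.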
This gives me output_df:
Name Description
Ram Ram is one of ONe crickete
Sri Sri is one of the member
Ravi Ravi is a player, ravi is playing
Kumar there is a BALL
I also want to add the keywords found in "Description", and their count, as separate columns.
My desired output is:
Name Description pre-keys keys count
Ram Ram is one of ONe crickete one,good,ONe one,good 2
Sri Sri is one of the member one one 1
Ravi Ravi is a player, ravi is playing Ravi,ravi ravi 1
Kumar there is a BALL ball ball 1
Answer 1:
Use str.findall + str.join + str.len:
# Collect every keyword occurrence in each row as a list
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')')
# Comma-join the matches, then count them
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
Name Description keys count
0 Ram Ram is one of the good cricketer one,good 2
1 Sri Sri is one of the member one 1
EDIT: For case-insensitive matching, pass flags=re.IGNORECASE:
import re
my_list=["ONE","good"]
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
Name Description keys count
0 Ram Ram is one of the good cricketer one,good 2
1 Sri Sri is one of the member one 1
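The question also asks for a pre-keys column (raw matches) next to a de-duplicated keys column and its count. One possible way to get there is sketched below. It assumes that "keys" means the unique lowercased matches and that "count" is the number of those unique keys. The helper unique_lower is hypothetical and not part of pandas.

```python
import re
import pandas as pd

# Rows copied from the question's output_df.
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Ravi", "Kumar"],
    "Description": [
        "Ram is one of ONe crickete",
        "Sri is one of the member",
        "Ravi is a player, ravi is playing",
        "there is a BALL",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

pattern = "(" + "|".join(my_list) + ")"
# Raw matches, in the casing they have in the text.
extracted = df["Description"].str.findall(pattern, flags=re.IGNORECASE)

def unique_lower(matches):
    # Hypothetical helper: lowercase each match and drop duplicates,
    # keeping first-seen order (dict keys preserve insertion order).
    return list(dict.fromkeys(m.lower() for m in matches))

unique_keys = extracted.apply(unique_lower)

df["pre-keys"] = extracted.str.join(",")
df["keys"] = unique_keys.str.join(",")
df["count"] = unique_keys.str.len()
print(df)
```

With this reading, a row is only credited with keywords that actually occur in its text, so the first row yields one key ("one") because "good" does not appear in that description.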
Answer 2:
Took a shot at this with str.findall.
c = df.Description.str.findall('({})'.format('|'.join(my_list)))
df['keys'] = c.apply(','.join) # or c.str.join(',')
df['count'] = c.str.len()
df[df['count'] > 0]
Name Description keys count
0 Ram Ram is one of the good cricketer one,good 2
1 Sri Sri is one of the member one 1
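A plain alternation also matches inside longer words, so "one" would be found in "someone". If whole-word matches are wanted, a variant along these lines might help. It is a sketch only: the second row is a made-up example that is not in the question, and wrapping the keywords in re.escape and \b is an assumption about the desired behaviour.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Description": [
        "Ram is one of the good cricketer",
        "Someone threw a football",  # hypothetical row, not from the question
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# re.escape guards against keywords that contain regex metacharacters;
# \b on both sides restricts matches to whole words.
pattern = r"\b(?:" + "|".join(re.escape(k) for k in my_list) + r")\b"

c = df["Description"].str.findall(pattern, flags=re.IGNORECASE)
df["keys"] = c.str.join(",")
df["count"] = c.str.len()

# Keep only rows with at least one whole-word match.
print(df[df["count"] > 0])
```

The group is non-capturing, so findall returns the full matched word for each hit.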
Source: https://stackoverflow.com/questions/46926464/retrieving-matching-word-count-on-a-datacolumn-using-pandas-in-python