问题
I have a dataframe like below:
data = {'speaker':['Adam','Ben','Clair'],
'speech': ['Thank you very much and good afternoon.',
'Let me clarify that because I want to make sure we have got everything right',
'By now you should have some good rest']}
df = pd.DataFrame(data)
I want to count the number of words in the speech column but only for the words from a pre-defined list. For example, the list is:
wordlist = ['much', 'good','right']
I want to generate a new column which shows the frequency of these three words in each row. My expected output is therefore:
speaker speech words
0 Adam Thank you very much and good afternoon. 2
1 Ben Let me clarify that because I want to make sur... 1
2 Clair By now you should have received a copy of our ... 1
I tried:
df['total'] = 0
for word in df['speech'].str.split():
if word in wordlist:
df['total'] += 1
But I after running it, the total
column is always zero. I am wondering what's wrong with my code?
回答1:
You could use the following vectorised approach:
data = {'speaker':['Adam','Ben','Clair'],
'speech': ['Thank you very much and good afternoon.',
'Let me clarify that because I want to make sure we have got everything right',
'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good','right']
df['total'] = df['speech'].str.count(r'\b|\b'.join(wordlist))
Which gives:
>>> df
speaker speech total
0 Adam Thank you very much and good afternoon. 2
1 Ben Let me clarify that because I want to make sur... 1
2 Clair By now you should have some good rest 1
回答2:
import pandas as pd
data = {'speaker': ['Adam', 'Ben', 'Clair'],
'speech': ['Thank you very much and good afternoon.',
'Let me clarify that because I want to make sure we have got everything right',
'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']
df["speech"] = df["speech"].str.split()
df = df.explode("speech")
counts = df[df.speech.isin(wordlist)].groupby("speaker").size()
print(counts)
来源:https://stackoverflow.com/questions/59989449/conditional-word-frequency-count-in-pandas