Conditional word frequency count in Pandas

北战南征 submitted on 2020-02-03 12:15:23

Question


I have a dataframe like below:

import pandas as pd

data = {'speaker':['Adam','Ben','Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)

I want to count the number of words in the speech column but only for the words from a pre-defined list. For example, the list is:

wordlist = ['much', 'good','right']

I want to generate a new column which shows the frequency of these three words in each row. My expected output is therefore:

  speaker                                             speech  words
0    Adam            Thank you very much and good afternoon.      2
1     Ben  Let me clarify that because I want to make sur...      1
2   Clair              By now you should have some good rest      1

I tried:

df['total'] = 0
for word in df['speech'].str.split():
    if word in wordlist: 
        df['total'] += 1

But after running it, the total column is always zero. What is wrong with my code?


Answer 1:


You could use the following vectorised approach:

import pandas as pd

data = {'speaker':['Adam','Ben','Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)

wordlist = ['much', 'good','right']

# Wrap the alternation in one group so that \b applies to both ends of every word
df['total'] = df['speech'].str.count(r'\b(?:' + '|'.join(wordlist) + r')\b')

Which gives:

>>> df
  speaker                                             speech  total
0    Adam            Thank you very much and good afternoon.      2
1     Ben  Let me clarify that because I want to make sur...      1
2   Clair              By now you should have some good rest      1
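
As for why the loop in the question leaves the column at zero: iterating over df['speech'].str.split() yields one list of tokens per row, not one word at a time, so `word in wordlist` compares a whole list against the strings in wordlist and is never true. Even if it were true, `df['total'] += 1` would add 1 to every row rather than to the current one. The following is a minimal sketch of a per-row fix, not part of the original answer. The helper name count_listed is hypothetical, and lowercasing the text and tokenising with `\w+` are assumptions about what should count as a word.

```python
import re
import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']

# What the original loop iterates over: a list of tokens for a whole row.
first_item = df['speech'].str.split().iloc[0]
# A list is never equal to any string in wordlist, so this is False.
loop_condition = first_item in wordlist

def count_listed(text, words=frozenset(wordlist)):
    """Hypothetical helper: count tokens of `text` that appear in `words`."""
    # Assumption: matching is case-insensitive and punctuation is ignored.
    tokens = re.findall(r"\w+", text.lower())
    return sum(token in words for token in tokens)

# Apply the helper row by row so each row gets its own count.
df['total'] = df['speech'].apply(count_listed)
print(df[['speaker', 'total']])
```

This is slower than str.count on large frames, but it makes the per-row logic explicit and is easy to adapt, for example to a different tokeniser.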



Answer 2:


import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)

wordlist = ['much', 'good', 'right']

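# Split each speech on whitespace (punctuation stays attached, e.g. "afternoon.")
df["speech"] = df["speech"].str.split()
# One row per token; the speaker value is repeated for each token
df = df.explode("speech")
# Count matching tokens per speaker; speakers with no match are absent from the result
counts = df[df.speech.isin(wordlist)].groupby("speaker").size()
print(counts)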
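
The snippet above overwrites df and returns a separate Series indexed by speaker. If the count is wanted as a column on the original frame, as the question asks, one possible variant is sketched below. It is not part of the original answer. It assumes case-insensitive matching and `\w+` tokenisation, and it groups on the row index so that rows with no match get 0 instead of being dropped.

```python
import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']

# One token per row; explode() repeats the original row index for each token.
tokens = df['speech'].str.lower().str.findall(r'\w+').explode()

# Flag tokens that are in the list, then sum the flags per original row.
df['total'] = tokens.isin(wordlist).groupby(level=0).sum().astype(int)
print(df[['speaker', 'total']])
```

Because the result is aligned on the row index, the speech column is left untouched and two rows with the same speaker are counted separately.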


Source: https://stackoverflow.com/questions/59989449/conditional-word-frequency-count-in-pandas
