How to count the number of occurrences of a word in a column

Submitted by 穿精又带淫゛_ on 2021-01-28 23:30:14

Question


I have a column named word_count that contains the counts of all the words in each review. How can I find the number of times the word awesome occurs in each row of that column, and use the .apply() method to put the result into a new column, say awesome?

products['word_count'][1]
   {'and': 3L,'bags': 1L,'came': 1L, 'disappointed.':1L,'does':1L,'early':1L,'highly': 1L,'holder.': 1L, 'awesome': 2L}

How can I get the following output?

products['awesome'][1]
   2


Answer 1:


As I understand it, you have a dictionary called products that holds word counts for various texts, like this:

products = {'word_count' : [{'holder.': 2, 'awesome': 1}, {'and': 3,'bags': 1,'came': 1, 'disappointed.':1,'does':1,'early':1,'highly': 1,'holder.': 1, 'awesome': 2}] }

For instance, the first text contains "holder." twice and "awesome" once. To add another column, build a list that holds the count of 'awesome' for each text, as follows:

counter = []
for i in range(len(products['word_count'])):
    # .get() returns 0 when 'awesome' does not occur in this text (avoids a KeyError)
    counter.append(products['word_count'][i].get('awesome', 0))

and then add the new column to the table:

products['awesome'] = counter

and there you have it!
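
Since the question asks about .apply(), the same lookup can also be written as a small per-row function. The following is only a minimal sketch in plain Python, assuming each row is an ordinary dict as in the sample above; the helper name count_word and the third sample row are hypothetical and were added for illustration.

```python
def count_word(row_counts, word='awesome'):
    """Return how often `word` appears in one row's word-count dict (0 if absent)."""
    return row_counts.get(word, 0)

# Sample data modeled on the dictionary above, plus a hypothetical row without 'awesome'.
products = {'word_count': [{'holder.': 2, 'awesome': 1},
                           {'and': 3, 'bags': 1, 'awesome': 2},
                           {'early': 1}]}

# Call the function once per row and store the results as a new column.
products['awesome'] = [count_word(row) for row in products['word_count']]
print(products['awesome'])
```

If word_count is a column type that offers an .apply() method, passing count_word to it should give the same per-row result, though that depends on the library in use.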




Answer 2:


Here's the code for the Python function counting_words:

def counting_words(x):
    # dict.has_key() was removed in Python 3; `in` works in both Python 2 and 3
    if 'awesome' in products['word_count'][x]:
        return products['word_count'][x]['awesome']
    else:
        return 0

Here's the other part of the code:

import graphlab

# One count per row; the original loop appended the count for row 0 twice
new_dict = {'awesome': [counting_words(x) for x in range(len(products))]}

newframe = graphlab.SFrame(new_dict)
products.add_columns(newframe)

I assumed that you are using GraphLab; the above code works for the word 'awesome'. new_dict stores the count of 'awesome' in each row of your products['word_count'] column, so it should look like new_dict = {'awesome': [0, 0, 1, ..., 2, 1]}. However, if you plan to count other words, this method would be too slow.
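
For the case of counting several words, one option is to collect all the columns in a single pass over the rows instead of repeating the loop once per word. This is a rough sketch in plain Python, not GraphLab-specific; the function name count_words and the sample rows are hypothetical.

```python
def count_words(rows, words):
    """Build one list of counts per word in a single pass over the rows."""
    columns = {word: [] for word in words}
    for row_counts in rows:
        for word in words:
            # Words missing from a row count as 0.
            columns[word].append(row_counts.get(word, 0))
    return columns

# Hypothetical sample rows in the same shape as the word_count column.
rows = [{'holder.': 2, 'awesome': 1}, {'and': 3, 'awesome': 2}, {'early': 1}]
new_columns = count_words(rows, ['awesome', 'and'])
print(new_columns)
```

The resulting dictionary has the same shape as new_dict above, with one key per word, so it could be turned into new columns in the same way.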



Source: https://stackoverflow.com/questions/33068658/how-to-count-the-number-of-occurrence-of-a-word-in-a-column
