How do I get word frequency in a corpus using Scikit Learn CountVectorizer?

佛祖请我去吃肉 2020-12-23 17:14

I'm trying to compute a simple word frequency using scikit-learn's CountVectorizer.

import pandas as pd
import numpy as np
from sklearn.featur         


        
4 answers
  •  春和景丽
    2020-12-23 18:14

    We can use the built-in zip function to build a dict from the list of words and the list of their counts.

    import pandas as pd
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    texts = ["dog cat fish", "dog cat cat", "fish bird", "bird"]

    cv = CountVectorizer()
    cv_fit = cv.fit_transform(texts)  # sparse document-term matrix

    # scikit-learn >= 1.0; older releases used cv.get_feature_names()
    word_list = cv.get_feature_names_out().tolist()
    # Sum each column to get the total count of each word in the corpus
    count_list = cv_fit.toarray().sum(axis=0).tolist()

    print(word_list)                         # ['bird', 'cat', 'dog', 'fish']
    print(count_list)                        # [2, 3, 2, 2]
    print(dict(zip(word_list, count_list)))  # {'bird': 2, 'cat': 3, 'dog': 2, 'fish': 2}
