I'm trying to compute simple word frequencies using scikit-learn's CountVectorizer.
import pandas as pd
import numpy as np
from sklearn.featur
We can use Python's built-in zip() function to build a dict from the list of words and the list of their counts:
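from sklearn.feature_extraction.text import CountVectorizer

texts = ["dog cat fish", "dog cat cat", "fish bird", "bird"]
cv = CountVectorizer()
cv_fit = cv.fit_transform(texts)  # sparse document-term matrix: one row per text, one column per word

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() instead
word_list = cv.get_feature_names_out()
# sum each column over all documents to get each word's total count
count_list = cv_fit.toarray().sum(axis=0)

print(word_list)
# ['bird' 'cat' 'dog' 'fish']
print(count_list)
# [2 3 2 2]
# .tolist() turns the NumPy values into plain Python str/int for a clean dict
print(dict(zip(word_list.tolist(), count_list.tolist())))
# {'bird': 2, 'cat': 3, 'dog': 2, 'fish': 2}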