How to get vocabulary word count from gensim word2vec?

Question

I am using the gensim word2vec package in Python. I know how to get the vocabulary from the trained model, but how do I get the word count for each word in the vocabulary?

Answer 1:

Each word in the vocabulary has an associated vocabulary object, which contains an index and a count.

# w2v is the loaded model; the .vocab dict is available in gensim versions before 4.0
vocab_obj = w2v.vocab["word"]
vocab_obj.count

Output for the Google News word2vec model: 2998437

So to get the count for each word, you would iterate over all words and vocab objects in the vocabulary.

for word, vocab_obj in w2v.vocab.items():
    # Do something with vocab_obj.count, for example print it
    print(word, vocab_obj.count)
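The `.vocab` attribute used above was removed in gensim 4.0, where the same information is reached through `key_to_index` and `get_vecattr(word, "count")`. The helper below is a minimal sketch that tries to cover both cases; `word_counts` is a hypothetical name introduced here for illustration, and the version check by attribute is an assumption about how you might want to detect the API, not something from the original answer.

```python
def word_counts(kv):
    """Return a {word: count} dict for a gensim KeyedVectors-like object.

    Assumption: gensim 4.x objects expose `key_to_index` and `get_vecattr`,
    while older versions expose a `vocab` dict of objects with a `.count`.
    """
    if hasattr(kv, "key_to_index"):
        # gensim 4.x style: per-word attributes are read with get_vecattr
        return {word: kv.get_vecattr(word, "count") for word in kv.key_to_index}
    # gensim < 4.0 style: each vocab object carries its own count
    return {word: vocab_obj.count for word, vocab_obj in kv.vocab.items()}


# Usage with a trained model (not executed here):
# counts = word_counts(model.wv)
# counts["word"]
```

Note that vectors loaded from a plain word2vec-format file may not carry real corpus frequencies, so it is worth checking a few counts against your own data before relying on them.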

Answer 2:

If you want to build a dictionary that maps each word to its count for easy retrieval later, you can do so as follows:

w2c = dict()
for item in model.wv.vocab:
    w2c[item] = model.wv.vocab[item].count

If you want to sort it to see the most frequent words in the model, you can do that as follows:

w2cSorted = dict(sorted(w2c.items(), key=lambda x: x[1], reverse=True))
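If you only need the top few words rather than the whole sorted dictionary, `collections.Counter` from the standard library is one alternative. This is a small sketch, not part of the original answer; the `w2c` values below are made-up stand-in data so the snippet runs on its own, and in practice you would pass the dictionary built from your model.

```python
from collections import Counter

# Hypothetical stand-in for the w2c dictionary built from the model
w2c = {"the": 120, "cat": 7, "sat": 3, "mat": 3}

# most_common(n) returns the n highest-count (word, count) pairs, largest first
top_words = Counter(w2c).most_common(2)
print(top_words)  # [('the', 120), ('cat', 7)]
```

This avoids building a second full dictionary when the vocabulary is large and only the head of the distribution is of interest.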

Source: https://stackoverflow.com/questions/37190989/how-to-get-vocabulary-word-count-from-gensim-word2vec

Tags

gensim

word2vec
