How to get word2index from gensim

Submitted by 五迷三道 on 2021-01-21 03:47:20

Question


According to the docs, a word2vec model can be loaded with gensim like this:

from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('word2vec.50d.txt', binary=False)

The loaded model exposes an index-to-word mapping, e.g. model.index2word[2]. How can I derive the inverted (word-to-index) mapping from it?


Answer 1:


The word-to-index mappings are in the KeyedVectors vocab property, a dictionary whose values are objects that include an index property.

For example:

word = "whatever"  # for any word in model
i = model.vocab[word].index
model.index2word[i] == word  # will be true
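Note that the vocab property is the gensim 3.x layout; from gensim 4.0 on, KeyedVectors exposes a plain key_to_index dictionary instead (and index_to_key in place of index2word). The following is a minimal sketch of a lookup that copes with both layouts. The helper name word_to_index and the stand-in objects are hypothetical and only mimic the attribute shapes; they are not real gensim models.

```python
from types import SimpleNamespace


def word_to_index(model, word):
    """Return the integer index of `word` for either gensim attribute layout."""
    if hasattr(model, "key_to_index"):
        # gensim >= 4.0: a plain dict mapping word -> index
        return model.key_to_index[word]
    # gensim < 4.0: vocab maps word -> object carrying an .index attribute
    return model.vocab[word].index


# Hypothetical stand-ins that mimic the two layouts (not real gensim models)
old_style = SimpleNamespace(
    vocab={"hello": SimpleNamespace(index=0), "world": SimpleNamespace(index=1)}
)
new_style = SimpleNamespace(key_to_index={"hello": 0, "world": 1})

old_result = word_to_index(old_style, "world")
new_result = word_to_index(new_style, "hello")
```

With a real model, the same call would be word_to_index(model, "whatever"), and a word missing from the vocabulary raises KeyError in both layouts.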



Answer 2:


An even simpler solution is to enumerate index2word:

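# w2v is the loaded KeyedVectors model
word2index = {token: token_index for token_index, token in enumerate(w2v.index2word)}
word2index['hi'] == 30308  # True for the answerer's model; the actual index depends on the vocabulary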
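The enumerate approach works on any index-to-word list, so it can be tried without loading a model. Below is a small sketch under that assumption: the three-word list is a made-up stand-in for model.index2word (named index_to_key in gensim 4.0 and later), and build_word2index is a hypothetical helper name.

```python
def build_word2index(index2word):
    """Invert an index-to-word list into a word-to-index dictionary."""
    return {token: i for i, token in enumerate(index2word)}


# Made-up stand-in for model.index2word; a real list comes from the loaded model
index2word = ["the", "of", "hi"]
word2index = build_word2index(index2word)

# Round trip: every word maps back to its own position in the list
round_trip_ok = all(index2word[i] == word for word, i in word2index.items())
```

Building the dictionary once costs a single pass over the vocabulary, after which each lookup is a constant-time dict access.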


Source: https://stackoverflow.com/questions/47117569/how-to-get-word2index-from-gensim
