Question
According to the docs, we can read a word2vec model with gensim like this:
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('word2vec.50d.txt', binary=False)
The model exposes an index-to-word mapping, e.g. model.index2word[2]. How can I derive the inverse mapping (word-to-index) from it?
Answer 1:
The word-to-index mappings are in the KeyedVectors vocab property, a dictionary whose values are objects that include an index property.
For example:
word = "whatever" # for any word in model
i = model.vocab[word].index
model.index2word[i] == word # will be true
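Note that vocab and index2word belong to gensim 3.x; from gensim 4.0 on they were replaced by key_to_index (a plain dict from word to integer index) and index_to_key. Below is a minimal sketch of a lookup that handles either layout. The helper name word_to_index and the stand-in objects are hypothetical, used only so the snippet runs without a trained model.

```python
from types import SimpleNamespace


def word_to_index(model, word):
    """Return the integer index of `word` (hypothetical helper, not part of gensim)."""
    # gensim >= 4.0: key_to_index maps word -> int directly.
    key_to_index = getattr(model, "key_to_index", None)
    if key_to_index is not None:
        return key_to_index[word]
    # gensim 3.x: vocab maps word -> object carrying an .index attribute.
    return model.vocab[word].index


# Stand-ins that mimic the two attribute layouts (assumption: toy data,
# not real KeyedVectors instances).
old_style = SimpleNamespace(vocab={"hello": SimpleNamespace(index=0),
                                   "world": SimpleNamespace(index=1)})
new_style = SimpleNamespace(key_to_index={"hello": 0, "world": 1})

print(word_to_index(old_style, "world"))
print(word_to_index(new_style, "world"))
```

With a real model, word_to_index(model, word) should be usable in the same way; a word missing from the vocabulary raises KeyError in both branches.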
Answer 2:
An even simpler solution is to enumerate index2word:
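# w2v is the loaded KeyedVectors model (called model in the question)
word2index = {token: token_index for token_index, token in enumerate(w2v.index2word)}
word2index['hi'] == 30308 # True for this model; the index depends on the vocabulary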
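Here is a self-contained sketch of the same inversion, using a plain Python list as a stand-in for index2word (the toy vocabulary is an assumption for illustration only). The dictionary is built once, so later lookups are constant time, and the loop checks that the two mappings agree in both directions.

```python
# Toy stand-in for model.index2word (assumption: not a real vocabulary).
index2word = ["the", "of", "hi", "model"]

# Invert the list: each token maps to its position.
word2index = {token: i for i, token in enumerate(index2word)}

# Round-trip check: index -> word -> index and word -> index -> word.
for i, token in enumerate(index2word):
    assert word2index[token] == i
    assert index2word[word2index[token]] == token

print(word2index["hi"])  # 2
```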
Source: https://stackoverflow.com/questions/47117569/how-to-get-word2index-from-gensim