How to get both the word embeddings vector and context vector of a given word by using word2vec?

Submitted by 天涯浪子 on 2021-02-07 03:52:50

Question


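from gensim.models import word2vec

sentences = word2vec.Text8Corpus('TextFile')
# Note: gensim 4.0+ renames the size parameter to vector_size
model = word2vec.Word2Vec(sentences, size=200, min_count=2, workers=4)
# Note: gensim 4.0+ requires model.wv['king'] instead of model['king']
print(model['king'])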

Is the output vector the context vector of 'king' or the word embedding vector of 'king'? How can I get both the context vector and the word embedding vector of 'king'? Thanks!


Answer 1:


It is the embedding vector for 'king'.

If you use hierarchical softmax, the context vectors are:

model.syn1

and if you use negative sampling they are:

model.syn1neg

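With negative sampling, the context vector of a given word can be accessed by:

model.syn1neg[model.vocab[word].index]

In gensim 1.0 and later the vocabulary lives at model.wv.vocab, and in gensim 4.0 and later the row index is model.wv.key_to_index[word]. Note that with hierarchical softmax the rows of model.syn1 correspond to inner nodes of the Huffman tree rather than to words, so indexing model.syn1 by a word's index does not give a per-word context vector.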



Answer 2:


A 'context vector' is also a 'word embedding' vector. Word embedding refers to how vocabulary words are mapped to vectors of real numbers.

I assume you meant the center word's vector when you said 'word embedding' vector.

In the word2vec algorithm, training creates two different vectors for each word: one used when the word (say 'king') is the center word, and one used when it is a context word.

I don't know how gensim treats these two vectors, but people commonly either average the center and context vectors or concatenate them. It might not be the most elegant way to treat the vectors, but it works very well in practice.

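So when you load some pre-trained vectors, what you see may be an averaged version of the two vectors, depending on how they were exported. For a model trained with gensim as in the question, though, model['king'] returns only the center-word (input) vector.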
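The averaging and concatenation mentioned in this answer can be sketched with NumPy as follows. The function name combine_vectors and the three-dimensional example vectors are hypothetical; in practice the two inputs would be the center and context vectors of the same word.

```python
import numpy as np


def combine_vectors(center, context, how="average"):
    """Combine a word's center and context vectors (hypothetical helper)."""
    center = np.asarray(center, dtype=float)
    context = np.asarray(context, dtype=float)
    if how == "average":
        # Element-wise mean; the result keeps the original dimensionality.
        return (center + context) / 2.0
    if how == "concatenate":
        # Stack end to end; the result has twice the dimensionality.
        return np.concatenate([center, context])
    raise ValueError("how must be 'average' or 'concatenate'")


# Made-up vectors standing in for the two representations of one word.
center_vec = np.array([1.0, 2.0, 3.0])
context_vec = np.array([3.0, 2.0, 1.0])

averaged = combine_vectors(center_vec, context_vec)
concatenated = combine_vectors(center_vec, context_vec, how="concatenate")
print(averaged)
print(concatenated)
```

Averaging keeps the vector size unchanged, while concatenation doubles it, which matters if downstream code expects a fixed dimensionality.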



Source: https://stackoverflow.com/questions/39406092/how-to-get-both-the-word-embeddings-vector-and-context-vector-of-a-given-word-by
