Gensim Word2Vec changing the input sentence order?

Posted by 我怕爱的太早我们不能终老 on 2019-12-13 01:13:41

Question


In gensim's documentation, the window size is defined as:

window is the maximum distance between the current and predicted word within a sentence.

This should mean that, when looking at context, it doesn't go beyond the sentence boundary, right?
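To make that reading concrete, here is a minimal sketch in plain Python of how (center, context) pairs could be collected when the window is applied per sentence. It is not gensim's actual code: the function name `context_pairs` and the toy sentences are made up for illustration, and gensim additionally samples a reduced window size at random for each word, which this sketch ignores.

```python
def context_pairs(sentences, window):
    """Collect (center, context) pairs without crossing a sentence boundary."""
    pairs = []
    for sentence in sentences:
        for i, center in enumerate(sentence):
            # Clamp the window to the start and end of the current sentence.
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, sentence[j]))
    return pairs


# Hypothetical toy corpus: two very short "tweets".
toy = [["a", "b", "c"], ["d", "e"]]
pairs = context_pairs(toy, window=5)
print(pairs)
```

Under this reading, no pair ever mixes words from two different sentences, and reordering the sentences yields the same collection of pairs, only in a different order.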

I created a document with several thousand tweets, selected a word (q1), and then retrieved the words most similar to q1 (using model.most_similar('q1')). When I then randomly shuffled the tweets in the input document and repeated the same experiment (without changing the word2vec parameters), I got a different set of most_similar words for q1.
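The shuffling step could look roughly like the sketch below. The `tweets` list is a hypothetical placeholder for the tokenized tweets, and the fixed seed is only there to make the shuffle repeatable; neither comes from the original experiment.

```python
import random
from collections import Counter

# Hypothetical placeholder data: each tweet is a list of tokens.
tweets = [
    ["good", "morning"],
    ["word2vec", "is", "fun"],
    ["hello", "world"],
    ["gensim", "rocks"],
]

shuffled = list(tweets)              # copy, so the original order is kept
random.Random(0).shuffle(shuffled)   # reorder the tweets, not the words inside them

# The shuffled document contains exactly the same tweets, only in another order.
same_content = Counter(map(tuple, tweets)) == Counter(map(tuple, shuffled))
print(same_content)
```

Only the order of the tweets changes here; each tweet, and therefore each sentence-level context, stays intact.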

I can't understand why that happens if the model only looks at sentence-level information. Can anyone explain this?

EDIT: added model parameters and a graph

Model parameters used:

model1 = word2vec.Word2Vec(sents1, size=100, window=5, min_count=5, iter=n_iter, sg=0)

Graph: To draw the graph, I ran word2vec with the above parameters on the original document (D) and on the shuffled document (D'), took the top 10 or top 20 (the two bars) most_similar('q') words for a specific query word q, and calculated the Jaccard similarity between the two sets of words for iter=1, 10, 100.
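The Jaccard score described above can be sketched as follows. The helper name `jaccard` and the word lists are illustrative only, not results from the experiment; with gensim, most_similar returns (word, similarity) tuples, so the words would first have to be extracted from those tuples.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


# Hypothetical top-k neighbour lists for the same query word on D and D'.
top_d = ["w1", "w2", "w3", "w4"]
top_d_shuffled = ["w3", "w4", "w5", "w6"]

score = jaccard(top_d, top_d_shuffled)
print(score)
```

A score of 1 means both runs returned exactly the same neighbours, and a score of 0 means they share none.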

It seems that as the number of iterations increases, the two sets of words obtained from running word2vec on D and D' have fewer and fewer words in common.

I can't understand why this is happening. What is going on?

Source: https://stackoverflow.com/questions/36790867/gensim-word2vec-changing-the-input-sentence-order
