How does gensim calculate doc2vec paragraph vectors

♀尐吖头ヾ 提交于 2019-12-03 10:40:39

How does concatenation or averaging work?

You got it right for the average. The concatenation is: [0.1,0.2,0.3,0.4,0.5,0.6].

Is the paragraph token equal to the paragraph vector which is equal to on?

The "paragraph token" is mapped to a vector that is called "paragraph vector". It is different from the token "on", and different from the word vector that the token "on" is mapped to.

A simple (and sometimes useful) vector for a range of text is the sum or average of the text's words' vectors – but that's not what the 'Paragraph Vector' of the 'Paragraph Vectors' paper is.

Rather, the Paragraph Vector is another vector, trained similarly to the word vectors, which is also adjusted to help in word-prediction. These vectors are combined (or interleaved) with the word vectors to feed the prediction model. That is, the averaging (in DM mode) includes the PV alongside word-vectors - it doesn't compose the PV from word-vectors.

In the diagram, on is the target-word being predicted, in that diagram by a combination of closely-neighboring words and the full-example's PV, which may perhaps be informally thought of as a special pseudoword, ranging over the entire text example, participating in all the sliding 'windows' of real words.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!