How is SpaCy's similarity computed?

Submitted by 无人久伴 on 2019-12-11 01:36:09

Question


Beginner NLP Question here:

How does the .similarity method work?

Wow, spaCy is great! Its tf-idf model could be easier to preprocess, but w2v with only one line of code (token.vector)?! Awesome!

In his ten-line tutorial on spaCy, andrazhribernik shows us the .similarity method, which can be run on tokens, sents, word chunks, and docs.

After nlp = spacy.load('en') and doc = nlp(raw_text), we can run .similarity queries between tokens and chunks. However, what is being calculated behind the scenes in this .similarity method?
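For context, a minimal usage sketch of those queries (note: spacy.load('en') is the older shortcut; en_core_web_md below is an assumed modern package name that ships with word vectors):

import spacy

# Assumes a model with word vectors is installed, e.g.:
#   python -m spacy download en_core_web_md
nlp = spacy.load('en_core_web_md')
doc = nlp("The cat sat on the mat. A dog barked outside.")

cat, dog = doc[1], doc[8]
print(cat.similarity(dog))            # token vs. token
print(doc[0:7].similarity(doc[7:]))   # span vs. span
print(doc.similarity(doc))            # doc vs. doc (1.0 by definition)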

spaCy already has the incredibly simple .vector, which returns the w2v vector as trained with the GloVe model (how cool would a .tfidf or .fasttext method be?).

Is the .similarity method simply computing the cosine similarity between these two w2v GloVe vectors, or is it doing something else? The specifics aren't clear in the documentation; any help is appreciated!


Answer 1:


Assuming that the method you are referring to is the token similarity one, you can find the function in the source code here. As you can see, it computes the cosine similarity between the vectors.
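As a standalone illustration, here is a plain-NumPy sketch of the same computation (not spaCy's actual code, just the formula it implements):

import numpy as np

def cosine_similarity(u, v):
    # Dot product divided by the product of the L2 norms;
    # spaCy's .similarity computes the same quantity using
    # its precomputed vector norms.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))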

As it says in the tutorial:

A word embedding is a representation of a word, and by extension a whole language corpus, in a vector or other form of numerical mapping. This allows words to be treated numerically with word similarity represented as spatial difference in the dimensions of the word embedding mapping.

So vector distance can be related to word similarity.




Answer 2:


Found the answer; in short, it's yes:

Link to Source Code

return numpy.dot(self.vector, other.vector) / (self.vector_norm * other.vector_norm)

This looks like it's the formula for computing cosine similarity, and the vectors seem to be created with spaCy's .vector, which the documentation says is trained from GloVe's w2v model.
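You can verify this yourself by recomputing the value by hand and comparing it to .similarity (a sketch; en_core_web_md is an assumed model name that includes word vectors):

import numpy as np
import spacy

nlp = spacy.load('en_core_web_md')
queen, king = nlp("queen king")

# Same formula as the source line above
manual = np.dot(queen.vector, king.vector) / (queen.vector_norm * king.vector_norm)
print(manual, queen.similarity(king))  # the two numbers should match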



Source: https://stackoverflow.com/questions/46348209/how-is-spacys-similarity-computed
