Finding the distance between 'Doctag' and 'infer_vector' with Gensim Doc2Vec?

Posted by 送分小仙女 on 2021-01-28 11:48:51

Question


Using Gensim's Doc2Vec, how would I find the distance between a Doctag and the result of infer_vector()?

Many thanks


Answer 1:


Doctag is the internal name for the keys to doc-vectors, while the result of an infer_vector() operation is a vector. So, as you've literally asked, a key and a vector aren't directly comparable.

You could instead ask the model for a known doc-vector, by the doc-tag key that was supplied during training, via model.docvecs[doctag] (model.dv[doctag] in Gensim 4.0 and later). That vector would be comparable to the result of an infer_vector() call.

With two vectors in hand, you can use scipy routines to calculate various kinds of distance. For example:

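# cosine() returns the cosine distance, i.e. 1 - cosine similarity
from scipy.spatial.distance import cosine as cosine_distance
# trained vector looked up by its doc-tag, and a vector inferred for new text
vec_by_doctag = model.docvecs["doc0007"]
vec_by_inference = model.infer_vector(['a', 'cat', 'was', 'in', 'a', 'hat'])
dist = cosine_distance(vec_by_doctag, vec_by_inference)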
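
To make concrete what such distance routines compute, here is a minimal pure-Python sketch of two common measures, cosine distance and Euclidean distance. The short lists are hypothetical stand-ins for real doc-vectors (they are not taken from any trained model), and the helper names are illustrative only. In practice the scipy functions would normally be preferable.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Return the straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical stand-ins for model.docvecs[...] and model.infer_vector(...)
vec_by_doctag = [1.0, 0.0, 2.0]
vec_by_inference = [2.0, 0.0, 4.0]  # same direction, twice the length
vec_orthogonal = [0.0, 3.0, 0.0]    # perpendicular to vec_by_doctag

same_direction = cosine_distance(vec_by_doctag, vec_by_inference)
orthogonal = cosine_distance(vec_by_doctag, vec_orthogonal)
straight_line = euclidean_distance(vec_by_doctag, vec_by_inference)
```

Cosine distance ignores vector length, so the two same-direction vectors come out at (approximately) zero, while Euclidean distance still reports the gap between them. Which measure is appropriate depends on whether vector magnitude should matter for the comparison.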

You can also look at how gensim's Doc2VecKeyedVectors does similarity/distance between vectors that are known (by their doctag key names) inside a model, in its similarity() and distance() functions, at:

https://github.com/RaRe-Technologies/gensim/blob/ca0dcaa1eca8b1764f6456adac5719309e0d8e6d/gensim/models/keyedvectors.py#L1701

https://github.com/RaRe-Technologies/gensim/blob/ca0dcaa1eca8b1764f6456adac5719309e0d8e6d/gensim/models/keyedvectors.py#L1743



Source: https://stackoverflow.com/questions/52488877/finding-the-distance-between-doctag-and-infer-vector-with-gensim-doc2vec
