Creating a wordvector model combining words from other models

Question

I have two different word-vector models created with the word2vec algorithm. The issue I'm facing is that a few words from the first model are not in the second model. I want to create a third model from the two, in which I can use word vectors from both models without losing the meaning and context of the word vectors.

Can I do this, and if so, how?

Answer 1:

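You could potentially translate the vectors for the words that appear in only one model into the other model's coordinate space, using the words both models share to learn a translation function. (Vectors from separately trained models can't simply be mixed, because each training run ends up with its own arbitrary coordinate system.)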
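A minimal sketch of that idea, assuming toy dict-based "models" with made-up words and 2-d vectors (real word2vec vectors have hundreds of dimensions). It learns an unconstrained linear map W by ordinary least squares from the shared "anchor" words, then applies W to a word the target model lacks. It is written in pure Python to stay self-contained; with NumPy you would normally call `numpy.linalg.lstsq` instead. This is not gensim's code, just the basic translation-matrix idea.

```python
# Sketch under assumptions: toy dict "models", invented words and vectors.
# Learn W (source space -> target space) from anchor words in both models:
#   minimise sum_i ||x_i W - y_i||^2   =>   (X^T X) W = X^T Y


def solve(a, b):
    """Solve a @ w = b for w (a: n x n, b: n x m) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [a | b]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more, or more varied, anchor words")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_translation(src_vecs, tgt_vecs):
    """Least-squares W (d_src x d_tgt) such that src_vec @ W ~ tgt_vec."""
    d_src, d_tgt = len(src_vecs[0]), len(tgt_vecs[0])
    xtx = [[sum(x[i] * x[j] for x in src_vecs) for j in range(d_src)]
           for i in range(d_src)]
    xty = [[sum(x[i] * y[j] for x, y in zip(src_vecs, tgt_vecs)) for j in range(d_tgt)]
           for i in range(d_src)]
    return solve(xtx, xty)


def translate(vec, w):
    """Map one source-space vector into the target space (vec @ W)."""
    return [sum(vec[i] * w[i][j] for i in range(len(vec))) for j in range(len(w[0]))]


# Toy 2-d models; model_b lacks "prince".
model_a = {"king": [1.0, 0.0], "queen": [0.0, 1.0], "man": [1.0, 1.0], "prince": [2.0, 1.0]}
model_b = {"king": [2.0, 1.0], "queen": [-1.0, 3.0], "man": [1.0, 4.0]}

anchors = sorted(set(model_a) & set(model_b))  # words both models know
w = fit_translation([model_a[k] for k in anchors], [model_b[k] for k in anchors])
model_b["prince"] = translate(model_a["prince"], w)
print(model_b["prince"])  # → [3.0, 5.0]
```

For W to be determined, the anchors must span the source space, so in practice you need well over as many anchor words as there are dimensions. Since separately trained spaces often differ mostly by a rotation, constraining W to be orthogonal (an orthogonal Procrustes fit) is a common alternative to this unconstrained version.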

There's a facility to do this in recent gensim versions – see the TranslationMatrix tool. There's a demo Jupyter notebook included in the docs/notebooks directory, viewable online at:

https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/translation_matrix.ipynb

You'd presumably take the larger model (or whichever one is thought to be better, perhaps because it was trained on more data) and translate the smaller number of words it's missing into its space. You'd use as many common-reference "anchor" words as is practical.
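The workflow just described can be sketched as a hypothetical `merge_models` helper, assuming plain dicts of word to vector as stand-ins for real gensim models. It treats the model with the bigger vocabulary as the target, uses every shared word as an anchor, and translates only the missing words, leaving the target's own vectors untouched. The `fit` parameter is any learner of a source-to-target mapping (in practice a linear map fitted on the anchors); the `mean_offset_fit` below is a toy stand-in, invented here only so the example runs, and not a recommended mapping.

```python
def merge_models(model_a, model_b, fit):
    """Hypothetical helper: add the words the larger model lacks.

    model_a, model_b: dict word -> list[float] (stand-ins for real models).
    fit: callable(src_vecs, tgt_vecs) -> callable(vec) -> vec.
    Returns (merged, anchors).
    """
    # Larger vocabulary becomes the target space.
    if len(model_a) >= len(model_b):
        target, source = model_a, model_b
    else:
        target, source = model_b, model_a
    # Every word present in both models serves as an anchor.
    anchors = sorted(set(source) & set(target))
    if not anchors:
        raise ValueError("the models share no words to anchor the mapping")
    mapping = fit([source[w] for w in anchors], [target[w] for w in anchors])
    merged = dict(target)  # copy: existing target vectors are kept as-is
    for word, vec in source.items():
        if word not in merged:
            merged[word] = mapping(vec)
    return merged, anchors


def mean_offset_fit(src_vecs, tgt_vecs):
    """Toy stand-in learner: shift by the average anchor difference."""
    d = len(tgt_vecs[0])
    offset = [sum(t[j] - s[j] for s, t in zip(src_vecs, tgt_vecs)) / len(src_vecs)
              for j in range(d)]
    return lambda v: [v[j] + offset[j] for j in range(d)]


# Made-up toy models: "ferret" exists only in the smaller one.
small = {"cat": [0.0, 1.0], "dog": [1.0, 0.0], "ferret": [1.0, 1.0]}
big = {"cat": [1.0, 3.0], "dog": [2.0, 2.0], "bird": [5.0, 5.0], "fish": [4.0, 0.0]}

merged, anchors = merge_models(small, big, mean_offset_fit)
print(anchors, merged["ferret"])  # → ['cat', 'dog'] [2.0, 3.0]
```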

Source: https://stackoverflow.com/questions/47507091/creating-a-wordvector-model-combining-words-from-other-models

Tags

machine-learning

nlp

word2vec

gensim
