Update gensim word2vec model

Posted by 只愿长相守 on 2019-11-27 13:07:29
  1. train() expects a sequence of sentences as input, not a single sentence.

  2. train() only updates weights for existing feature vectors based on existing vocabulary. You cannot add new vocabulary (=new feature vectors) using train().
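To illustrate the first point: a sentence is a list of tokens, so the corpus must be a list (or other iterable) of such lists. If a single token list is passed instead, each word is treated as a sentence and each character as a token. A minimal sketch of normalizing the input, assuming a hypothetical helper named as_sentence_corpus (not part of gensim):

```python
def as_sentence_corpus(data):
    """Return `data` as a list of tokenized sentences (list of lists of str).

    Hypothetical helper for illustration only; it is not a gensim function.
    """
    if isinstance(data, str):
        # A raw string: split on whitespace and wrap as one sentence.
        return [data.split()]
    data = list(data)
    if data and all(isinstance(item, str) for item in data):
        # A single tokenized sentence: wrap it so it becomes a one-sentence corpus.
        return [data]
    # Otherwise assume it is already a sequence of tokenized sentences.
    return [list(sentence) for sentence in data]
```

The result can then be passed to build_vocab() and train() without each word being mistaken for a sentence.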

As of gensim 0.13.3, online training of Word2Vec is possible:

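model.build_vocab(new_sentences, update=True)
# newer gensim releases require total_examples and epochs to be passed explicitly
model.train(new_sentences, total_examples=model.corpus_count, epochs=model.epochs)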
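The two calls above can be wrapped in a small function. This is only a sketch under stated assumptions: update_word2vec is a hypothetical name, the model is assumed to be a full gensim Word2Vec model (not vectors loaded with load_word2vec_format), and the explicit total_examples and epochs arguments follow newer gensim releases.

```python
def update_word2vec(model, new_sentences, epochs=None):
    """Extend the vocabulary of `model` and continue training on new sentences.

    Hypothetical helper; `model` is assumed to be a full gensim Word2Vec model.
    """
    # Materialize the corpus: it is iterated once for the vocabulary scan
    # and again for training, so a one-shot generator would not work.
    new_sentences = list(new_sentences)

    # update=True adds unseen words to the existing vocabulary
    # instead of rebuilding it from scratch.
    model.build_vocab(new_sentences, update=True)

    # Train on the new sentences only; fall back to the model's own epoch count.
    model.train(
        new_sentences,
        total_examples=len(new_sentences),
        epochs=epochs or model.epochs,
    )
    return model
```

With gensim installed, a call might look like update_word2vec(Word2Vec.load("my.model"), new_sentences), where "my.model" is a placeholder path. Words that fall below the model's min_count in the new data are still not added.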

If your model was generated by the C tool and loaded with load_word2vec_format(), it cannot be updated. See the "Online training" section of the Word2Vec tutorial:

Note that it’s not possible to resume training with models generated by the C tool, load_word2vec_format(). You can still use them for querying/similarity, but information vital for training (the vocab tree) is missing there.

First of all, you cannot add new words to a pre-trained model.

However, there is a "new" doc2vec model, published in 2014, that meets all your requirements. You can use it to train a document vector directly instead of computing a set of word vectors and then combining them. The best part is that doc2vec can infer vectors for unseen sentences after training. Although the model itself is still unchangeable, the inferred vectors were quite good in my experiments.
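As a rough sketch of the inference step described above: gensim's Doc2Vec exposes infer_vector(), which takes a list of tokens and returns a vector without modifying the model. The helper name infer_document_vector and the lowercase, whitespace-based tokenization are assumptions for illustration; use the same preprocessing that was applied to the training data.

```python
def infer_document_vector(model, text):
    """Infer a vector for an unseen piece of text with a trained Doc2Vec model.

    Hypothetical helper; `model` is assumed to be a trained gensim Doc2Vec model.
    """
    # Assumed tokenization: lowercase and split on whitespace.
    tokens = text.lower().split()

    # infer_vector() leaves the trained model unchanged. Inference is
    # stochastic, so repeated calls may return slightly different vectors.
    return model.infer_vector(tokens)
```

The returned vector can then be compared with the vectors of the training documents, for example by cosine similarity.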

The problem is that you cannot retrain a word2vec model with new sentences. Only doc2vec allows it. Try a doc2vec model.
