How to remove a word completely from a Word2Vec model in gensim?

夕颜 2020-12-16 13:09

Given a model, e.g.

from gensim.models.word2vec import Word2Vec


documents = ["Human machine interface for lab abc computer applications",
"A survey of u


        
4 Answers
  •  伪装坚强ぢ
    2020-12-16 13:32

    There is no direct way to do what you are asking. That said, the method most_similar is implemented in the class WordEmbeddingsKeyedVectors, and you can read that method and adapt it to your needs.

    The lines below do the actual work of computing the similar words. Replace the variable limited with the normalized vectors of the words you are interested in, and the ranking will cover only those words.

    limited = self.vectors_norm if restrict_vocab is None else self.vectors_norm[:restrict_vocab]
    dists = dot(limited, mean)
    if not topn:
        return dists
    best = matutils.argsort(dists, topn=topn + len(all_words), reverse=True)
    

    Update:

    limited = self.vectors_norm if restrict_vocab is None else self.vectors_norm[:restrict_vocab]
    

    In this line, restrict_vocab, when given, limits the search to the first restrict_vocab words in the vocabulary, which is meaningful only if the vocabulary is sorted by frequency. If you do not pass restrict_vocab, the whole of self.vectors_norm goes into limited.
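
To make the slicing concrete, here is a minimal sketch with NumPy only. The toy vocabulary and the 2-dimensional vectors are invented for illustration and do not come from a trained model; only the slicing expression mirrors the quoted line.

```python
import numpy as np

# Hypothetical toy vocabulary, assumed to be sorted from most to least frequent.
index2word = ["the", "cat", "dog", "zebra"]
vectors_norm = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
    [0.8, 0.6],
])

restrict_vocab = 2
# Same expression as in most_similar: keep only the first restrict_vocab rows.
limited = vectors_norm if restrict_vocab is None else vectors_norm[:restrict_vocab]

# Only the words behind those rows remain candidates for the ranking.
candidates = index2word[:len(limited)]
print(candidates)
```

Because the slice always takes a prefix of the rows, it cannot pick out an arbitrary set of words, which is why the subset has to be built by hand.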

    The method most_similar calls another method, init_sims, which initializes self.vectors_norm as shown below:

    self.vectors_norm = (self.vectors / sqrt((self.vectors ** 2).sum(-1))[..., newaxis]).astype(REAL)
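
The same expression can be tried outside gensim. The sketch below assumes REAL is a 32-bit float, and the three vectors are made-up values. It shows that every row ends up with length 1, so a plain dot product between two rows is their cosine similarity.

```python
import numpy as np

REAL = np.float32  # assumption: stands in for the REAL dtype used in the quoted line

# Made-up vectors, one row per word.
vectors = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]], dtype=REAL)

# Divide each row by its Euclidean length.
vectors_norm = (vectors / np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]).astype(REAL)

# Every row now has unit length.
lengths = np.linalg.norm(vectors_norm, axis=1)

# Cosine similarity between row 0 and row 1 is just their dot product.
cosine = float(np.dot(vectors_norm[0], vectors_norm[1]))
print(lengths, cosine)
```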
    

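    So you can pick the words you are interested in, compute their unit-normalized vectors, and use those in place of limited. The indices in best then refer to rows of your subset, so keep your own list that maps each row back to its word. This should work.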
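
Put together, the idea might look like the sketch below. It does not remove anything from the model; it only restricts which words are ranked. The helper most_similar_among is hypothetical and is not part of gensim. The toy index2word list and vectors array stand in for what, in the gensim 3.x API quoted above, would presumably be model.wv.index2word and model.wv.vectors.

```python
import numpy as np


def most_similar_among(word, allowed_words, index2word, vectors, topn=3):
    """Rank only `allowed_words` by cosine similarity to `word` (hypothetical helper)."""
    word_to_row = {w: i for i, w in enumerate(index2word)}

    # Unit-normalise every row, using the same formula as the quoted init_sims line.
    vectors_norm = vectors / np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

    # Words of interest that exist in the vocabulary, excluding the query itself.
    candidates = [w for w in allowed_words if w in word_to_row and w != word]
    limited = vectors_norm[[word_to_row[w] for w in candidates]]

    # Dot products of unit vectors are cosine similarities.
    dists = limited @ vectors_norm[word_to_row[word]]

    # Indices in `best` refer to rows of `limited`, so map them through `candidates`.
    best = np.argsort(-dists)[:topn]
    return [(candidates[i], float(dists[i])) for i in best]


# Toy data, invented for illustration.
index2word = ["human", "computer", "survey", "interface"]
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])

result = most_similar_among("human", ["computer", "survey"], index2word, vectors)
print(result)
```

Here "interface" never appears in the result, even though it is closer to "human" than "survey" is, because it was left out of the allowed words.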
