How to calculate sentence similarity using the word2vec model of gensim with Python

一向 2020-11-28 00:31

According to Gensim Word2Vec, I can use the word2vec model in the gensim package to calculate the similarity between two words.

e.g.

trained_model.simi         


        
14 answers
  •  天命终不由人
    2020-11-28 01:03

    I would like to update the existing solution to help the people who are going to calculate the semantic similarity of sentences.

    Step 1:

    Load a suitable model using gensim, look up the word vectors for the words in the sentence, and store them as a word list.
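As a rough illustration of Step 1, the sketch below collects the vectors for the in-vocabulary words of a sentence. It is only a sketch under assumptions: `sentence_word_vectors` and `toy_vectors` are hypothetical names, the tokenizer is a naive lowercase whitespace split, and a plain dict stands in for the trained model so the example runs without any dependencies. With gensim, a `KeyedVectors` object (for example `model.wv`) supports the same `word in kv` and `kv[word]` lookups, so it could be passed in place of the dict.

```python
from typing import List, Mapping, Sequence


def sentence_word_vectors(
    sentence: str, vectors: Mapping[str, Sequence[float]]
) -> List[Sequence[float]]:
    """Return the vectors of the in-vocabulary words of `sentence`, in order."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokens = sentence.lower().split()
    # Words the model has never seen are skipped rather than raising KeyError.
    return [vectors[tok] for tok in tokens if tok in vectors]


# Toy stand-in for a trained model's word vectors (hypothetical values).
toy_vectors = {
    "cats": [0.9, 0.1, 0.0],
    "chase": [0.2, 0.8, 0.1],
    "mice": [0.8, 0.2, 0.1],
}

word_list = sentence_word_vectors("Cats chase mice quickly", toy_vectors)
print(len(word_list))  # "quickly" is out of vocabulary, so it is skipped
```

Skipping unknown words is a design choice made here for simplicity; depending on the data, it may be worth logging them or falling back to a default vector.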

    Step 2: Compute the sentence vector

    Computing the semantic similarity between sentences used to be difficult, but the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings" proposes a simple approach: compute the weighted average of the word vectors in each sentence, then remove the projection of these average vectors onto their first principal component. Here the weight of a word w is a/(a + p(w)), where a is a parameter and p(w) is the (estimated) word frequency; this weighting is called smooth inverse frequency (SIF). The paper reports that this method performs significantly better than an unweighted average of word vectors on textual similarity tasks.

    Simple code to calculate the sentence vector using SIF (smooth inverse frequency), the method proposed in the paper, has been given here.
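The SIF procedure described in Step 2 can be sketched as follows. This is an independent, minimal sketch and not the code the answer links to. It is written in plain Python so that it runs without dependencies; in practice NumPy and an SVD routine would be the usual choice. All function names, the toy vectors and word probabilities, the default `a = 1e-3`, and the use of power iteration with a fixed start vector to approximate the first principal direction are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sif_weight(word_prob: float, a: float = 1e-3) -> float:
    """SIF weight a / (a + p(w)); the default a is an assumed value."""
    return a / (a + word_prob)


def weighted_average(tokens: Sequence[str], vectors: Dict[str, Vector],
                     word_probs: Dict[str, float], a: float = 1e-3) -> Vector:
    """SIF-weighted average of the vectors of the known words in a sentence."""
    dim = len(next(iter(vectors.values())))
    known = [t for t in tokens if t in vectors]
    total = [0.0] * dim
    for tok in known:
        w = sif_weight(word_probs.get(tok, 0.0), a)
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    n = max(len(known), 1)  # avoid division by zero for empty sentences
    return [x / n for x in total]


def first_principal_direction(rows: List[Vector], iterations: int = 100) -> Vector:
    """Approximate the dominant direction of the (uncentered) sentence vectors.

    Power iteration on sum_s v_s v_s^T. The fixed start vector is an
    assumption; a random start is more robust in general.
    """
    dim = len(rows[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [0.0] * dim
        for v in rows:
            dot = sum(p * q for p, q in zip(v, u))
            for i in range(dim):
                nxt[i] += dot * v[i]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        u = [x / norm for x in nxt]
    return u


def remove_projection(v: Vector, u: Vector) -> Vector:
    """Subtract from v its projection onto the unit vector u."""
    dot = sum(p * q for p, q in zip(v, u))
    return [p - dot * q for p, q in zip(v, u)]


def sif_embeddings(sentences: List[Sequence[str]], vectors: Dict[str, Vector],
                   word_probs: Dict[str, float], a: float = 1e-3) -> List[Vector]:
    """Weighted averages with the common first principal direction removed."""
    averaged = [weighted_average(s, vectors, word_probs, a) for s in sentences]
    u = first_principal_direction(averaged)
    return [remove_projection(v, u) for v in averaged]


# Toy data (hypothetical values standing in for a trained model and corpus counts).
toy_vectors = {
    "cats": [0.9, 0.1, 0.0], "chase": [0.2, 0.8, 0.1], "mice": [0.8, 0.2, 0.1],
    "dogs": [0.7, 0.3, 0.2], "bark": [0.1, 0.2, 0.9],
}
toy_probs = {"cats": 0.002, "chase": 0.001, "mice": 0.001, "dogs": 0.003, "bark": 0.0005}
toy_sentences = [["cats", "chase", "mice"], ["dogs", "chase", "cats"], ["dogs", "bark"]]

embeddings = sif_embeddings(toy_sentences, toy_vectors, toy_probs)
print(embeddings)
```

Note that the principal direction is estimated from the whole set of sentence vectors, so the embedding of one sentence depends on which other sentences are processed with it.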

    Step 3: Using sklearn's cosine_similarity, pass in the two sentence vectors and compute the similarity.
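For reference, the quantity computed in Step 3 is the cosine of the angle between the two sentence vectors. The sketch below spells it out in plain Python so it runs without dependencies; `cosine_similarity_1d` is a hypothetical helper, the two vectors are made-up values, and returning 0.0 for a zero vector is a convention chosen here. With scikit-learn, `sklearn.metrics.pairwise.cosine_similarity` expects 2D arrays and returns a matrix of pairwise similarities, so two single vectors would be passed as one-row arrays.

```python
import math
from typing import Sequence


def cosine_similarity_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors: (u . v) / (|u| |v|)."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Made-up sentence vectors standing in for the output of Step 2.
sent_a = [0.2, -0.1, 0.4]
sent_b = [0.1, -0.2, 0.5]

print(round(cosine_similarity_1d(sent_a, sent_b), 3))
```

The result lies between -1 and 1, with values closer to 1 indicating that the two sentence vectors point in nearly the same direction.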

    This is a simple and efficient way to compute sentence similarity.
