How to calculate sentence similarity using the word2vec model of gensim with Python

一向 2020-11-28 00:31

According to the Gensim Word2Vec documentation, I can use the word2vec model in the gensim package to calculate the similarity between two words.

e.g.

trained_model.simi         


        
14 answers
  •  孤街浪徒
    2020-11-28 01:13

    If you are using word2vec, you need to calculate the average vector of all the words in each sentence/document and then take the cosine similarity between those vectors:

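    import numpy as np
    from scipy import spatial

    # Vocabulary as a set for fast membership tests.
    # gensim >= 4.0 renamed index2word to index_to_key; on gensim 3.x use model.wv.index2word.
    index2word_set = set(model.wv.index_to_key)

    def avg_feature_vector(sentence, model, num_features, index2word_set):
        """Average the vectors of all in-vocabulary words in the sentence."""
        words = sentence.split()
        feature_vec = np.zeros((num_features,), dtype='float32')
        n_words = 0
        for word in words:
            if word in index2word_set:
                n_words += 1
                # Look vectors up through model.wv; model[word] is deprecated in gensim 3.x
                # and no longer supported in gensim 4.x.
                feature_vec = np.add(feature_vec, model.wv[word])
        if n_words > 0:
            feature_vec = np.divide(feature_vec, n_words)
        return feature_vec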
    

    Calculate similarity:

    s1_afv = avg_feature_vector('this is a sentence', model=model, num_features=300, index2word_set=index2word_set)
    s2_afv = avg_feature_vector('this is also sentence', model=model, num_features=300, index2word_set=index2word_set)
    sim = 1 - spatial.distance.cosine(s1_afv, s2_afv)
    print(sim)
    
    > 0.915479828613
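
To try the averaging-plus-cosine idea without a trained model, here is a minimal, dependency-free sketch. The `toy_vectors` dictionary and the helper names `average_vector`, `cosine_similarity`, and `sentence_similarity` are hypothetical stand-ins for illustration, not part of gensim. With a real model you would look vectors up in `model.wv` instead.

```python
import math

# Hypothetical 3-dimensional toy embeddings standing in for a trained model's vectors.
toy_vectors = {
    "this": [1.0, 0.0, 0.0],
    "is": [0.0, 1.0, 0.0],
    "a": [0.0, 0.0, 1.0],
    "also": [0.0, 1.0, 1.0],
    "sentence": [1.0, 1.0, 0.0],
}

def average_vector(sentence, vectors):
    """Return the mean vector of the in-vocabulary words, or None if there are none."""
    known = [vectors[w] for w in sentence.split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine_similarity(u, v):
    """Return cos(u, v), or None when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    return dot / (norm_u * norm_v)

def sentence_similarity(s1, s2, vectors):
    """Cosine similarity of the two averaged sentence vectors, or None if undefined."""
    v1 = average_vector(s1, vectors)
    v2 = average_vector(s2, vectors)
    if v1 is None or v2 is None:
        return None
    return cosine_similarity(v1, v2)

print(sentence_similarity("this is a sentence", "this is also sentence", toy_vectors))
```

One design choice differs from the answer above: a sentence with no known words yields `None` rather than an all-zero vector, because cosine similarity is undefined for a zero vector.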
    
