How to calculate the sentence similarity using word2vec model of gensim with python

后端未结

关注

 14  1347

一向 2020-11-28 00:31

According to the Gensim Word2Vec, I can use the word2vec model in gensim package to calculate the similarity between 2 words.

e.g.

trained_model.simi


      
      
        
          14条回答        

        
                    
            
            
                         
                
              
              
                
                   天命终不由人
                                             
                
                
                (楼主)
            
              
              
                2020-11-28 01:03
              

            
            
                        
I would like to update the existing solution to help the people who are going to calculate the semantic similarity of sentences.

Step 1:

Load the suitable model using gensim and calculate the word vectors for words in the sentence and store them as a word list

Step 2 :
Computing the sentence vector

The calculation of semantic similarity between sentences was difficult before but recently a paper named "A SIMPLE BUT TOUGH-TO-BEAT BASELINE FOR SENTENCE
EMBEDDINGS" was proposed which suggests a simple approach by computing the weighted average of word vectors in the sentence and then remove the projections of the average vectors on their first principal component.Here the weight of a word w is a/(a + p(w)) with a being a parameter and p(w) the (estimated) word frequency called smooth inverse frequency.this method performing significantly better.

A simple code to calculate the sentence vector using SIF(smooth inverse frequency) the method proposed in the paper has been given here

Step 3:
using sklearn cosine_similarity load two vectors for the sentences and compute the similarity.

This is the most simple and efficient method to compute the sentence similarity.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它14个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复