How do I calculate the shortest path (geodesic) distance between two adjectives in WordNet using Python NLTK?

无人共我 2020-12-28 11:12

Computing the semantic similarity between two synsets in WordNet can be easily done with several built-in similarity measures, such as:

synset1.path_similari

半阙折子戏 (original poster) 2020-12-28 11:28

There's no easy way to get the similarity between words that are not nouns or verbs.

As noted, similarity between nouns (or between verbs) is easy to extract with NLTK's WordNet interface:

>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.1')
>>> cat = wn.synset('cat.n.1')
>>> car = wn.synset('car.n.1')
>>> wn.path_similarity(dog, cat)
0.2
>>> wn.path_similarity(dog, car)
0.07692307692307693
>>> wn.wup_similarity(dog, cat)
0.8571428571428571
>>> wn.wup_similarity(dog, car)
0.4
>>> wn.lch_similarity(dog, car)
1.072636802264849
>>> wn.lch_similarity(dog, cat)
2.0281482472922856
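
The path-based measures above follow the hypernym (is-a) hierarchy, which WordNet only provides for nouns and verbs. Adjectives are instead grouped into clusters: satellite adjectives attach to a head adjective through similar-to links, and head adjectives are linked to each other by antonymy. A minimal sketch, assuming those links are an acceptable graph for a "geodesic" distance, is a breadth-first search over them. adjective_neighbors and shortest_path_distance are hypothetical helpers written for this example, not NLTK functions; they only call the NLTK Synset/Lemma methods similar_tos(), also_sees(), lemmas(), antonyms() and synset(). Which relations count as edges is a design choice of the sketch.

```python
from collections import deque


def adjective_neighbors(synset):
    """Synsets one hop away through WordNet's adjective links.

    Assumption of this sketch: similar-to (head <-> satellite),
    also-see, and antonymy all count as edges of equal length.
    """
    neighbors = set(synset.similar_tos())
    neighbors.update(synset.also_sees())
    # Antonymy is stored between lemmas, so hop lemma -> antonym -> its synset.
    for lemma in synset.lemmas():
        for antonym in lemma.antonyms():
            neighbors.add(antonym.synset())
    return neighbors


def shortest_path_distance(start, goal, neighbors, max_depth=10):
    """Breadth-first search: number of hops from start to goal.

    `neighbors` is any function mapping a node to its adjacent nodes.
    Returns None if goal is not reachable within max_depth hops.
    """
    if start == goal:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth cap
        for nxt in neighbors(node):
            if nxt == goal:
                # BFS visits nodes in order of depth, so this is the shortest path.
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None
```

With NLTK and the WordNet data installed, run from nltk.corpus import wordnet as wn and then call shortest_path_distance(wn.synset('hot.a.01'), wn.synset('cold.a.01'), adjective_neighbors). A result of None means no path was found within max_depth hops, which can happen because adjective clusters are only connected to each other through a limited number of antonym and also-see links. To mirror NLTK's path_similarity, which is 1 / (distance + 1), apply the same transform to the returned hop count.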


For adjectives it's harder, so you would need to build your own text-similarity measure. The easiest approach is a vector space model: every word is represented by a vector of floating-point numbers, and two words are compared by the cosine of the angle between their vectors, e.g.

>>> import numpy as np
>>> blue = np.array([0.2, 0.2, 0.3])
>>> red = np.array([0.1, 0.2, 0.3])
>>> pink = np.array([0.1001, 0.221, 0.321])
>>> car = np.array([0.6, 0.9, 0.5])
>>> def cosine(x,y):
...     return np.dot(x,y) / (np.linalg.norm(x) * np.linalg.norm(y))
... 
>>> cosine(pink, red)
0.99971271929384864
>>> cosine(pink, blue)
0.96756147991512709
>>> cosine(blue, red)
0.97230558532824662
>>> cosine(blue, car)
0.91589118863996888
>>> cosine(red, car)
0.87469454283170045
>>> cosine(pink, car)
0.87482313596223782


To learn vectors like pink = np.array([0.1001, 0.221, 0.321]) from real text instead of writing them by hand, try searching for:


Latent semantic indexing / Latent semantic analysis
Bag of Words
Vector space model semantics
Word2Vec, Doc2Vec, Wiki2Vec
Neural Nets
cosine similarity natural language semantics
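
The simplest of these ideas, counting which words appear near each other (bag-of-words context vectors), fits in a few lines of plain Python. This is a minimal sketch for illustration, not LSA, Word2Vec or any library's implementation; the helper names, the window size of 2 and the three-sentence toy corpus are assumptions. Words that occur in similar contexts end up with similar count vectors, and the same cosine comparison as in the NumPy example then applies.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count-based word vectors: for each word, how often every other word
    occurs within `window` tokens of it (a bag-of-words context)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as {feature: count}."""
    dot = sum(count * y.get(feature, 0) for feature, count in x.items())
    norm_x = math.sqrt(sum(c * c for c in x.values()))
    norm_y = math.sqrt(sum(c * c for c in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


# Hypothetical toy corpus for illustration only.
corpus = ["a red car", "a blue car", "the car is fast"]
vectors = context_vectors(corpus)
print(cosine(vectors["red"], vectors["blue"]))  # ~1.0: identical contexts
print(cosine(vectors["red"], vectors["fast"]))  # ~0.5: one shared context word
```

Real systems use far larger corpora and usually reweight the raw counts and reduce their dimensionality (LSA does the latter with a truncated SVD), which is what the techniques listed above add on top of this basic idea.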


You can also try off-the-shelf software/libraries such as:


Gensim https://radimrehurek.com/gensim/
TakeLab STS (cached copy): http://webcache.googleusercontent.com/search?q=cache:u5y4He592qgJ:takelab.fer.hr/sts/+&cd=2&hl=en&ct=clnk&gl=sg


Besides vector space models, you can try a graph-based model that puts words into a graph and uses something like PageRank to walk around the graph and derive a similarity measure.
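
A minimal sketch of the graph-based idea, assuming a word graph has already been built (for WordNet adjectives, for instance, from the similar-to and antonym links). It uses personalized PageRank, i.e. a random walk that jumps back to the source word with probability 1 - damping at every step; a word's score is the long-run fraction of time the walk spends on it, so words close to the source, or reachable through many short paths, score higher. The function name, the toy graph, damping = 0.85 and the fixed 50 iterations are illustrative choices, not taken from any particular published system.

```python
def personalized_pagerank(graph, source, damping=0.85, iterations=50):
    """Random walk with restart: at each step, a `damping` share of each node's
    score flows evenly to its neighbors and the rest jumps back to `source`.

    `graph` is an adjacency dict; list both directions for undirected edges,
    and every neighbor must also be a key of `graph`.
    """
    nodes = list(graph)
    restart = {n: (1.0 if n == source else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            neighbors = graph[node]
            if not neighbors:
                # Dangling node: send its flowing share back to the source.
                new_rank[source] += damping * rank[node]
                continue
            share = damping * rank[node] / len(neighbors)
            for nb in neighbors:
                new_rank[nb] += share
        rank = new_rank
    return rank


# Hypothetical toy word graph (e.g. built from similar-to links).
word_graph = {
    "hot": ["warm"],
    "warm": ["hot", "tepid"],
    "tepid": ["warm"],
    "frozen": [],
}
scores = personalized_pagerank(word_graph, "hot")
```

Here scores["frozen"] stays at 0.0 because it cannot be reached from "hot", while "warm" outscores "tepid". The score is not symmetric in general, so averaging the two directions is one simple way to get a symmetric similarity. If NetworkX is available, networkx.pagerank(G, personalization={source: 1}) should compute essentially the same scores.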

See also: 


Compare similarity of terms/expressions using NLTK?
check if two words are related to each other
How to determine semantic hierarchies / relations in using NLTK?
Is there an algorithm that tells the semantic similarity of two phrases
Semantic Relatedness Algorithms - python
