How to get n-gram collocations and association in python nltk?

后端未结
关注
 2  1991
执念已碎 2021-02-04 16:47
In this documentation, there is example using nltk.collocations.BigramAssocMeasures(), BigramCollocationFinder,nltk.collocations.TrigramAssocMeas

      
      
        
          2条回答        

        
                    
            
            
                         
                
              
              
                
                   不要未来只要你来
                                             
                
                
                (楼主)
            
              
              
                2021-02-04 17:20
              

            
            
                        
If you want to find the grams beyond 2 or 3 grams you can use scikit package and Freqdist function to get the count for these grams. I tried doing this with nltk.collocations, but I dont think we can find out more than 3-grams score into it. So I rather decided to go with count of grams. I hope this can help u a little bit. Thankz

here is the code

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
from nltk.collocations import *
from nltk.probability import FreqDist
import nltk

query = "This document gives a very short introduction to machine learning problems"
vect = CountVectorizer(ngram_range=(1,4))
analyzer = vect.build_analyzer()
listNgramQuery = analyzer(query)
listNgramQuery.reverse()
print "listNgramQuery=", listNgramQuery
NgramQueryWeights = nltk.FreqDist(listNgramQuery)
print "\nNgramQueryWeights=", NgramQueryWeights


This will give output as

listNgramQuery= [u'to machine learning problems', u'introduction to machine learning', u'short introduction to machine', u'very short introduction to', u'gives very short introduction', u'document gives very short', u'this document gives very', u'machine learning problems', u'to machine learning', u'introduction to machine', u'short introduction to', u'very short introduction', u'gives very short', u'document gives very', u'this document gives', u'learning problems', u'machine learning', u'to machine', u'introduction to', u'short introduction', u'very short', u'gives very', u'document gives', u'this document', u'problems', u'learning', u'machine', u'to', u'introduction', u'short', u'very', u'gives', u'document', u'this']

NgramQueryWeights= 

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它2个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复