I haven't found anything relevant on Google, so I'm hoping to find some help here :)
I've got a Python list as follows:
[['hoose', 200], ["Bananphone", 10], ['House', 200], ["Bonerphone", 10], ['UniqueValue', 777]]
I want to merge entries whose words are only slight variations of each other (like 'hoose' and 'House') into a single entry and add their scores together.
In common with the other comments, I'm not sure that doing this makes much sense, but here's a solution that does what you want, I think. It's very inefficient - O(n²) where n is the number of words in your list - but I'm not sure there's a better way of doing it:
data = [['hoose', 200],
        ["Bananphone", 10],
        ['House', 200],
        ["Bonerphone", 10],
        ['UniqueValue', 777]]
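
# Assumes a levenshtein(word_a, word_b) edit-distance function is in scope,
# e.g. "from Levenshtein import distance as levenshtein" if you have the
# python-Levenshtein package installed, or the pure-Python sketch further down.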
already_merged = []

for word, score in data:
    added_to_existing = False
    for merged in already_merged:
        for potentially_similar in merged[0]:
            if levenshtein(word, potentially_similar) < 5:
                merged[0].add(word)
                merged[1] += score
                added_to_existing = True
                break
        if added_to_existing:
            break
    if not added_to_existing:
        already_merged.append([set([word]), score])

print already_merged
The output is:
[[set(['House', 'hoose']), 400], [set(['Bonerphone', 'Bananphone']), 20], [set(['UniqueValue']), 777]]
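If you don't want to pull in the python-Levenshtein package for the levenshtein() call, here's a rough pure-Python sketch of the standard dynamic-programming edit distance (just one way to do it, nothing clever):

def levenshtein(a, b):
    # Classic dynamic-programming edit distance: m[i][j] holds the cost of
    # turning the first i characters of a into the first j characters of b.
    m = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        m[i][0] = i
    for j in range(len(b) + 1):
        m[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            m[i][j] = min(m[i - 1][j] + 1,         # deletion
                          m[i][j - 1] + 1,         # insertion
                          m[i - 1][j - 1] + cost)  # substitution
    return m[len(a)][len(b)]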
One obvious problem with this approach is that the word you're considering might be close enough to several of the sets you've already built, but this code will just lump it into the first one it finds; one way to soften that is sketched below. I've voted +1 for Space_C0wb0y's answer ;)
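Here's that alternative (just a sketch, reusing the same data and levenshtein function as above): compute each word's distance to every existing group and merge it into the closest one under the threshold, rather than the first match.

already_merged = []

for word, score in data:
    # Track the closest existing group rather than taking the first match.
    best_group = None
    best_distance = None
    for merged in already_merged:
        distance = min(levenshtein(word, existing) for existing in merged[0])
        if distance < 5 and (best_distance is None or distance < best_distance):
            best_group = merged
            best_distance = distance
    if best_group is not None:
        best_group[0].add(word)
        best_group[1] += score
    else:
        already_merged.append([set([word]), score])

print already_merged

It's still O(n²), but each word at least ends up in its nearest group rather than in whichever matching group happens to be checked first.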