How to determine if two partitions (clusterings) of data points are identical?

误落风尘 2020-12-11 19:27
I have n data points in some arbitrary space and I cluster them.
The result of my clustering algorithm is a partition represented by an int vector l

      
      
        
3 answers

轻奢々 (original poster) 2020-12-11 20:05

If you relabel your partitions, as has been previously suggested, you may need to search through up to n labels for each of the n items, i.e. those solutions are O(n^2).
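
For illustration, a relabeling that finds each label by linear search might look like the sketch below. The earlier suggestions are not shown here, so this is only an assumption about the kind of approach meant, and the name `relabel_by_search` is hypothetical. Both `in` and `list.index` scan the list of labels seen so far, which can hold up to n entries, hence the quadratic worst case.

```python
def relabel_by_search(labels):
    """Rename labels to 0, 1, 2, ... in order of first appearance."""
    seen = []    # distinct labels, in order of first appearance
    result = []
    for label in labels:
        if label not in seen:             # linear scan of `seen`
            seen.append(label)
        result.append(seen.index(label))  # second linear scan
    return result


# Two label lists describe the same partition when their relabelings agree.
a = relabel_by_search([1, 1, 1, 0, 0, 2, 6])
b = relabel_by_search([2, 2, 2, 9, 9, 3, 1])
print(a == b)
```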

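Here is my idea: scan through both lists simultaneously, maintaining a counter for each partition label in each list.
You will need to map partition labels to counter numbers; a dictionary does this, assigning counter numbers in order of first appearance.
If, at any position, the two labels map to different counter numbers or the counter values differ, then the partitions do not match.
With constant-time dictionary lookups on average, this is O(n).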

Here is a proof of concept in Python:

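l_1 = [1, 1, 1, 0, 0, 2, 6]
l_2 = [2, 2, 2, 9, 9, 3, 1]  # same partition as l_1, different labels
l_3 = [2, 2, 2, 9, 9, 3, 3]  # different partition; swap in for l_2 to see a mismatch

d1 = dict()  # label in l_1 -> index of its counter in c1
d2 = dict()  # label in l_2 -> index of its counter in c2
c1 = []      # number of occurrences so far, per label of l_1
c2 = []      # number of occurrences so far, per label of l_2

# assume lists same length

match = True
for i in range(len(l_1)):
    if l_1[i] not in d1:
        # first time this label is seen: give it the next counter
        x1 = len(c1)
        d1[l_1[i]] = x1
        c1.append(1)
    else:
        x1 = d1[l_1[i]]
        c1[x1] += 1

    if l_2[i] not in d2:
        x2 = len(c2)
        d2[l_2[i]] = x2
        c2.append(1)
    else:
        x2 = d2[l_2[i]]
        c2[x2] += 1

    # counter numbers and counts must agree at every position
    if x1 != x2 or c1[x1] != c2[x2]:
        match = False
        break  # the first mismatch already decides the result

print("match = {}".format(match))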
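
The same idea can be wrapped in a reusable function. The sketch below is one possible variant rather than the code above: the name `same_partition` is hypothetical, it assumes the labels are hashable, and it drops the counters, since two lists describe the same partition exactly when their first-appearance indices agree at every position.

```python
def same_partition(a, b):
    """Return True if label lists a and b describe the same partition."""
    if len(a) != len(b):
        return False
    index_a = {}  # label in a -> order of first appearance
    index_b = {}  # label in b -> order of first appearance
    for x, y in zip(a, b):
        # setdefault stores the next free index the first time a label is seen
        # and returns the stored index on every later occurrence
        if index_a.setdefault(x, len(index_a)) != index_b.setdefault(y, len(index_b)):
            return False
    return True


print(same_partition([1, 1, 1, 0, 0, 2, 6], [2, 2, 2, 9, 9, 3, 1]))
print(same_partition([1, 1, 1, 0, 0, 2, 6], [2, 2, 2, 9, 9, 3, 3]))
```

The first call compares two labelings of the same partition, the second compares two different partitions.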

    
             
                                                        
            
            
              
                