How to find a set of columns that is a primary key candidate in a CSV file?


小蘑菇 2021-01-13 18:54

I have a CSV file that is not normalized. Below is an example; the real file has up to 100 columns:

   ID, CUST_NAME, CLIENT_NAME, PAYMENT_NUM, START_DATE, END_DATE
    1,     CUST1,


      
      
        
2 answers

梦谈多话 (original poster) 2021-01-13 19:28

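One way is to use itertools.combinations. For each combination of columns, drop the duplicate rows and check whether the number of rows changes. If it does not change, the combination identifies every row uniquely and is a candidate key.

For the example data this yields 44 qualifying column combinations. That count includes every superset of a smaller qualifying combination, since adding columns to a unique set keeps it unique.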

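from itertools import chain, combinations

import pandas as pd

# Assumption: the CSV is loaded into a DataFrame; "data.csv" is a placeholder path.
df = pd.read_csv("data.csv")

# Every non-empty combination of column names, from single columns up to all columns.
full_list = chain.from_iterable(
    combinations(df.columns, i) for i in range(1, len(df.columns) + 1)
)

n = len(df.index)  # total number of rows

res = []
for cols in full_list:
    cols = list(cols)
    # Candidate key: dropping duplicates on these columns removes no rows.
    if len(df[cols].drop_duplicates().index) == n:
        res.append(cols)

print(len(res))  # 44 for the example data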
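
The loop above visits all 2^k - 1 non-empty combinations of k columns, which is not practical for a file with 100 columns. A possible refinement, sketched below, is to search by increasing combination size, skip any combination that already contains a known key, and stop at a size limit. The function name `minimal_candidate_keys`, the `max_size` parameter, and the sample rows are all made up for illustration; they are not part of the original answer or of the data in the question.

```python
from itertools import combinations

import pandas as pd


def minimal_candidate_keys(df, max_size=None):
    """Return the smallest column sets whose values are unique on every row."""
    columns = list(df.columns)
    if max_size is None:
        max_size = len(columns)
    keys = []
    for size in range(1, max_size + 1):
        for cols in combinations(columns, size):
            # Skip supersets of a key that was already found: they are unique too.
            if any(set(key) <= set(cols) for key in keys):
                continue
            # duplicated() flags rows that repeat an earlier row on these columns.
            if not df.duplicated(subset=list(cols)).any():
                keys.append(cols)
    return keys


# Made-up rows for illustration (not the data from the question).
sample = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "CUST_NAME": ["CUST1", "CUST1", "CUST2", "CUST2"],
        "PAYMENT_NUM": [1, 2, 1, 2],
        "START_DATE": ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-01"],
    }
)

keys = minimal_candidate_keys(sample, max_size=3)
print(keys)
```

On this sample the function should report `ID` alone and the pair `CUST_NAME`, `PAYMENT_NUM`. The worst case is still exponential, so on a wide file a small `max_size` (for example 3 or 4) is probably the only realistic setting. A key found this way is only unique in the rows that were loaded, so it remains a candidate until checked against the meaning of the data.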

    
             
                                                        
            
            
              
                