Compare 2 seperate csv files and write difference to a new csv file - Python 2.7

后端未结

关注

 3  1266

南旧 2021-01-26 05:21

I am trying to compare two csv files in python and save the difference to a third csv file in python 2.7.

import csv

f1 = open (\"olddata/file1.csv\")
oldFile1


      
      
        
          3条回答        

        
                    
            
            
                         
                
              
              
                
                   青春惊慌失措
                                             
                
                
                (楼主)
            
              
              
                2021-01-26 06:07
              

            
            
                        
What do you mean by difference? The answer to that gives you two distinct possibilities.

If a row is considered same when all columns are same, then you can get your answer via the following code:

import csv

f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

print [row for row in oldList1 if row not in oldList2]


However, if two rows are same if a certain key field (i.e. column) is same, then the following code will give you your answer:

import csv

f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

keyfield = 0 # Change this for choosing the column number

oldList2keys = [row[keyfield] for row in oldList2]
print [row for row in oldList1 if row[keyfield] not in oldList2keys]


Note: The above code might run slow for extremely large files. If instead, you wish to speed up code through hashing, you can use set after converting the oldLists using the following code:

set1 = set(tuple(row) for row in oldList1)
set2 = set(tuple(row) for row in oldList2)


After this, you can use set1.difference(set2)
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它3个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复