Compare 2 excel files using Python

后端 未结 4 791
长发绾君心
长发绾君心 2020-12-05 08:34

I have two xlsx files as follows:

value1   value2   value3
0.456   3.456    0.4325436
6.24654 0.235435 6.376546
4.26545 4.264543 7.2564523


        
相关标签:
4条回答
  • 2020-12-05 09:06

    The following approach should get you started:

    from itertools import izip_longest
    import xlrd
    
    rb1 = xlrd.open_workbook('file1.xlsx')
    rb2 = xlrd.open_workbook('file2.xlsx')
    
    sheet1 = rb1.sheet_by_index(0)
    sheet2 = rb2.sheet_by_index(0)
    
    for rownum in range(max(sheet1.nrows, sheet2.nrows)):
        if rownum < sheet1.nrows:
            row_rb1 = sheet1.row_values(rownum)
            row_rb2 = sheet2.row_values(rownum)
    
            for colnum, (c1, c2) in enumerate(izip_longest(row_rb1, row_rb2)):
                if c1 != c2:
                    print "Row {} Col {} - {} != {}".format(rownum+1, colnum+1, c1, c2)
        else:
            print "Row {} missing".format(rownum+1)
    

    This will display any cells which are different between the two files. For your given two files, this will display:

    Row 3 Col 2 - 0.235435 != 0.23546
    

    If you prefer cell names, then use xlrd.formular.colname():

    print "Cell {}{}  {} != {}".format(rownum+1, xlrd.formula.colname(colnum), c1, c2)
    

    Giving you:

    Cell 3B  0.235435 != 0.23546
    
    0 讨论(0)
  • 2020-12-05 09:06
    df_file1 = pd.read_csv("Source_data.csv")
    df_file1 = df_file1.replace(np.nan, '', regex=True) #Replacing Nan with space
    df_file2 = pd.read_csv("Target_data.csv")
    df_file2 = df_file2.replace(np.nan, '', regex=True)
    
    df_i = pd.concat([df_file1, df_file2], axis='columns', keys=['file1', 'file2'])
    
    df_f = df_i.swaplevel(axis='columns')[df_s.columns[0:]]
    
    def highlight_diff(data, color='yellow'):
        attr = 'background-color: {}'.format(color)
        other = data.xs('file1', axis='columns', level=-1)
        return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''), index=data.index, columns=data.columns)
    
    df_final = df_f.style.apply(highlight_diff, axis=None)
    
    writer = pd.ExcelWriter('comparexcels.xlsx')
    df_final.to_excel(writer)
     writer.save()
    

    Below Picture shows the output of file highlighting the differences

    0 讨论(0)
  • 2020-12-05 09:15

    Use pandas and you can do it as simple as this:

    import pandas as pd
    
    df1 = pd.read_excel('excel1.xlsx')
    df2 = pd.read_excel('excel2.xlsx')
    
    difference = df1[df1!=df2]
    print difference
    

    And the result will look like this:

    0 讨论(0)
  • 2020-12-05 09:23

    I use a code for doing something similar. It's a bit generalized works well. Input Excel Sheets and expected Output Dataframe image

    import pandas as pd
    import numpy as np
    from xlsxwriter.utility import xl_rowcol_to_cell
    
    template = pd.read_excel("template.xlsx",na_values=np.nan,header=None)
    testSheet = pd.read_excel("test.xlsx",na_values=np.nan,header=None)
    
    rt,ct = template.shape
    rtest,ctest = testSheet.shape
    
    df = pd.DataFrame(columns=['Cell_Location','BaseTemplate_Value','CurrentFile_Value'])
    
    for rowNo in range(max(rt,rtest)):
      for colNo in range(max(ct,ctest)):
        # Fetching the template value at a cell
        try:
            template_val = template.iloc[rowNo,colNo]
        except:
            template_val = np.nan
    
        # Fetching the testsheet value at a cell
        try:
            testSheet_val = testSheet.iloc[rowNo,colNo]
        except:
            testSheet_val = np.nan
    
        # Comparing the values
        if (str(template_val)!=str(testSheet_val)):
            cell = xl_rowcol_to_cell(rowNo, colNo)
            dfTemp = pd.DataFrame([[cell,template_val,testSheet_val]],
                                  columns=['Cell_Location','BaseTemplate_Value','CurrentFile_Value'])
            df = df.append(dfTemp)
    

    df is the required dataframe

    0 讨论(0)
提交回复
热议问题