Question
I need to create an Excel sheet by comparing two source sheets. The first contains the serial number and other information; the second contains the warranty date. For example, the source1 sheet contains the following data:
Model Serial Location
Dell 1234 A
Thoshiba 2345 B
Apple 3456 C
Cisco 4567 D
Sun 5678 E
source2 contains data as below
Serial Warranty Status
2345 1/1/2010
4567 2/2/2012
1112 3/2/2015
and the result should be
Model Serial Location Warranty Status
Dell 1234 A Not Found
Thoshiba 2345 B 1/1/2010
Apple 3456 C Not Found
Cisco 4567 D 2/2/2012
Sun 5678 E Not Found
Not Found 1112 Not Found 3/2/2015
I have found some sample scripts, but my scenario has these constraints:
- There is a large amount of data, so the scripts take a long time to run
- Serial numbers do not appear in the same order in the source1 and source2 files
- Some serial numbers exist in only one of the two source files
Please suggest the best algorithm to do this faster.
Answer 1:
Try the code below, which I modified:
import pandas as pd
# Recent pandas versions spell the keyword sheet_name (older ones used sheetname)
source1_df = pd.read_excel('a.xlsx', sheet_name='source1')
source2_df = pd.read_excel('a.xlsx', sheet_name='source2')
# An outer merge keeps serials that appear in only one of the sheets
joined_df = pd.merge(source1_df, source2_df, on='Serial', how='outer')
joined_df.to_excel('/home/user1/test/result.xlsx')
I'm not an expert in Python, but the code above worked for me.
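The merge above leaves unmatched cells empty rather than writing "Not Found" as in the question's expected result. A minimal sketch of one way to fill them, using the question's sample data as in-memory frames instead of reading a.xlsx; the helper name merge_warranty is hypothetical, and the row order of an outer merge can differ between pandas versions:

```python
import pandas as pd

def merge_warranty(source1_df, source2_df, missing='Not Found'):
    """Outer-merge on 'Serial' and label cells with no match."""
    joined = pd.merge(source1_df, source2_df, on='Serial', how='outer')
    # Cast to object first so a text label can replace NaN in any column
    return joined.astype(object).fillna(missing)

# Sample data copied from the question
source1_df = pd.DataFrame({
    'Model': ['Dell', 'Thoshiba', 'Apple', 'Cisco', 'Sun'],
    'Serial': [1234, 2345, 3456, 4567, 5678],
    'Location': ['A', 'B', 'C', 'D', 'E'],
})
source2_df = pd.DataFrame({
    'Serial': [2345, 4567, 1112],
    'Warranty Status': ['1/1/2010', '2/2/2012', '3/2/2015'],
})

result = merge_warranty(source1_df, source2_df)
print(result.to_string(index=False))
```

Passing index=False to to_excel would keep the row index out of the output file, if that column is not wanted.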
Answer 2:
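Install pandas; then you can load each sheet as a DataFrame and join them on the Serial column:
import pandas as pd
# Recent pandas versions spell the keyword sheet_name (older ones used sheetname)
source1_df = pd.read_excel('path/to/excel', sheet_name='source1_sheet_name')
source2_df = pd.read_excel('path/to/excel', sheet_name='source2_sheet_name')
# join matches on the index, so index both frames by Serial; how='outer' keeps serials found in only one sheet
joined_df = source1_df.set_index('Serial').join(source2_df.set_index('Serial'), how='outer')
joined_df.to_excel('path/to/output_excel')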
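On the question's concern about speed and ordering: matching by key does not require either sheet to be sorted if the lookup goes through a hash table. The sketch below shows that idea in plain Python, not pandas; the function name combine and the tuple layout are assumptions for illustration. It makes one pass over each source, so the work grows roughly linearly with the number of rows.

```python
def combine(source1_rows, source2_rows, missing='Not Found'):
    """Match rows by serial number using a dictionary lookup.

    source1_rows: iterable of (model, serial, location) tuples
    source2_rows: iterable of (serial, warranty) tuples
    """
    source2_rows = list(source2_rows)
    warranty_by_serial = dict(source2_rows)  # serial -> warranty
    seen = set()
    result = []
    # Every source1 row is kept; its warranty is looked up by serial
    for model, serial, location in source1_rows:
        seen.add(serial)
        warranty = warranty_by_serial.get(serial, missing)
        result.append((model, serial, location, warranty))
    # Serials that appear only in source2 are appended at the end
    for serial, warranty in source2_rows:
        if serial not in seen:
            result.append((missing, serial, missing, warranty))
    return result

# Sample data copied from the question
source1_rows = [
    ('Dell', 1234, 'A'), ('Thoshiba', 2345, 'B'), ('Apple', 3456, 'C'),
    ('Cisco', 4567, 'D'), ('Sun', 5678, 'E'),
]
source2_rows = [(2345, '1/1/2010'), (4567, '2/2/2012'), (1112, '3/2/2015')]

rows = combine(source1_rows, source2_rows)
for row in rows:
    print(*row)
```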
Source: https://stackoverflow.com/questions/38301076/python-compare-two-excel-sheet-and-append-correct-record