Question
I have two Excel files, A and B. A is the master copy, holding the up-to-date record of each employee's Name and Organization Name (columns Name and Org). File B contains Name and Org columns with somewhat older records, plus many other columns that we are not interested in.
  Name          Org
0  abc  ddc systems
1  sdc  ddc systems
2  csc  ddd systems
3  rdc      kbf org
4  rfc      kbf org
I want to do two operations on this:
1) Compare Excel B (columns Name and Org) with Excel A (columns Name and Org) and update file B with all the missing Name entries and their corresponding Org.
2) For all existing entries in file B (columns Name and Org), compare with file A and update the Org column if an employee's organization has changed.
For operation 1), to find the new entries I tried the approach below (not sure if it is correct, though). The output is a set of tuples, and I was not sure how to put it back into a DataFrame.
diff = set(zip(new_df.Name, new_df.Org)) - set(zip(old_df.Name, old_df.Org))
Any help will be appreciated. Thanks.
Answer 1:
If names are unique, just concatenate A and B and drop duplicates. Because A comes first and keep='first' is used, A's row wins whenever a name appears in both. Assuming A and B are your DataFrames:
df = pd.concat([A, B]).drop_duplicates(subset=['Name'], keep='first')
Or,
A = A.set_index('Name')
B = B.set_index('Name')
idx = B.index.difference(A.index)
df = pd.concat([A, B.loc[idx]]).reset_index()
Both should be approximately the same in terms of performance.
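If B's other columns need to survive, one possible variant is to update B in place rather than starting from A. The sketch below is only an illustration under assumptions: names are unique in both frames, the sample data is made up, and the column Extra is a hypothetical stand-in for B's "many other columns".

```python
import pandas as pd

# Hypothetical sample data; "Extra" stands in for B's other columns.
A = pd.DataFrame({"Name": ["abc", "sdc", "csc", "new"],
                  "Org": ["ddc systems", "ddc systems", "ddd systems", "kbf org"]})
B = pd.DataFrame({"Name": ["abc", "sdc", "csc"],
                  "Org": ["ddc systems", "old org", "ddd systems"],
                  "Extra": [1, 2, 3]})

# Assumes Name is unique in both frames, so it can serve as the index.
a = A.set_index("Name")
b = B.set_index("Name")

# Operation 2: for names present in both, overwrite B's Org with A's value.
common = b.index.intersection(a.index)
b.loc[common, "Org"] = a.loc[common, "Org"]

# Operation 1: append the names that exist only in A.
missing = a.index.difference(b.index)
result = pd.concat([b, a.loc[missing]]).reset_index()
print(result)
```

Rows appended from A have no value for B's extra columns, so those cells come out as NaN.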
Answer 2:
Solution:
diff=pd.DataFrame(list(set(zip(df['aa'], df['bb'])) - set(zip(df2['aa'], df2['bb']))),columns=df.columns)
print(diff.sort_values(by='aa').reset_index(drop=True))
Example:
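import pandas as pd

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
nest = [aa, bb]
# Two identical frames with columns 'aa' and 'bb'
df = pd.DataFrame(nest, ['aa', 'bb']).T
df2 = pd.DataFrame(nest, ['aa', 'bb']).T
# Shift 'aa' down by two rows in df2 so that no (aa, bb) pair matches df any more
df2['aa'] = df2['aa'].shift(2)
# Pairs present in df but not in df2, rebuilt as a DataFrame
diff = pd.DataFrame(list(set(zip(df['aa'], df['bb'])) - set(zip(df2['aa'], df2['bb']))), columns=df.columns)
print(diff.sort_values(by='aa').reset_index(drop=True))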
Output:
aa bb
0 aa1 bb1
1 aa2 bb2
2 aa3 bb3
3 aa4 bb4
4 aa5 bb5
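An alternative to the set difference, which stays in pandas throughout, could be a left merge with indicator=True. This is just a sketch: new_df and old_df are the names from the question, and the old_df rows here are made up for illustration.

```python
import pandas as pd

# new_df mirrors the table in the question; the old_df rows are hypothetical.
new_df = pd.DataFrame({"Name": ["abc", "sdc", "csc", "rdc", "rfc"],
                       "Org": ["ddc systems", "ddc systems", "ddd systems",
                               "kbf org", "kbf org"]})
old_df = pd.DataFrame({"Name": ["abc", "sdc", "csc"],
                       "Org": ["ddc systems", "old org", "ddd systems"]})

# Left-merge on both columns; the _merge column records where each row matched.
# drop_duplicates keeps repeated old rows from multiplying rows in the result.
merged = new_df.merge(old_df[["Name", "Org"]].drop_duplicates(),
                      on=["Name", "Org"], how="left", indicator=True)

# "left_only" marks (Name, Org) pairs found in new_df but not in old_df.
diff = merged.loc[merged["_merge"] == "left_only", ["Name", "Org"]]
diff = diff.reset_index(drop=True)
print(diff)
```

The result covers both brand-new names and names whose Org has changed, since either one makes the (Name, Org) pair differ.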
Source: https://stackoverflow.com/questions/51852514/comparing-two-excel-file-with-pandas