问题
I've got 2 dataframes with identical columns:
df1 = pd.DataFrame([['Abe','1','True'],['Ben','2','True'],['Charlie','3','True']], columns=['Name','Number','Other'])
df2 = pd.DataFrame([['Derek','4','False'],['Ben','5','False'],['Erik','6','False']], columns=['Name','Number','Other'])
which give:
Name Number Other
0 Abe 1 True
1 Ben 2 True
2 Charlie 3 True
and
Name Number Other
0 Derek 4 False
1 Ben 5 False
2 Erik 6 False
I want an output dataframe that is an intersection of the two based on "Name":
output_df =
Name Number Other
0 Ben 2 True
1 Ben 5 False
I've tried a basic pandas merge but the return is non-desirable:
pd.merge(df1,df2,how='inner',on='Name') =
Name Number_x Other_x Number_y Other_y
0 Ben 2 True 5 False
These dataframes are quite large so I'd prefer to use some pandas magic to keep things quick.
回答1:
You can use concat and then filter by isin with numpy.intersect1d using boolean indexing:
val = np.intersect1d(df1.Name, df2.Name)
print (val)
['Ben']
df = pd.concat([df1,df2], ignore_index=True)
print (df[df.Name.isin(val)])
Name Number Other
1 Ben 2 True
4 Ben 5 False
Another possible solution for val
is intersection
of sets:
val = set(df1.Name).intersection(set(df2.Name))
print (val)
{'Ben'}
Then is possible reset index to monotonic:
df = pd.concat([df1,df2])
print (df[df.Name.isin(val)].reset_index(drop=True))
Name Number Other
0 Ben 2 True
1 Ben 5 False
来源:https://stackoverflow.com/questions/41262379/pandas-merge-dataframes-without-creating-new-columns