Pandas: merge dataframes without creating new columns

时光毁灭记忆、已成空白 submitted on 2021-02-07 14:15:16

Question


I've got 2 dataframes with identical columns:

import pandas as pd

df1 = pd.DataFrame([['Abe','1','True'],['Ben','2','True'],['Charlie','3','True']], columns=['Name','Number','Other'])
df2 = pd.DataFrame([['Derek','4','False'],['Ben','5','False'],['Erik','6','False']], columns=['Name','Number','Other'])

which give:

     Name Number Other
0      Abe      1  True
1      Ben      2  True
2  Charlie      3  True

and

    Name Number  Other
0  Derek      4  False
1    Ben      5  False
2   Erik      6  False

I want an output dataframe that is the intersection of the two on "Name", i.e. the rows from both dataframes whose Name appears in both:

output_df =
  Name Number  Other
0  Ben      2   True
1  Ben      5  False

I've tried a basic pandas merge, but the result is not what I want:

pd.merge(df1, df2, how='inner', on='Name')

returns:

  Name Number_x Other_x Number_y Other_y
0  Ben        2    True        5   False

These dataframes are quite large, so I'd prefer to use some pandas magic to keep things quick.
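For comparison, merge itself can give the desired shape. The _x/_y columns appear because Number and Other exist in both frames but are not join keys, so merge adds its default suffixes to keep them apart. A minimal sketch, not from the original thread, that stacks the rows first and then inner-merges against a frame holding only the shared names (stacked and common are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame([['Abe', '1', 'True'], ['Ben', '2', 'True'], ['Charlie', '3', 'True']],
                   columns=['Name', 'Number', 'Other'])
df2 = pd.DataFrame([['Derek', '4', 'False'], ['Ben', '5', 'False'], ['Erik', '6', 'False']],
                   columns=['Name', 'Number', 'Other'])

# Stack the rows of both frames; ignore_index gives a fresh 0..n-1 index.
stacked = pd.concat([df1, df2], ignore_index=True)

# One-column frame with each name found in both inputs. drop_duplicates keeps
# repeated names from multiplying rows in the merge below.
common = df1[['Name']].drop_duplicates().merge(df2[['Name']].drop_duplicates(), on='Name')

# Inner merge against the key-only frame: 'Name' is the only shared column,
# so no suffixed columns are created, and rows with other names are dropped.
output_df = stacked.merge(common, on='Name', how='inner')
print(output_df)
```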


Answer 1:


You can concat the two dataframes and then filter with boolean indexing, using isin against the names common to both, which numpy.intersect1d finds:

import numpy as np

val = np.intersect1d(df1.Name, df2.Name)
print(val)
['Ben']

df = pd.concat([df1,df2], ignore_index=True)
print(df[df.Name.isin(val)])
  Name Number  Other
1  Ben      2   True
4  Ben      5  False
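The same steps can be wrapped in a reusable function that takes the key column as a parameter. This is only a sketch; rows_with_shared_key and its key argument are hypothetical names, not part of pandas or of the answer above:

```python
import numpy as np
import pandas as pd


def rows_with_shared_key(left, right, key='Name'):
    """Hypothetical helper: return the rows of both frames whose `key` value
    appears in both, stacked vertically with the original columns."""
    # Sorted array of the unique key values present in both frames.
    shared = np.intersect1d(left[key], right[key])
    # Stack the frames row-wise; ignore_index avoids duplicate index labels.
    stacked = pd.concat([left, right], ignore_index=True)
    # Boolean mask: True where the row's key is one of the shared values.
    mask = stacked[key].isin(shared)
    return stacked[mask]


df1 = pd.DataFrame([['Abe', '1', 'True'], ['Ben', '2', 'True'], ['Charlie', '3', 'True']],
                   columns=['Name', 'Number', 'Other'])
df2 = pd.DataFrame([['Derek', '4', 'False'], ['Ben', '5', 'False'], ['Erik', '6', 'False']],
                   columns=['Name', 'Number', 'Other'])

result = rows_with_shared_key(df1, df2)
print(result)
```

The returned rows keep their positions in the stacked frame (1 and 4 here), as in the output above.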

Another way to compute val is a set intersection:

val = set(df1.Name).intersection(set(df2.Name))
print(val)
{'Ben'}
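Both ways of computing val give the same names, but intersect1d returns a sorted, de-duplicated NumPy array while the set has no defined order. Since isin only tests membership, either works as the filter. A small check on hypothetical data with more than one shared name:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two shared names and a repeated name.
a = pd.Series(['Ben', 'Abe', 'Cal', 'Ben'])
b = pd.Series(['Cal', 'Ben', 'Dee'])

arr = np.intersect1d(a, b)          # sorted, de-duplicated NumPy array
names = set(a).intersection(b)      # unordered Python set

print(arr)
print(names)

# isin only checks membership, so both produce the same boolean mask.
mask_arr = a.isin(arr)
mask_set = a.isin(names)
print(mask_arr.equals(mask_set))
```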

If you concat without ignore_index, you can then reset the index of the filtered result to a default 0, 1, 2, ... range:

df = pd.concat([df1,df2])
print(df[df.Name.isin(val)].reset_index(drop=True))
  Name Number  Other
0  Ben      2   True
1  Ben      5  False
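A variant not shown in the answer: filter each dataframe before concatenating, so only the matching rows are copied into the combined frame. This sketch assumes the same df1 and df2 as in the question; any speed benefit on large frames would need to be measured:

```python
import pandas as pd

df1 = pd.DataFrame([['Abe', '1', 'True'], ['Ben', '2', 'True'], ['Charlie', '3', 'True']],
                   columns=['Name', 'Number', 'Other'])
df2 = pd.DataFrame([['Derek', '4', 'False'], ['Ben', '5', 'False'], ['Erik', '6', 'False']],
                   columns=['Name', 'Number', 'Other'])

# Keep only the rows of each frame whose Name also occurs in the other frame.
# Series.isin accepts another Series as a list of values, so no explicit
# intersection step is needed.
left_part = df1[df1['Name'].isin(df2['Name'])]
right_part = df2[df2['Name'].isin(df1['Name'])]

# Concatenate only the filtered rows; ignore_index renumbers them 0..n-1.
output_df = pd.concat([left_part, right_part], ignore_index=True)
print(output_df)
```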


Source: https://stackoverflow.com/questions/41262379/pandas-merge-dataframes-without-creating-new-columns
