Merge dataframes without duplicating rows in python pandas [duplicate]

旧城冷巷雨未停 submitted on 2020-12-25 00:11:13

Question


I'd like to combine two dataframes on their common column 'A':

>>> df1
    A   B
0   I   1
1   I   2
2   II  3

>>> df2
    A   C
0   I   4
1   II  5
2   III 6

To do so I tried using:

merged = pd.merge(df1, df2, on='A', how='outer')

This returned:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 4
2   II  3.0 5
3   III NaN 6
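The duplication is expected behaviour for a key-based join: every df1 row with a given A is paired with every df2 row that has the same A, so the single 'I' row of df2 is matched to both 'I' rows of df1. The following is a minimal reproduction using the frames shown above. indicator=True adds a _merge column that records where each row came from. validate='one_to_one' is a real merge option that raises pandas.errors.MergeError when the keys are not unique on both sides, which makes this many-to-one situation visible:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['I', 'I', 'II'], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': ['I', 'II', 'III'], 'C': [4, 5, 6]})

# Key-based outer join: each df1 row is paired with every df2 row sharing its 'A'.
merged = pd.merge(df1, df2, on='A', how='outer', indicator=True)
print(merged)

# The single df2 row for 'I' shows up once per matching df1 row (twice here).
print((merged['A'] == 'I').sum())

# Asking pandas to check for a one-to-one relationship exposes the repeated key.
try:
    pd.merge(df1, df2, on='A', how='outer', validate='one_to_one')
except pd.errors.MergeError as err:
    print('not one-to-one:', err)
```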

However, since df2 contains only one row with A == 'I', I do not want that row to be duplicated in the merged dataframe. Instead, I would like the following output:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 NaN
2   II  3.0 5
3   III NaN 6

What is the best way to do this? I am new to Python and still slightly confused by all the join/merge/concatenate/append operations.
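Here is a rough orientation on those operations. This is a general summary and not part of the original thread:

- merge and join line rows up by matching key values (join matches against the other frame's index).
- concat stacks frames along an axis without matching anything.
- DataFrame.append was a shorthand for row-wise concat. It was deprecated in pandas 1.4 and removed in 2.0.

The sketch below applies each of these to the question's frames and assumes only that pandas is installed:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['I', 'I', 'II'], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': ['I', 'II', 'III'], 'C': [4, 5, 6]})

# merge: rows are matched on the values in column 'A'.
by_key = df1.merge(df2, on='A', how='outer')

# concat (axis=0 by default): rows are stacked, columns are unioned,
# and cells a frame did not have become NaN.
stacked = pd.concat([df1, df2], ignore_index=True)

# join: df1's column 'A' is matched against df2's index, so 'A' is moved there first.
by_index = df1.join(df2.set_index('A'), on='A', how='left')

print(by_key, stacked, by_index, sep='\n\n')
```

Only merge and join pair rows by key, so only those two can produce the repetition described in the question.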


Answer 1:


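Create a helper column g with groupby('A').cumcount(), which numbers the rows within each group of equal A values (0, 1, ...). Merging on both A and g then pairs the first 'I' in df1 with the first 'I' in df2, the second with the second, and so on, so no row is repeated: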

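df1['g'] = df1.groupby('A').cumcount()  # occurrence number of each A value in df1
df2['g'] = df2.groupby('A').cumcount()  # occurrence number of each A value in df2
df1.merge(df2, on=['A', 'g'], how='outer').drop(columns='g')  # drop(columns=...) also works in pandas 2.x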
Out[62]: 
     A    B    C
0    I  1.0  4.0
1    I  2.0  NaN
2   II  3.0  5.0
3  III  NaN  6.0
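The same idea can be written as a reusable, self-contained function. Two choices here are assumptions added for this sketch and are not part of the answer:

- The helper column is added with assign, so the caller's df1 and df2 are not modified in place.
- The helper column has the hypothetical name _occurrence instead of g, which lowers the chance of clashing with a real column.

The merge keys are also passed explicitly rather than left to pandas to infer from the shared columns.

```python
import pandas as pd


def merge_without_duplicates(left: pd.DataFrame, right: pd.DataFrame, key: str,
                             how: str = 'outer') -> pd.DataFrame:
    """Merge on `key`, pairing the n-th occurrence of each key value in `left`
    with the n-th occurrence of the same value in `right`."""
    tmp = '_occurrence'  # hypothetical helper column name
    # cumcount() numbers rows 0, 1, 2, ... within each group of equal key values.
    left = left.assign(**{tmp: left.groupby(key).cumcount()})
    right = right.assign(**{tmp: right.groupby(key).cumcount()})
    # (key, occurrence) is unique on each side, so every row is matched at most once.
    return left.merge(right, on=[key, tmp], how=how).drop(columns=tmp)


df1 = pd.DataFrame({'A': ['I', 'I', 'II'], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': ['I', 'II', 'III'], 'C': [4, 5, 6]})
result = merge_without_duplicates(df1, df2, 'A')
print(result)
```

With how='outer', pandas sorts the combined keys lexicographically, which is why the rows come back grouped by A. Passing how='left' instead keeps df1's row order and drops keys that appear only in df2.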


Source: https://stackoverflow.com/questions/47439234/merge-dataframes-without-duplicating-rows-in-python-pandas
