Merge items on dataframes with duplicate values

蓝咒 posted on 2019-11-28 13:45:28

You'll need to create a surrogate key column with groupby + cumcount that numbers the duplicate rows within each A group (0, 1, 2, ...), so that each (A, D) pair is unique. Then include that column in the keys when calling merge:

a = df.assign(D=df.groupby('A').cumcount())
b = df_key.assign(D=df_key.groupby('A').cumcount())

# Passing the axis positionally, as in drop('D', 1), is rejected by pandas 2.0+.
a.merge(b, on=['A', 'D'], how='left').drop(columns='D')

     A    B    C
0  foo  1.0  2.0
1  foo  3.0  4.0
2  foo  NaN  NaN
3  foo  NaN  NaN
4  bar  5.0  9.0
5  bar  2.0  4.0
6  bar  1.0  9.0
7  bar  NaN  NaN
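The post does not show the input frames, so here is a minimal reproducible sketch of the approach above. The contents of `df` and `df_key` are assumptions reconstructed from the printed outputs, and `validate='one_to_one'` is an optional extra check (not part of the original answer) confirming that the surrogate column really makes the keys unique on both sides.

```python
import pandas as pd

# Assumed inputs, reconstructed from the outputs shown in this post.
df = pd.DataFrame({'A': ['foo'] * 4 + ['bar'] * 4})
df_key = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
    'B': [1, 3, 5, 2, 1],
    'C': [2, 4, 9, 4, 9],
})

# Number each row within its 'A' group: 0, 1, 2, ...
a = df.assign(D=df.groupby('A').cumcount())
b = df_key.assign(D=df_key.groupby('A').cumcount())

# (A, D) is now unique on both sides, so a one-to-one left merge is safe.
result = (
    a.merge(b, on=['A', 'D'], how='left', validate='one_to_one')
     .drop(columns='D')
)
print(result)
```

Because the merge is a left join on `a`, the result keeps the row order of `df`, and rows of `df` with no counterpart in `df_key` get NaN in `B` and `C`.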
WeNYoBen

Or you can pad df_key directly: count how many more times each value of A appears in df than in df_key, then append that many extra rows containing only A (B and C become NaN).

s = df.A.value_counts() - df_key.A.value_counts()

pd.concat([df_key, pd.DataFrame({'A': s.index.repeat(s)})]).sort_values('A')
Out[469]: 
     A    B    C
2  bar  5.0  9.0
3  bar  2.0  4.0
4  bar  1.0  9.0
0  bar  NaN  NaN
0  foo  1.0  2.0
1  foo  3.0  4.0
1  foo  NaN  NaN
2  foo  NaN  NaN
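The subtraction above relies on both frames containing the same set of A values, and on df never having fewer occurrences than df_key. A slightly more defensive sketch of the same idea follows. The input frames are assumptions reconstructed from the printed outputs, and the names `extra`, `padding`, and `out` are illustrative only.

```python
import pandas as pd

# Assumed inputs, reconstructed from the outputs shown in this post.
df = pd.DataFrame({'A': ['foo'] * 4 + ['bar'] * 4})
df_key = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
    'B': [1, 3, 5, 2, 1],
    'C': [2, 4, 9, 4, 9],
})

# How many more rows per key df has than df_key.
# fill_value=0 handles keys present on only one side;
# clip(lower=0) avoids negative repeat counts.
extra = (
    df['A'].value_counts()
    .sub(df_key['A'].value_counts(), fill_value=0)
    .clip(lower=0)
    .astype(int)
)

# One row per missing occurrence, holding only the key column.
padding = pd.DataFrame({'A': extra.index.repeat(extra.to_numpy())})

# A stable sort keeps the real df_key rows ahead of the padding rows per key.
out = (
    pd.concat([df_key, padding], ignore_index=True)
    .sort_values('A', kind='mergesort')
    .reset_index(drop=True)
)
print(out)
```

Unlike the merge-based approach, this returns rows sorted by A rather than in the original order of df, so it fits best when row order does not matter.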