merge two dataframes without repeats pandas

你离开我真会死。 posted on 2019-11-29 17:21:07

The problem is that the customerId column contains duplicates, so every row on the left is matched with each of the repeated rows on the right.

The solution is to remove them, e.g. with drop_duplicates:

df2 = df2.drop_duplicates('customerId')

Sample:

import pandas as pd

df = pd.DataFrame({'customerId':[1,2,1,1,2], 'full name':list('abcde')})
print (df)
   customerId full name
0           1         a
1           2         b
2           1         c
3           1         d
4           2         e

df2 = pd.DataFrame({'customerId':[1,2,1,2,1,1], 'full name':list('ABCDEF')})
print (df2)
   customerId full name
0           1         A
1           2         B
2           1         C
3           2         D
4           1         E
5           1         F

merge = pd.merge(df, df2, on='customerId', how='left')
print (merge)
    customerId full name_x full name_y
0            1           a           A
1            1           a           C
2            1           a           E
3            1           a           F
4            2           b           B
5            2           b           D
6            1           c           A
7            1           c           C
8            1           c           E
9            1           c           F
10           1           d           A
11           1           d           C
12           1           d           E
13           1           d           F
14           2           e           B
15           2           e           D

df2 = df2.drop_duplicates('customerId')
merge = pd.merge(df, df2, on='customerId', how='left')
print (merge)
   customerId full name_x full name_y
0           1           a           A
1           2           b           B
2           1           c           A
3           1           d           A
4           2           e           B
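
If you would rather have pandas complain than silently multiply rows, pd.merge also takes a validate argument. The following is only a minimal sketch on the sample data above; the names left, right and right_unique are made up for illustration:

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({'customerId': [1, 2, 1, 1, 2], 'full name': list('abcde')})
right = pd.DataFrame({'customerId': [1, 2, 1, 2, 1, 1], 'full name': list('ABCDEF')})

# validate='many_to_one' makes pandas check that the merge keys
# are unique in the right frame and raise MergeError otherwise.
try:
    pd.merge(left, right, on='customerId', how='left', validate='many_to_one')
    duplicates_detected = False
except MergeError:
    duplicates_detected = True

# Once the duplicate keys are dropped, the same call passes
# and the result has exactly one row per row of the left frame.
right_unique = right.drop_duplicates('customerId')
merged = pd.merge(left, right_unique, on='customerId', how='left',
                  validate='many_to_one')
print(duplicates_detected, len(merged) == len(left))
```

Note that drop_duplicates keeps the first row per customerId by default; pass keep='last' if the last one should win instead.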

I do not see whole rows being repeated, but there are repetitions in customerId. You could remove them using:

    df.drop_duplicates('customerId', inplace=True)

where df could be the dataframe corresponding to amount or the one obtained after the merge. If you want to keep only the first few rows (say n) for each customerId, you could use:

    df.groupby('customerId').head(n)
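
As a rough illustration of the groupby/head approach, here it is applied to the df2 sample from the first answer. The value n = 2 and the name first_n are arbitrary choices for this sketch:

```python
import pandas as pd

df2 = pd.DataFrame({'customerId': [1, 2, 1, 2, 1, 1], 'full name': list('ABCDEF')})

n = 2  # illustrative value: keep at most two rows per customerId

# head(n) on a groupby keeps the first n rows of every group
# and returns them in their original row order.
first_n = df2.groupby('customerId').head(n)
print(first_n)
```

With n = 1 this keeps the same rows as drop_duplicates('customerId').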