Pandas inner merge/join returning all rows

时光怂恿深爱的人放手 提交于 2019-12-13 18:29:45

问题


I'm trying to merge two data frames based on a column present in both, keeping only the intersection of the two sets.

The desired result is:

 foo           bar            foobar
x y z        x j i           x y z j i
a 1 2        a 9 0           a 1 2 9 0
b 3 4        b 9 0           b 3 4 9 0 
c 5 6        c 9 0           c 5 6 9 0
d 7 8        e 9 0           
             f 9 0 

My code that does not produce the desired result is:

pd.merge(foo, bar, how='inner', on='x')

Instead, the code seems to return:

 foo           bar            foobar
x y z        x j i           x y z j i
a 1 2        a 9 0           a 1 2 9 0
b 3 4        b 9 0           b 3 4 9 0 
c 5 6        c 9 0           c 5 6 9 0
d 7 8        e 9 0           e * * 9 0
             f 9 0           f * * 9 0

(where * represents an NaN)

Where am I going wrong? I've already reached the third Google page trying to fix this an nothing works. Whatever I do I get an outer join, with all rows in both sets.


回答1:


Usually it means that you have duplicates in the column(s) used for joining, resulting in cartesian product.

Demo:

In [35]: foo
Out[35]:
   x  y  z
0  a  1  2
1  b  3  4
2  c  5  6
3  d  7  8

In [36]: bar
Out[36]:
   x  j  i
0  a  9  0
1  b  9  0
2  a  9  0
3  a  9  0
4  b  9  0

In [37]: pd.merge(foo, bar)
Out[37]:
   x  y  z  j  i
0  a  1  2  9  0
1  a  1  2  9  0
2  a  1  2  9  0
3  b  3  4  9  0
4  b  3  4  9  0


来源:https://stackoverflow.com/questions/45471466/pandas-inner-merge-join-returning-all-rows

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!