Pandas Left Outer Join results in table larger than left table

长情又很酷 2020-11-29 04:32

From what I understand about a left outer join, the resulting table should never have more rows than the left table. Please let me know if this is wrong.

My left

3 Answers
  •  一向
    一向 (original poster)
    2020-11-29 05:02

    You can expect the row count to increase if a key in the left DataFrame matches more than one row in the other DataFrame:

    In [11]: df = pd.DataFrame([[1, 3], [2, 4]], columns=['A', 'B'])
    
    In [12]: df2 = pd.DataFrame([[1, 5], [1, 6]], columns=['A', 'C'])
    
    In [13]: df.merge(df2, how='left')  # merges on columns A
    Out[13]: 
       A  B   C
    0  1  3   5
    1  1  3   6
    2  2  4 NaN
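
A quick way to confirm that this is what happened is to count how often each join key occurs in the right-hand frame. The sketch below reuses the `df` and `df2` from above and assumes the join column is `A`; the names `key_counts`, `duplicated_keys` and `merged` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 3], [2, 4]], columns=["A", "B"])
df2 = pd.DataFrame([[1, 5], [1, 6]], columns=["A", "C"])

# Count how many times each join key appears in the right-hand frame.
key_counts = df2["A"].value_counts()

# Keys that occur more than once will multiply the matching left rows.
duplicated_keys = key_counts[key_counts > 1]

merged = df.merge(df2, how="left", on="A")

print(duplicated_keys.to_dict())  # {1: 2}
print(len(df), len(merged))       # 2 3
```

Any key listed in `duplicated_keys` produces one output row per match, which is why the result has three rows while `df` has two.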
    

    To avoid this behaviour, drop the duplicate keys in df2 before merging:

    In [21]: df2.drop_duplicates(subset=['A'])  # pass keep='last' to keep the last match instead
    Out[21]: 
       A  C
    0  1  5
    
    In [22]: df.merge(df2.drop_duplicates(subset=['A']), how='left')
    Out[22]: 
       A  B   C
    0  1  3   5
    1  2  4 NaN
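
If you would rather have pandas complain than silently grow the table, `merge` also accepts a `validate` argument. This is a minimal sketch, assuming the same `df` and `df2` as above and that each key should appear at most once on the right; the names `raised`, `deduped` and `safe` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 3], [2, 4]], columns=["A", "B"])
df2 = pd.DataFrame([[1, 5], [1, 6]], columns=["A", "C"])

# validate="many_to_one" asks pandas to check that the join keys
# are unique in the right-hand frame before merging.
try:
    df.merge(df2, how="left", on="A", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

# After dropping duplicate keys, the same check passes and the
# result has exactly as many rows as the left frame.
deduped = df2.drop_duplicates(subset=["A"])
safe = df.merge(deduped, how="left", on="A", validate="many_to_one")

print(raised)              # True
print(len(df), len(safe))  # 2 2
```

Note that `drop_duplicates` discards the other matches (here the row with `C == 6`), so it only makes sense when any one match per key is acceptable.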
    
