How to subtract rows of one pandas data frame from another?


执笔经年 2020-12-10 11:26

The operation that I want to do is similar to a merge. For example, with the inner merge we get a data frame that contains rows that are present in the first AN

4 answers

感情败类 (original poster)

2020-12-10 12:22

A left merge followed by a null check can give wrong results if a non-key column has cells with NaN.

print(df1)

      Team  Year  foo
0    Hawks  2001    5
1    Hawks  2004    4
2     Nets  1987    3
3     Nets  1988    6
4     Nets  2001    8
5     Nets  2000   10
6     Heat  2004    6
7   Pacers  2003   12
8  Problem  2112  NaN


print(df2)

      Team  Year  foo
0   Pacers  2003   12
1     Heat  2004    6
2     Nets  1988    6
3  Problem  2112  NaN

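# Left-join df1 to df2 on the key columns; df2's foo becomes foo_y.
new = df1.merge(df2, on=['Team', 'Year'], how='left')
# Keep rows where foo_y is null, on the (flawed) assumption that null means "no match in df2".
print(new[new.foo_y.isnull()])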

      Team  Year  foo_x  foo_y
0    Hawks  2001      5    NaN
1    Hawks  2004      4    NaN
2     Nets  1987      3    NaN
4     Nets  2001      8    NaN
5     Nets  2000     10    NaN
8  Problem  2112    NaN    NaN

The Problem team in 2112 has no value for foo in either table, so foo_y is NaN for that row even though it matches in both DataFrames. The left join with a null check on foo_y therefore falsely reports it as not being present in the right DataFrame.
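To reproduce the pitfall, here is a minimal runnable sketch. It assumes the two frames look like the printed tables above (the constructors are rebuilt from that output, not taken from the original post), and the names `merged`, `suspect` and `false_positive` are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the printed tables (an assumption about the original data).
df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers", "Problem"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003, 2112],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12, np.nan],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets", "Problem"],
    "Year": [2003, 2004, 1988, 2112],
    "foo": [12, 6, 6, np.nan],
})

merged = df1.merge(df2, on=["Team", "Year"], how="left")

# foo_y is NaN both for rows with no match in df2 and for matched rows
# whose foo was already NaN in df2, so this filter cannot tell them apart.
suspect = merged[merged["foo_y"].isnull()]
print(suspect)

# ("Problem", 2112) exists in df2, yet it shows up among the "missing" rows.
false_positive = suspect[(suspect["Team"] == "Problem") & (suspect["Year"] == 2112)]
print(len(false_positive))
```

The filter returns six rows, although only five of them are really absent from df2.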

Solution:

What I do is add a marker column to the inner DataFrame (df2 here) and set it to the same value for every row. After the left join, that column is NaN only for rows of the outer DataFrame (df1) that have no match in df2, so checking it for NaN finds the records unique to df1.

df2['in_df2'] = 'yes'

print(df2)

      Team  Year  foo  in_df2
0   Pacers  2003   12     yes
1     Heat  2004    6     yes
2     Nets  1988    6     yes
3  Problem  2112  NaN     yes


# Same left join; in_df2 is NaN only where df2 had no matching row.
new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.in_df2.isnull()])

    Team  Year  foo_x  foo_y  in_df2
0  Hawks  2001      5    NaN     NaN
1  Hawks  2004      4    NaN     NaN
2   Nets  1987      3    NaN     NaN
4   Nets  2001      8    NaN     NaN
5   Nets  2000     10    NaN     NaN

NB. The Problem row is now correctly filtered out, because it has a value for in_df2:

8  Problem  2112    NaN    NaN     yes
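As an alternative that is not part of the answer above, `merge` also accepts `indicator=True`, which adds a `_merge` column set to `left_only`, `right_only` or `both`. This does the job of the hand-made marker column. A minimal sketch, assuming the same sample data as in the printed tables; the helper name `rows_only_in_left` is hypothetical.

```python
import numpy as np
import pandas as pd

def rows_only_in_left(left, right, keys):
    """Return rows of `left` whose key columns have no match in `right`.

    Hypothetical helper for illustration; `keys` is the list of join columns.
    """
    # Join against the deduplicated key columns only, so no foo_x/foo_y
    # suffixes appear and rows of `left` are never duplicated.
    right_keys = right[keys].drop_duplicates()
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    # _merge is "left_only" exactly for rows without a key match in `right`.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Sample frames rebuilt from the printed tables (an assumption about the original data).
df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers", "Problem"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003, 2112],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12, np.nan],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets", "Problem"],
    "Year": [2003, 2004, 1988, 2112],
    "foo": [12, 6, 6, np.nan],
})

result = rows_only_in_left(df1, df2, ["Team", "Year"])
print(result)
```

Because the match is decided by the key columns alone, NaN values in foo no longer affect which rows are kept.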
