python pandas: how to find rows in one dataframe but not in another?

醉话见心 2020-12-08 17:47

Let's say that I have two tables: people_all and people_usa, both with the same structure and therefore the same primary key.

How can I get the rows of people_all that are not in people_usa?

3 answers
  • 2020-12-08 17:50

    Here is another SQL-like Pandas method, .query():

    people_all.query('ID not in @people_usa.ID')
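    A minimal runnable sketch of the .query() approach, assuming both frames have an ID column (the sample data is made up for illustration). The @ prefix lets the query string refer to a Python variable from the surrounding scope:

```python
import pandas as pd

# Hypothetical sample data: IDs 1, 3 and 4 appear in both frames
people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4], "name": ["a", "b", "c", "d", "e"]})
people_usa = pd.DataFrame({"ID": [1, 3, 4], "name": ["b", "d", "e"]})

# '@people_usa.ID' resolves to the local DataFrame's ID column inside the query string
not_in_usa = people_all.query("ID not in @people_usa.ID")
print(not_in_usa)
```

    Only the rows of people_all whose ID is absent from people_usa remain, with their original index labels.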
    

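    or using NumPy's in1d() (deprecated since NumPy 2.0 in favor of np.isin()), applied to the ID columns rather than to the whole frames:

    import numpy as np
    people_all[~np.isin(people_all['ID'], people_usa['ID'])]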
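    A sketch of the NumPy variant on made-up sample data. The mask must have one boolean per row, so it is built from the key columns; np.in1d flattens its first argument, so passing whole DataFrames would produce one boolean per cell instead:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4], "name": ["a", "b", "c", "d", "e"]})
people_usa = pd.DataFrame({"ID": [1, 3, 4], "name": ["b", "d", "e"]})

# One boolean per row of people_all: is its ID among people_usa's IDs?
row_mask = np.isin(people_all["ID"].to_numpy(), people_usa["ID"].to_numpy())

# ~ flips the mask so only rows whose ID is absent from people_usa remain
not_in_usa = people_all[~row_mask]
print(not_in_usa)
```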
    

    NOTE: for those with SQL experience, it may be worth reading the "Comparison with SQL" page of the pandas documentation.
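    For readers coming from SQL, the familiar anti-join (LEFT JOIN ... WHERE usa.ID IS NULL) can be sketched with DataFrame.merge and its indicator parameter. This is an alternative technique, not part of the answer above; the sample data is hypothetical, and only the key column of people_usa is merged so that no duplicate columns are created:

```python
import pandas as pd

# Hypothetical sample data
people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4], "name": ["a", "b", "c", "d", "e"]})
people_usa = pd.DataFrame({"ID": [1, 3, 4], "name": ["b", "d", "e"]})

# Left join on the key; indicator=True adds a '_merge' column that is
# 'left_only' for rows of people_all with no match in people_usa
merged = people_all.merge(people_usa[["ID"]], on="ID", how="left", indicator=True)

# Keep the unmatched rows and drop the helper column
not_in_usa = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(not_in_usa)
```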

  • 2020-12-08 17:51

    Use isin and negate the boolean mask:

    people_usa[~people_usa['ID'].isin(people_all['ID'])]
    

    Example:

    In [364]:
    import numpy as np
    import pandas as pd
    people_all = pd.DataFrame({'ID': np.arange(5)})
    people_usa = pd.DataFrame({'ID': [3, 4, 6, 7, 100]})
    people_usa[~people_usa['ID'].isin(people_all['ID'])]
    
    Out[364]:
        ID
    2    6
    3    7
    4  100
    

    So 3 and 4 are removed from the result; the boolean mask looks like this:

    In [366]:
    people_usa['ID'].isin(people_all['ID'])
    
    Out[366]:
    0     True
    1     True
    2    False
    3    False
    4    False
    Name: ID, dtype: bool
    

    Using ~ inverts the mask.
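    The isin approach assumes a single-column key. If the primary key spans several columns, one option is to build a MultiIndex from each frame's key columns and test membership on whole key tuples. This is a sketch under that assumption; the first/last/age columns are hypothetical. It also shows the other direction, rows of people_all that are not in people_usa:

```python
import pandas as pd

# Hypothetical frames whose primary key is the pair (first, last)
people_all = pd.DataFrame({
    "first": ["Ann", "Bob", "Ann", "Cy"],
    "last": ["Lee", "Ray", "Kim", "Day"],
    "age": [30, 41, 25, 52],
})
people_usa = pd.DataFrame({
    "first": ["Ann", "Cy"],
    "last": ["Lee", "Day"],
    "age": [30, 52],
})

key = ["first", "last"]

# Each frame's key columns become a MultiIndex of (first, last) tuples
all_keys = pd.MultiIndex.from_frame(people_all[key])
usa_keys = pd.MultiIndex.from_frame(people_usa[key])

# Membership is tested on the full tuple, so ("Ann", "Kim") does not match ("Ann", "Lee")
mask = all_keys.isin(usa_keys)
not_in_usa = people_all[~mask]
print(not_in_usa)
```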

  • 2020-12-08 18:00

    I would combine the data frames by stacking them (e.g. with pd.concat) and then call the .drop_duplicates() method. The documentation is here:

    http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.drop_duplicates.html
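    A sketch of the stacking approach, assuming both frames share the same columns and people_all has no duplicate rows of its own (those would be dropped too). drop_duplicates(keep=False) removes every copy of a repeated row, so stacking people_usa twice ensures all of its rows disappear, even ones not present in people_all. Rows are compared on all columns; passing subset=['ID'] would compare on the key only. Sample data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data; ID 9 exists only in people_usa
people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4], "name": ["a", "b", "c", "d", "e"]})
people_usa = pd.DataFrame({"ID": [1, 3, 4, 9], "name": ["b", "d", "e", "z"]})

# Every people_usa row now occurs at least twice; rows shared with
# people_all occur three times
stacked = pd.concat([people_all, people_usa, people_usa])

# keep=False drops all copies of any duplicated row, leaving only the
# rows that appear once, i.e. rows unique to people_all
not_in_usa = stacked.drop_duplicates(keep=False)
print(not_in_usa)
```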
