python pandas: how to find rows in one dataframe but not in another?

前端未结


Let's say that I have two tables: people_all and people_usa, both with the same structure and therefore the same primary key.

How can I g


3 Answers

遥遥无期

2020-12-08 17:50
Here is another pandas method that reads much like SQL, .query():
```
people_all.query('ID not in @people_usa.ID')
```
or using NumPy's in1d() function on the ID columns (np.isin() is the recommended replacement in recent NumPy versions):
```
people_all[~np.in1d(people_all['ID'], people_usa['ID'])]
```
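To try both snippets end to end, here is a minimal sketch. The toy ID values are an assumption for illustration (not the asker's data), `usa_ids`, `via_query` and `via_numpy` are hypothetical names, and `np.isin` is swapped in for `np.in1d` as the newer spelling.

```python
import numpy as np
import pandas as pd

# Toy data (assumed for illustration): ID is the shared primary key.
people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4]})
people_usa = pd.DataFrame({"ID": [3, 4, 6, 7, 100]})

# .query(): the '@' prefix refers to a Python variable in the calling scope.
usa_ids = people_usa["ID"]
via_query = people_all.query("ID not in @usa_ids")

# NumPy: build a boolean mask over the ID column, then invert it with ~.
mask = np.isin(people_all["ID"], people_usa["ID"])
via_numpy = people_all[~mask]

print(via_query["ID"].tolist())
print(via_numpy["ID"].tolist())
```

Both variants compare only the ID column, so they rely on ID being a true key in both frames.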
NOTE: if you have experience with SQL, the "Comparison with SQL" page in the pandas documentation is worth reading.
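For readers coming from SQL, the same result can also be written as an anti-join, roughly `LEFT JOIN ... WHERE right.ID IS NULL`. This is a different technique from the two above, sketched here with `DataFrame.merge` and its `indicator` parameter; the sample data and the extra `name` column are assumptions for illustration.

```python
import pandas as pd

# Toy data (assumed): 'name' is a hypothetical extra column.
people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4], "name": list("abcde")})
people_usa = pd.DataFrame({"ID": [3, 4, 6, 7, 100]})

# Left join on the key. indicator=True adds a '_merge' column whose value is
# 'left_only' for rows of people_all that found no match in people_usa.
merged = people_all.merge(
    people_usa[["ID"]].drop_duplicates(),  # keys only, so no rows get multiplied
    on="ID",
    how="left",
    indicator=True,
)

# Keep the unmatched rows and drop the helper column again.
not_in_usa = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(not_in_usa)
```

Passing a list of columns to `on` would extend this to a composite key.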

野性不改

2020-12-08 17:51

Use isin and negate the boolean mask:

people_usa[~people_usa['ID'].isin(people_all['ID'])]

Example:

In [364]:
import numpy as np
import pandas as pd
people_all = pd.DataFrame({'ID': np.arange(5)})
people_usa = pd.DataFrame({'ID': [3, 4, 6, 7, 100]})
people_usa[~people_usa['ID'].isin(people_all['ID'])]

Out[364]:
    ID
2    6
3    7
4  100

So 3 and 4 are removed from the result. The boolean mask looks like this:

In [366]:
people_usa['ID'].isin(people_all['ID'])

Out[366]:
0     True
1     True
2    False
3    False
4    False
Name: ID, dtype: bool

Using ~ inverts the mask, so only the rows whose ID does not appear in people_all are kept.
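Which frame gets filtered decides the direction of the comparison. A small sketch that wraps the isin pattern in a helper and runs it both ways; `rows_not_in` is a hypothetical name, not a pandas function, and the data repeats the example above.

```python
import pandas as pd

def rows_not_in(left, right, key="ID"):
    """Return the rows of `left` whose `key` value does not occur in `right`."""
    return left[~left[key].isin(right[key])]

people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4]})
people_usa = pd.DataFrame({"ID": [3, 4, 6, 7, 100]})

# Same direction as the example above: rows of people_usa missing from people_all.
only_in_usa = rows_not_in(people_usa, people_all)

# Reverse direction: rows of people_all missing from people_usa.
only_in_all = rows_not_in(people_all, people_usa)

print(only_in_usa["ID"].tolist())
print(only_in_all["ID"].tolist())
```

The original index labels of the filtered frame are preserved; call `.reset_index(drop=True)` if a fresh 0..n-1 index is wanted.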


野性不改

2020-12-08 18:00

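I would combine (by stacking) the data frames and then call .drop_duplicates(keep=False), which removes every row that occurs more than once and so leaves only the rows found in a single frame. Documentation can be found here: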

http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.drop_duplicates.html
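A minimal sketch of the stacking idea, assuming `keep=False` is what was meant (the default `keep='first'` would retain one copy of each shared row). The data is a toy example, and the sketch assumes neither frame contains duplicate rows of its own.

```python
import pandas as pd

people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4]})
people_usa = pd.DataFrame({"ID": [3, 4, 6, 7, 100]})

# Stack both frames and drop every row that occurs more than once.
# This yields rows found in exactly one frame, from either side.
in_one_only = pd.concat([people_all, people_usa]).drop_duplicates(keep=False)

# To keep only rows unique to people_all, stack people_usa twice: each of its
# rows is then guaranteed to be a duplicate and gets dropped.
only_in_all = pd.concat([people_all, people_usa, people_usa]).drop_duplicates(keep=False)

print(in_one_only["ID"].tolist())
print(only_in_all["ID"].tolist())
```

Unlike the key-based approaches, `drop_duplicates` compares all columns by default; pass `subset=["ID"]` to compare on the key alone.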
