Selecting Unique Rows between Two DataFrames in Pandas

I have two data frames A and B of unequal dimensions. I would like to create a data frame C such that it ONLY contains rows that are unique between A and B. I tried to follo

1 Answer

生来不讨喜

2020-12-11 07:35
This worked for me:
```
In [7]:

df1[~df1.Star_ID.isin(df2.Star_ID)]

Out[7]:

              Star_ID  Loc_ID  pmRA  pmDE  Field    Jmag    Hmag
2  2M00000222+5625359    4264     0     0  N7789  11.982  11.433
3  2M00000818+5634264    4264     0     0  N7789  12.501  11.892

[2 rows x 7 columns]
```
What we do here is create a boolean mask: isin asks which Star_ID values in df1 also appear in df2, and the ~ operator negates (NOTs) that condition, so only the rows of df1 whose Star_ID is missing from df2 are kept. The answer you linked to does pretty much the same thing, but I think you maybe didn't understand the syntax?
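To see the mask and its negation separately, here is a minimal sketch on made-up data. The two small frames and the names in_both and only_in_df1 are assumptions for illustration; only the Star_ID column name comes from the example above.

```python
import pandas as pd

# Hypothetical toy data; only the Star_ID column name mirrors the answer above.
df1 = pd.DataFrame({"Star_ID": ["A", "B", "C", "D"], "Jmag": [11.0, 11.5, 12.0, 12.5]})
df2 = pd.DataFrame({"Star_ID": ["A", "B", "E"], "Jmag": [11.0, 11.5, 13.0]})

# True where a df1 Star_ID also occurs somewhere in df2.
in_both = df1.Star_ID.isin(df2.Star_ID)

# ~ flips the mask, so indexing keeps only the df1 rows that are absent from df2.
only_in_df1 = df1[~in_both]
print(only_in_df1)
```

Note that the comparison looks at Star_ID only, so two rows with the same Star_ID but different values in other columns still count as a match.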

EDIT

To get both the values that are only in df1 and the values that are only in df2, you could do this:
```
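# DataFrame.append was removed in pandas 2.0, so use pd.concat instead (assumes import pandas as pd)
unique_vals = pd.concat([df1[~df1.Star_ID.isin(df2.Star_ID)], df2[~df2.Star_ID.isin(df1.Star_ID)]], ignore_index=True)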
```
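As a runnable illustration of the same idea, here is an equivalent sketch built on pd.concat. The toy frames and the intermediate names only_in_df1 and only_in_df2 are assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical toy data: B and C are shared, A and D each occur in one frame only.
df1 = pd.DataFrame({"Star_ID": ["A", "B", "C"], "Jmag": [11.0, 11.5, 12.0]})
df2 = pd.DataFrame({"Star_ID": ["B", "C", "D"], "Jmag": [11.5, 12.0, 12.5]})

# Filter each frame against the other, as in the one-liner above.
only_in_df1 = df1[~df1.Star_ID.isin(df2.Star_ID)]
only_in_df2 = df2[~df2.Star_ID.isin(df1.Star_ID)]

# Stack the two results; ignore_index=True renumbers the rows from 0.
unique_vals = pd.concat([only_in_df1, only_in_df2], ignore_index=True)
print(unique_vals)
```

Splitting the expression into two named steps makes it easier to check each side on its own before combining them.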
Further edit

So the problem was that the values in the csv had leading spaces, which caused all values to look unique in both datasets. To correct this, strip the leading whitespace:
```
df1.Apogee_ID = df1.Apogee_ID.str.lstrip()
```
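The effect of the stray spaces can be reproduced with a small sketch. The inline CSV text below is invented for illustration, and it assumes only the first file carries the leading spaces. The last line shows an alternative, the skipinitialspace option of read_csv, which skips spaces after the delimiter at load time; it is not part of the fix above.

```python
import io
import pandas as pd

# Hypothetical CSV text: the values in csv1 have a leading space after the comma.
csv1 = "Loc_ID,Apogee_ID\n4264, 2M00000222+5625359\n4264, 2M00000818+5634264\n"
csv2 = "Loc_ID,Apogee_ID\n4264,2M00000222+5625359\n"

df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2))

# With the stray spaces nothing matches, so every df1 row looks unique.
before = df1[~df1.Apogee_ID.isin(df2.Apogee_ID)]

# Strip the leading whitespace, then repeat the same filter.
df1.Apogee_ID = df1.Apogee_ID.str.lstrip()
after = df1[~df1.Apogee_ID.isin(df2.Apogee_ID)]
print(len(before), len(after))

# Alternative: drop the spaces while reading the file.
df1_clean = pd.read_csv(io.StringIO(csv1), skipinitialspace=True)
```

If trailing spaces are also possible, str.strip() removes whitespace from both ends.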