Selecting Unique Rows between Two DataFrames in Pandas

再見小時候 2020-12-11 06:56

I have two data frames A and B of unequal dimensions. I would like to create a data frame C such that it ONLY contains rows that are unique between A and B. I tried to follo

1 Answer
  • 2020-12-11 07:35

    This worked for me:

    In [7]:
    
    # Boolean mask: keep only the rows of df1 whose Star_ID is not found in df2
    df1[~df1.Star_ID.isin(df2.Star_ID)]
    
    Out[7]:
    
                  Star_ID  Loc_ID  pmRA  pmDE  Field    Jmag    Hmag
    2  2M00000222+5625359    4264     0     0  N7789  11.982  11.433
    3  2M00000818+5634264    4264     0     0  N7789  12.501  11.892
    
    [2 rows x 7 columns]
    

    Here we build a boolean mask: df1.Star_ID.isin(df2.Star_ID) is True for each row of df1 whose Star_ID also appears in df2, and the ~ operator NOTs (inverts) that mask, so only the rows of df1 whose Star_ID is absent from df2 are kept. The approach you linked to is essentially the same thing; perhaps it was the syntax that was unclear?
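
A small self-contained sketch of the same mask, using two hypothetical toy frames (the Star_ID values and Jmag numbers are made up for illustration, not the asker's data). It prints the isin mask before ~ is applied, then the filtered rows:

```python
import pandas as pd

# Hypothetical toy frames; only the Star_ID column matters for the comparison
df1 = pd.DataFrame({"Star_ID": ["a", "b", "c", "d"], "Jmag": [10.1, 11.2, 11.9, 12.5]})
df2 = pd.DataFrame({"Star_ID": ["a", "b", "x"], "Jmag": [10.1, 11.2, 9.8]})

# isin -> True where df1's Star_ID also occurs somewhere in df2.Star_ID
in_df2 = df1["Star_ID"].isin(df2["Star_ID"])
print(in_df2.tolist())  # [True, True, False, False]

# ~ flips every boolean, so the mask now selects the rows of df1 missing from df2
only_in_df1 = df1[~in_df2]
print(only_in_df1)
```

Boolean indexing keeps the original index labels, which is why the answer's Out[7] shows rows labelled 2 and 3 rather than 0 and 1.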

    EDIT

    To get both the rows that are only in df1 and the rows that are only in df2, you could do this:

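    import pandas as pd
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the two one-sided results
    unique_vals = pd.concat([df1[~df1.Star_ID.isin(df2.Star_ID)], df2[~df2.Star_ID.isin(df1.Star_ID)]], ignore_index=True)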
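
A runnable sketch of the combined result on hypothetical toy data. It also shows an alternative that is not from the answer: an outer merge on Star_ID with indicator=True, which tags each key as 'left_only', 'right_only' or 'both', so the one-sided keys are everything not tagged 'both'.

```python
import pandas as pd

# Hypothetical toy frames standing in for the asker's data
df1 = pd.DataFrame({"Star_ID": ["a", "b", "c", "d"], "Jmag": [10.1, 11.2, 11.9, 12.5]})
df2 = pd.DataFrame({"Star_ID": ["a", "b", "x"], "Jmag": [10.1, 11.2, 9.8]})

# Rows only in df1, followed by rows only in df2, renumbered 0..n-1
unique_vals = pd.concat(
    [df1[~df1["Star_ID"].isin(df2["Star_ID"])],
     df2[~df2["Star_ID"].isin(df1["Star_ID"])]],
    ignore_index=True,
)
print(unique_vals)

# Alternative (swapped-in technique): outer merge of the ID columns with an indicator
merged = pd.merge(df1[["Star_ID"]], df2[["Star_ID"]], on="Star_ID", how="outer", indicator=True)
one_sided_ids = merged.loc[merged["_merge"] != "both", "Star_ID"]
print(sorted(one_sided_ids))  # ['c', 'd', 'x']
```

The merge variant compares only the Star_ID columns, so it yields IDs rather than full rows; the isin/concat version keeps every column of each frame.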
    

    Further edit

    The underlying problem turned out to be leading spaces in the CSV: they made every ID look unique in both datasets. To correct this, strip the whitespace:

    # Remove the leading spaces so the IDs compare equal across both frames
    df1.Apogee_ID = df1.Apogee_ID.str.lstrip()
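
To illustrate why the stray spaces mattered, here is a minimal sketch with hypothetical IDs in which df1 picked up leading spaces and df2 did not. Before stripping, no ID matches, so every row of df1 is reported as unique:

```python
import pandas as pd

# Hypothetical data: df1's IDs carry a leading space, df2's do not
df1 = pd.DataFrame({"Apogee_ID": [" a", " b", " c"]})
df2 = pd.DataFrame({"Apogee_ID": ["a", "b"]})

# " a" != "a", so nothing matches and all rows look unique
before = df1[~df1["Apogee_ID"].isin(df2["Apogee_ID"])]
print(len(before))  # 3

# Strip the leading whitespace, then the comparison behaves as expected
df1["Apogee_ID"] = df1["Apogee_ID"].str.lstrip()
after = df1[~df1["Apogee_ID"].isin(df2["Apogee_ID"])]
print(after["Apogee_ID"].tolist())  # ['c']
```

str.strip() would also remove trailing whitespace. If the spaces come from a space after each delimiter in the CSV, passing skipinitialspace=True to pd.read_csv avoids them at load time.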
    