pandas get rows which are NOT in other dataframe

春和景丽 2020-11-22 02:17

I have two pandas DataFrames that have some rows in common.

Suppose dataframe2 is a subset of dataframe1.

How can I get the rows of dataframe1 which are not in dataframe2?

13 Answers
  •  猫巷女王i
    2020-11-22 02:48

    Suppose you have two DataFrames, df_1 and df_2, each with multiple columns, and you want only those rows of df_1 that are not in df_2, compared on some of the columns (e.g. field_x, field_y). Follow these steps.

    Step 1. Add a constant column key1 to df_1 and a constant column key2 to df_2.

    Step 2. Left-merge df_2 into df_1 as shown below. field_x and field_y are the columns to compare on.

    Step 3. Select only those rows where key1 is not equal to key2. Rows of df_1 with no match in df_2 have a missing key2 after the merge, so they are the ones kept.

    Step 4. Drop key1 and key2.

    This approach is fast even on large data sets; I have tried it on DataFrames with more than 1,000,000 rows.

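    import pandas as pd

    df_1['key1'] = 1  # marker column, set on every row of df_1
    df_2['key2'] = 1  # marker column, set on every row of df_2
    # Left merge keeps every row of df_1; rows with no match in df_2 get key2 = NaN
    df_1 = pd.merge(df_1, df_2, on=['field_x', 'field_y'], how='left')
    # NaN never equals 1, so this keeps only the rows that had no match in df_2
    df_1 = df_1[~(df_1.key2 == df_1.key1)]
    df_1 = df_1.drop(['key1', 'key2'], axis=1)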
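
For comparison, here is a self-contained sketch of the same left-merge idea that avoids adding helper columns to the inputs. It swaps the key1/key2 markers for the `indicator=True` option of `merge`, which is a different technique from the one in the answer above. The sample data and the helper name `rows_not_in` are hypothetical and only for illustration; the column names field_x and field_y are taken from the answer.

```python
import pandas as pd

# Hypothetical sample data, made up for this illustration.
df_1 = pd.DataFrame({
    "field_x": [1, 2, 3, 4],
    "field_y": ["a", "b", "c", "d"],
    "value": [10, 20, 30, 40],
})
df_2 = pd.DataFrame({
    "field_x": [2, 4],
    "field_y": ["b", "d"],
})


def rows_not_in(left, right, on):
    """Return the rows of `left` whose `on` columns have no match in `right`.

    `rows_not_in` is a hypothetical helper, not part of pandas.
    """
    # Use only the de-duplicated key columns of `right`, so the merge cannot
    # multiply rows of `left` or pull in unrelated columns.
    keys = right[on].drop_duplicates()
    # indicator=True adds a `_merge` column saying where each row was found.
    merged = left.merge(keys, on=on, how="left", indicator=True)
    # "left_only" marks rows that exist in `left` but not in `right`.
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


result = rows_not_in(df_1, df_2, ["field_x", "field_y"])
print(result)
```

Unlike the key1/key2 version, this sketch leaves df_1 and df_2 unmodified and returns a new DataFrame. Note that the left merge resets the row index, so the original index of df_1 is not preserved.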
    
