Compare Python Pandas DataFrames for matching rows

梦如初夏 2020-11-27 14:07

I have this DataFrame (df1) in Pandas:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.rand(10, 4), columns=list('ABCD'))
print(df1)

       A         B         C           


        
2 Answers
  •  悲哀的现实
    2020-11-27 15:04

    @Andrew: I believe I found a way to drop the rows of one dataframe that are already present in another (i.e. to answer my EDIT) without using loops - let me know if you disagree and/or if my OP + EDIT did not clearly state this:

    THIS WORKS

    The columns for both dataframes are always the same - A, B, C and D. With this in mind, based heavily on Andrew's approach, here is how to drop the rows from df2 that are also present in df1:

    common_cols = df1.columns.tolist()                       # list of column names shared by both frames
    df12 = pd.merge(df1, df2, on=common_cols, how='inner')   # rows common to df1 and df2
    df2 = df2[~df2['A'].isin(df12['A'])]                     # keep df2 rows whose A value is not among the common rows
    

    Line 3 does the following:

    • Extract only the rows from df2 that do not match rows in df1.
    • For two rows to differ, at least one column of one row must
      differ from the corresponding column of the other row.
    • Here, I picked column A to make this comparison. Any single
      column would do, provided its values identify the rows of df2
      uniquely (as random floats practically always do). If two rows
      of df2 share the same value in that column, both are dropped
      even when only one of them appears in df1.
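
    A small sketch of where the single-column comparison can go wrong. The toy data here is made up for illustration (it is not from the question): two rows of df2 share the same A value, but only one of them is also in df1.

```python
import pandas as pd

# Hypothetical toy data: df2 has two rows with A == 1,
# but only the row (1, 10) also exists in df1.
df1 = pd.DataFrame({"A": [1, 2], "B": [10, 20]})
df2 = pd.DataFrame({"A": [1, 1, 3], "B": [10, 99, 30]})

common_cols = df1.columns.tolist()
df12 = pd.merge(df1, df2, on=common_cols, how="inner")  # the rows common to both frames

# Same filter as line 3 of the snippet above: compare on column A only.
by_a_only = df2[~df2["A"].isin(df12["A"])]

# The row (1, 99) is not present in df1, yet it is dropped as well,
# because its A value equals that of the common row (1, 10).
print(by_a_only)
```

    With random floats, as in the question, such collisions are very unlikely, so the filter behaves as intended there.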

    NOTE: this method is essentially the equivalent of the SQL NOT IN().
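
    For comparison, here is a minimal sketch of a NOT IN style filter that compares all columns at once, using the `indicator` option of `merge`. This is a different technique from the snippet above, not the method used there. The helper name `drop_rows_present_in` and the toy data are made up for illustration, and the sketch assumes both frames have the same columns.

```python
import pandas as pd

def drop_rows_present_in(left, right):
    """Return the rows of `left` that do not appear in `right` (hypothetical helper).

    Rows are compared on all columns of `left`; the original index is not kept.
    """
    cols = left.columns.tolist()
    # Deduplicate `right` so the left merge cannot multiply rows of `left`.
    marked = left.merge(
        right[cols].drop_duplicates(), on=cols, how="left", indicator=True
    )
    # `_merge` is "left_only" for rows of `left` that found no match in `right`.
    return marked.loc[marked["_merge"] == "left_only", cols].reset_index(drop=True)

# Hypothetical toy data: only the row (1, 10) exists in both frames.
df1 = pd.DataFrame({"A": [1, 2], "B": [10, 20]})
df2 = pd.DataFrame({"A": [1, 1, 3], "B": [10, 99, 30]})

result = drop_rows_present_in(df2, df1)
print(result)
```

    Because every column takes part in the comparison, this does not depend on any single column having unique values. Note that matching float columns this way relies on exact equality.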
