Anti-Join Pandas

前端 未结 6 1083
梦谈多话
梦谈多话 2020-12-25 10:36

I have two tables and I would like to append them so that only all the data in table A is retained and data from table B is only added if its key is unique (Key values are u

6条回答
  •  不思量自难忘°
    2020-12-25 11:29

    Consider the following dataframes

    TableA = pd.DataFrame(np.random.rand(4, 3),
                          pd.Index(list('abcd'), name='Key'),
                          ['A', 'B', 'C']).reset_index()
    TableB = pd.DataFrame(np.random.rand(4, 3),
                          pd.Index(list('aecf'), name='Key'),
                          ['A', 'B', 'C']).reset_index()
    

    TableA
    


    TableB
    

    This is one way to do what you want

    Method 1

    # Identify what values are in TableB and not in TableA
    key_diff = set(TableB.Key).difference(TableA.Key)
    where_diff = TableB.Key.isin(key_diff)
    
    # Slice TableB accordingly and append to TableA
    TableA.append(TableB[where_diff], ignore_index=True)
    

    Method 2

    rows = []
    for i, row in TableB.iterrows():
        if row.Key not in TableA.Key.values:
            rows.append(row)
    
    pd.concat([TableA.T] + rows, axis=1).T
    

    Timing

    4 rows with 2 overlap

    Method 1 is much quicker

    10,000 rows 5,000 overlap

    loops are bad

提交回复
热议问题