Pandas/Python: How to concatenate two dataframes without duplicates?

执念已碎 2020-11-28 23:33

I'd like to concatenate two DataFrames A and B into a new one without duplicate rows (if a row in B already exists in A, don't add it):

Dataframe A: Dataframe B:



        
3 answers
  •  甜味超标
    2020-11-29 00:29

    If DataFrame A already contains duplicate rows, concatenating A and B and then dropping duplicate rows will also remove those repeats from A, which you may want to keep.
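A minimal sketch of that pitfall, using small made-up frames `df_a` and `df_b` (hypothetical toy data, not the asker's tables, which are missing from the question):

```python
import pandas as pd

# DataFrame A deliberately contains the same row (4, 24) twice.
df_a = pd.DataFrame({"id": [4, 4, 6], "value": [24, 24, 34]})
# DataFrame B repeats a row that is already in A and adds one new row.
df_b = pd.DataFrame({"id": [4, 6], "value": [24, 14]})

# Naive approach: stack the frames, then drop every repeated row
# (drop_duplicates keeps the first occurrence by default).
naive = pd.concat([df_a, df_b], ignore_index=True).drop_duplicates()
print(naive)
```

Only one `(4, 24)` row survives: B's copy is dropped as intended, but so is A's own second copy.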

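    To avoid that, add a column holding a cumulative count of each repeated row in both frames, then drop duplicates. A row from B is then an exact duplicate only if A already contains at least as many copies of it. Whether you need this depends on your use case, but the situation is common with time-series data.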

    Here is an example:

    import pandas as pd

    df_1 = pd.DataFrame([
        {'date': '11/20/2015', 'id': 4, 'value': 24},
        {'date': '11/20/2015', 'id': 4, 'value': 24},
        {'date': '11/20/2015', 'id': 6, 'value': 34},
    ])

    df_2 = pd.DataFrame([
        {'date': '11/20/2015', 'id': 4, 'value': 24},
        {'date': '11/20/2015', 'id': 6, 'value': 14},
    ])

    # Number repeated rows within each frame: first copy 0, second copy 1, ...
    df_1['count'] = df_1.groupby(['date', 'id', 'value']).cumcount()
    df_2['count'] = df_2.groupby(['date', 'id', 'value']).cumcount()

    # With the counter included, a df_2 row matches a df_1 row only if df_1
    # already holds that many copies, so df_1's own repeats survive.
    df_tot = pd.concat([df_1, df_2], ignore_index=False)
    df_tot = df_tot.drop_duplicates()
    df_tot = df_tot.drop(['count'], axis=1)  # remove the helper column
    print(df_tot)
    # Output:
    #          date  id  value
    # 0  11/20/2015   4     24
    # 1  11/20/2015   4     24
    # 2  11/20/2015   6     34
    # 1  11/20/2015   6     14
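If the goal is exactly what the question states (add a row from B only when it does not appear anywhere in A), a swapped-in alternative to the counting trick is an anti-join built with `DataFrame.merge(..., indicator=True)`. This is a minimal sketch with hypothetical toy data, not the answerer's method:

```python
import pandas as pd

df_a = pd.DataFrame({"id": [4, 4, 6], "value": [24, 24, 34]})
# B holds three copies of a row that A has twice, plus one new row.
df_b = pd.DataFrame({"id": [4, 4, 4, 6], "value": [24, 24, 24, 14]})

# Left-merge B against the distinct rows of A on all shared columns.
# Deduplicating A first stops B's rows from being multiplied by A's repeats.
# indicator=True adds a "_merge" column: "both" if the row exists in A,
# "left_only" if it exists only in B.
marked = df_b.merge(df_a.drop_duplicates(), how="left", indicator=True)

# Keep only the B rows with no match in A, then drop the helper column.
new_rows = marked.loc[marked["_merge"] == "left_only"].drop(columns="_merge")

result = pd.concat([df_a, new_rows], ignore_index=True)
print(result)
```

Unlike the cumulative-count approach, this drops every `(4, 24)` row from B because A contains that row at least once; the counting approach would keep B's third copy, since A has only two. Which behavior is right depends on whether rows are treated as a set or as counted occurrences.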
    
