How do I drop duplicates and keep the first value on pandas?

野性不改 2021-01-27 11:24

I want to drop duplicates and keep the first value. The duplicates I want to drop are the rows where A = 'df'. Here's my data:

A   B   C   D   E
qw  1   3   1   1
er          


        
3 answers
  •  無奈伤痛
    2021-01-27 11:35

    Using groupby().cumcount() together with np.diff():

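    import pandas as pd
    import numpy as np
    # Position of each row within its group of equal A values (0, 1, 2, ...)
    df['cum'] = df.groupby(['A']).cumcount()
    # Change in that counter from the previous row; the first row gets 0
    df['cum2'] = np.append([0],np.diff(df.cum))
    # Drop 'df' rows whose counter rose by exactly 1, then remove the helper columns
    df.query("~((A == 'df') & (cum2 == 1))").drop(['cum','cum2'],axis=1)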
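The snippet above assumes that `df` already exists. Below is a self-contained sketch of the same three steps. The sample frame is rebuilt from the values printed further down in this answer, and the names `work` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the frame printed in this answer.
df = pd.DataFrame({
    "A": ["qw", "er", "ew", "df", "df", "df", "df", "we", "we", "df", "df", "we", "qw"],
    "B": [1, 2, 4, 34, 2, 3, 4, 2, 4, 34, 3, 4, 2],
    "C": [3, 4, 8, 34, 5, 3, 4, 5, 4, 9, 3, 7, 2],
    "D": [1, 2, 44, 34, 2, 7, 7, 5, 4, 34, 9, 4, 7],
    "E": [1, 6, 4, 34, 2, 3, 4, 2, 4, 34, 3, 4, 2],
})

# Work on a copy so the helper columns never touch the original frame.
work = df.copy()
work["cum"] = work.groupby("A").cumcount()
work["cum2"] = np.append([0], np.diff(work["cum"].to_numpy()))

# Keep everything except 'df' rows whose counter rose by exactly 1.
result = work.query("~((A == 'df') & (cum2 == 1))").drop(columns=["cum", "cum2"])
print(result.index.tolist())  # [0, 1, 2, 3, 7, 8, 9, 11, 12]
```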
    

    After adding the cum column, df looks like:

    In [6]: df
    Out[6]: 
         A   B   C   D   E  cum
    0   qw   1   3   1   1    0
    1   er   2   4   2   6    0
    2   ew   4   8  44   4    0
    3   df  34  34  34  34    0
    4   df   2   5   2   2    1
    5   df   3   3   7   3    2
    6   df   4   4   7   4    3
    7   we   2   5   5   2    0
    8   we   4   4   4   4    1
    9   df  34   9  34  34    4
    10  df   3   3   9   3    5
    11  we   4   7   4   4    2
    12  qw   2   2   7   2    1
    

    np.diff then gives the row-to-row change in cum (0 is prepended for the first row):

    In [7]: df['cum2'] = np.append([0],np.diff(df.cum))
    
    In [8]: df
    Out[8]: 
         A   B   C   D   E  cum  cum2
    0   qw   1   3   1   1    0     0
    1   er   2   4   2   6    0     0
    2   ew   4   8  44   4    0     0
    3   df  34  34  34  34    0     0
    4   df   2   5   2   2    1     1
    5   df   3   3   7   3    2     1
    6   df   4   4   7   4    3     1
    7   we   2   5   5   2    0    -3
    8   we   4   4   4   4    1     1
    9   df  34   9  34  34    4     3
    10  df   3   3   9   3    5     1
    11  we   4   7   4   4    2    -3
    12  qw   2   2   7   2    1    -1
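For comparison, the same column can probably be produced without NumPy by using `Series.diff()`. This is a minimal sketch on the `cum` values shown above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Values of the cum column from the frame shown above.
cum = pd.Series([0, 0, 0, 0, 1, 2, 3, 0, 1, 4, 5, 2, 1])

# NumPy version from the answer: np.diff returns one element fewer,
# so a 0 is prepended to line the result up with the rows.
via_numpy = np.append([0], np.diff(cum.to_numpy()))

# pandas version: diff() leaves NaN in the first slot, which is filled with 0.
via_pandas = cum.diff().fillna(0).astype(int)

print(via_pandas.tolist())
# [0, 0, 0, 0, 1, 1, 1, -3, 1, 3, 1, -3, -1]
```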
    

    Output, with the helper columns dropped:

    In [12]: df.query("~((A == 'df') & (cum2 == 1))").drop(['cum','cum2'],axis=1)
    Out[12]: 
         A   B   C   D   E
    0   qw   1   3   1   1
    1   er   2   4   2   6
    2   ew   4   8  44   4
    3   df  34  34  34  34
    7   we   2   5   5   2
    8   we   4   4   4   4
    9   df  34   9  34  34
    11  we   4   7   4   4
    12  qw   2   2   7   2
    

    reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.cumcount.html
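One caveat: the `cum2 == 1` test also matches a 'df' row that follows a row from a different group whose counter happens to be exactly one lower. If the goal is what the output above suggests (keep the first row of every consecutive run of 'df'), comparing each row with the previous one via `shift()` may be more robust. The sketch below rests on that assumption; `drop_repeats_within_runs` is a hypothetical helper name and the `demo` frame is made up to trigger the edge case.

```python
import numpy as np
import pandas as pd


def drop_repeats_within_runs(frame, column, value):
    """Keep only the first row of each consecutive run where frame[column] == value.

    Hypothetical helper for illustration; not part of pandas.
    """
    is_value = frame[column].eq(value)
    # True where the previous row also holds `value`; the first row has no predecessor.
    follows_value = is_value.shift(fill_value=False)
    return frame[~(is_value & follows_value)]


# Made-up frame: the last 'df' row starts a new run right after a 'we' row.
demo = pd.DataFrame({"A": ["we", "we", "df", "df", "df", "we", "df"]})

by_shift = drop_repeats_within_runs(demo, "A", "df")

# The cumcount/diff approach from the answer, for comparison.
work = demo.copy()
work["cum"] = work.groupby("A").cumcount()
work["cum2"] = np.append([0], np.diff(work["cum"].to_numpy()))
by_diff = work.query("~((A == 'df') & (cum2 == 1))").drop(columns=["cum", "cum2"])

print(by_shift.index.tolist())  # [0, 1, 2, 5, 6]
print(by_diff.index.tolist())   # [0, 1, 2, 5]
```

On the sample data in this thread both approaches keep the same rows; they differ only in cases like the last row of `demo`, where the diff-based version drops a 'df' row that actually begins a new run.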
