Python - Drop duplicate based on max value of a column

太阳男子 2020-12-11 06:46

I am not really good with pandas, but I think pandas should solve my problem: I have a text file that contains data (id1;id2;value1;<

1 answer
  • 2020-12-11 06:54

    You need DataFrameGroupBy.idxmax to get the index labels of the maximum value3 in each group, and then select those rows of the DataFrame with loc:

    print (df.groupby(['id1','id2','value1']).value3.idxmax())
    id1  id2  value1
    1    2    30        1
    3    5    12        4
    24   12   1         6
    Name: value3, dtype: int64
    
    df = df.loc[df.groupby(['id1','id2','value1']).value3.idxmax()]
    print (df)
       id1  id2  value1  value2  value3   a
    1    1    2      30      42    26.2 NaN
    4    3    5      12      33    11.2 NaN
    6   24   12       1      23     1.9 NaN
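
For anyone who wants to try the idxmax approach without the original file, here is a minimal self-contained sketch. The rows are made up for illustration (they are not the asker's data), and the names `keys`, `best` and `result` are only placeholders.

```python
import pandas as pd

# Made-up rows for illustration; not the asker's actual file.
df = pd.DataFrame({
    "id1":    [1, 1, 3, 3, 24],
    "id2":    [2, 2, 5, 5, 12],
    "value1": [30, 30, 12, 12, 1],
    "value2": [40, 42, 30, 33, 23],
    "value3": [20.1, 26.2, 9.8, 11.2, 1.9],
})

# Columns that define a "duplicate".
keys = ["id1", "id2", "value1"]

# Index label of the row with the largest value3 inside each key group.
best = df.groupby(keys)["value3"].idxmax()

# Select those rows; the other columns of each winning row are kept unchanged.
result = df.loc[best]
print(result)
```

Because the rows are picked by label with loc, the result keeps the original index labels, as in the output above. Append `.reset_index(drop=True)` if a fresh 0..n-1 index is preferred.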
    

    Another possible solution is to sort_values by column value3 in descending order and then groupby with GroupBy.first:

    # Parentheses are needed so the chained calls can span several lines
    df = (df.sort_values('value3', ascending=False)
            .groupby(['id1','id2','value1'], sort=False)
            .first()
            .reset_index())
    print (df)
       id1  id2  value1  value2  value3   a
    0    1    2      30      42    26.2 NaN
    1    3    5      12      33    11.2 NaN
    2   24   12       1      23     1.9 NaN
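
One thing to be aware of with the sort-based variant: GroupBy.first returns the first non-null value of each column, so when the data contains NaN the result can combine values from different rows of a group. The sketch below uses made-up data to show this and compares it with `drop_duplicates`, which is a swapped-in alternative not mentioned in the answer above and which keeps whole rows.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; the top-value3 row of the first group has a NaN in value2.
df = pd.DataFrame({
    "id1":    [1, 1, 3],
    "id2":    [2, 2, 5],
    "value1": [30, 30, 12],
    "value2": [40.0, np.nan, 33.0],
    "value3": [20.1, 26.2, 11.2],
})

keys = ["id1", "id2", "value1"]

# Largest value3 first, so the first row of each group is the one to keep.
ordered = df.sort_values("value3", ascending=False)

# first() skips NaN per column: value2 is taken from the row below the winner.
via_first = ordered.groupby(keys, sort=False).first().reset_index()

# drop_duplicates keeps the first complete row per key combination, NaN included.
via_dedup = ordered.drop_duplicates(subset=keys, keep="first")

print(via_first)
print(via_dedup)
```

If the data has no missing values, both variants select the same values. With missing values, the choice depends on whether filling from lower-ranked rows is acceptable.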
    