Remove rows with duplicate indices (Pandas DataFrame and TimeSeries)

我寻月下人不归 2020-11-22 14:31

I'm reading some automated weather data from the web. The observations occur every 5 minutes and are compiled into monthly files for each weather station. Once I'm done pa

6 answers
  •  春和景丽
    2020-11-22 15:19

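    Remove duplicates (keeping first)

    import numpy as np

    # return_index=True yields the position of the first occurrence of each unique label
    idx = np.unique(df.index.values, return_index=True)[1]
    df = df.iloc[idx]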
    

    Remove duplicates (keeping last)

    # Reverse the rows so that the "first" occurrence found is the last one in the original order
    df = df[::-1]
    df = df.iloc[np.unique(df.index.values, return_index=True)[1]]
    

    Timings: 10k loops using the OP's data

    numpy method - 3.03 seconds
    df.loc[~df.index.duplicated(keep='first')] - 4.43 seconds
    df.groupby(df.index).first() - 21 seconds
    reset_index() method - 29 seconds
    
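For anyone who wants to check the two NumPy recipes above, here is a minimal sketch on a made-up frame. The timestamps and the `temp` column are invented for illustration and are not the OP's data. It compares each recipe with the corresponding `df.index.duplicated` call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 5-minute observations with one repeated timestamp.
idx = pd.to_datetime([
    "2001-01-01 00:00",
    "2001-01-01 00:05",
    "2001-01-01 00:05",  # duplicate index label
    "2001-01-01 00:10",
])
df = pd.DataFrame({"temp": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Keep first: positions of the first occurrence of each unique label.
first_pos = np.unique(df.index.values, return_index=True)[1]
keep_first_np = df.iloc[first_pos]
keep_first_pd = df.loc[~df.index.duplicated(keep="first")]

# Keep last: reverse the rows, then take first occurrences again.
rev = df[::-1]
keep_last_np = rev.iloc[np.unique(rev.index.values, return_index=True)[1]]
keep_last_pd = df.loc[~df.index.duplicated(keep="last")]

print(keep_first_np)
print(keep_last_np)
```

One thing to keep in mind: `np.unique` returns the unique labels in sorted order, so the NumPy recipes hand back rows sorted by index, while the `duplicated` mask preserves the original row order. On an already sorted index, as in this sample, the two agree.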

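The timings above can be reproduced roughly with `timeit`. This is only a sketch under assumptions: the data is synthetic rather than the OP's, the loop count is reduced, and the body of the "reset_index() method" is my guess at what that label refers to, since the answer does not spell it out. The helper function names are made up for this example.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the OP's data: 1000 five-minute stamps, 50 of them repeated.
rng = np.random.default_rng(0)
base = pd.date_range("2001-01-01", periods=1000, freq="5min")
idx = base.append(base[:50]).sort_values()
df = pd.DataFrame({"temp": rng.normal(size=len(idx))}, index=idx)


def numpy_method(frame):
    return frame.iloc[np.unique(frame.index.values, return_index=True)[1]]


def duplicated_method(frame):
    return frame.loc[~frame.index.duplicated(keep="first")]


def groupby_method(frame):
    return frame.groupby(frame.index).first()


def reset_index_method(frame):
    # Assumed form of the "reset_index() method"; an unnamed index becomes column "index".
    return (
        frame.reset_index()
        .drop_duplicates(subset="index", keep="first")
        .set_index("index")
    )


methods = [numpy_method, duplicated_method, groupby_method, reset_index_method]
results = {fn.__name__: fn(df) for fn in methods}
timings = {
    fn.__name__: timeit.timeit(lambda fn=fn: fn(df), number=20) for fn in methods
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 20 loops")
```

Absolute numbers will depend on the machine, the pandas and NumPy versions, and the size and duplicate rate of the data, so the ranking is worth re-checking on your own frame.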