Remove rows with duplicate indices (Pandas DataFrame and TimeSeries)

后端未结

关注

 6  912

我寻月下人不归 2020-11-22 14:31

I\'m reading some automated weather data from the web. The observations occur every 5 minutes and are compiled into monthly files for each weather station. Once I\'m done pa

6条回答

春和景丽 (楼主)

2020-11-22 15:19

Remove duplicates (Keeping First)

idx = np.unique( df.index.values, return_index = True )[1]
df = df.iloc[idx]

Remove duplicates (Keeping Last)

df = df[::-1]
df = df.iloc[ np.unique( df.index.values, return_index = True )[1] ]

Tests: 10k loops using OP's data

numpy method - 3.03 seconds
df.loc[~df.index.duplicated(keep='first')] - 4.43 seconds
df.groupby(df.index).first() - 21 seconds
reset_index() method - 29 seconds

0 讨论(0)

查看其它6个回答