I\'m reading some automated weather data from the web. The observations occur every 5 minutes and are compiled into monthly files for each weather station. Once I\'m done pa
Remove duplicates (Keeping First)
idx = np.unique( df.index.values, return_index = True )[1]
df = df.iloc[idx]
Remove duplicates (Keeping Last)
df = df[::-1]
df = df.iloc[ np.unique( df.index.values, return_index = True )[1] ]
Tests: 10k loops using OP's data
numpy method - 3.03 seconds
df.loc[~df.index.duplicated(keep='first')] - 4.43 seconds
df.groupby(df.index).first() - 21 seconds
reset_index() method - 29 seconds