I have the following large dataframe (df) that looks like this:
   ID     date      PRICE
1  10001  19920103  14.500
2  10001  1
This question is old but still gets viewed quite often. A much faster solution is to combine nth(0) with drop_duplicates:
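import pandas as pd

def using_nth(df):
    to_del = df.groupby('ID', as_index=False).nth(0)  # first row of each ID
    # rows in to_del now appear twice in the concat; keep=False drops every copy
    return pd.concat([df, to_del]).drop_duplicates(keep=False)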
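To see what using_nth does, here is a minimal run on a small, made-up frame with the question's column names (the values are hypothetical, not the question's data). The first row of each ID is collected with nth(0), appended to the original frame, and then every row that occurs more than once is dropped, which leaves all rows except the first one per ID.

```python
import pandas as pd

def using_nth(df):
    # First row of every ID group: these are the rows to remove
    to_del = df.groupby('ID', as_index=False).nth(0)
    # After the concat those rows appear twice; keep=False drops all copies
    return pd.concat([df, to_del]).drop_duplicates(keep=False)

# Hypothetical sample data using the question's columns
df = pd.DataFrame({
    'ID':    [10001, 10001, 10001, 10002, 10002],
    'date':  [19920103, 19920106, 19920107, 19920103, 19920106],
    'PRICE': [14.5, 14.5, 14.875, 6.0, 6.25],
})

result = using_nth(df)
print(result)  # rows 1, 2 and 4 remain; the original index is kept
```

Note that drop_duplicates compares row values, not index labels, so the original index survives in the result.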
On my system, the timings for unutbu's benchmark setup are:
using_nth : 0.43
using_apply_alt : 1.93
using_mask : 2.11
using_apply : 4.33
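One caveat worth checking before adopting using_nth: because drop_duplicates(keep=False) works on whole-row values, any row that is an exact copy of a group's first row is deleted as well. The sketch below compares it with two other standard pandas idioms, DataFrame.duplicated on the ID column and GroupBy.cumcount, which select by position within each group instead of by value. These alternatives are my own additions and were not part of the timings above; the data is hypothetical.

```python
import pandas as pd

def using_duplicated(df):
    # duplicated('ID') is False for the first occurrence of each ID and
    # True for every later one, so the mask keeps all but the first row
    return df[df.duplicated('ID')]

def using_cumcount(df):
    # cumcount() numbers the rows within each ID group starting at 0
    return df[df.groupby('ID').cumcount() > 0]

def using_nth(df):
    to_del = df.groupby('ID', as_index=False).nth(0)
    return pd.concat([df, to_del]).drop_duplicates(keep=False)

# Hypothetical data: row 1 is an exact copy of row 0
df = pd.DataFrame({
    'ID':    [10001, 10001, 10001, 10002, 10002],
    'date':  [19920103, 19920103, 19920107, 19920103, 19920106],
    'PRICE': [14.5, 14.5, 14.875, 6.0, 6.25],
})

print(using_duplicated(df).index.tolist())  # [1, 2, 4]
print(using_cumcount(df).index.tolist())    # [1, 2, 4]
print(using_nth(df).index.tolist())         # [2, 4]: the copy of row 0 is lost too
```

If the data can never contain fully identical rows, all three give the same result; otherwise the position-based versions are the safer choice. Whether they are faster than using_nth on a given dataset would need to be measured.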