I have a dataframe which was created via a df.pivot:
type start end
F_Type to_date
A 2
The other ways to do that are as follow:
(
df
.groupby(level=0)
.apply(lambda group: group.nlargest(1, columns='to_date'))
.reset_index(level=-1, drop=True)
)
(
df
.groupby(level=0)
.apply(lambda group: group.loc[group['to_date'] == group['to_date'].max()])
.reset_index(level=-1, drop=True)
)
A standard approach is to use groupby(keys)[column].idxmax().
However, to select the desired rows using idxmax you need idxmax to return unique index values. One way to obtain a unique index is to call reset_index.
Once you obtain the index values from groupby(keys)[column].idxmax() you can then select the entire row using df.loc:
In [20]: df.loc[df.reset_index().groupby(['F_Type'])['to_date'].idxmax()]
Out[20]:
start end
F_Type to_date
A 20150908143000 345 316
B 20150908143000 10743 8803
C 20150908143000 19522 16659
D 20150908143000 433 65
E 20150908143000 7290 7375
F 20150908143000 0 0
G 20150908143000 1796 340
Note: idxmax returns index labels, not necessarily ordinals. After using reset_index the index labels happen to also be ordinals, but since idxmax is returning labels (not ordinals) it is better to always use idxmax in conjunction with df.loc, not df.iloc (as I originally did in this post.)