I have a data frame and I would like to group it by a particular column (or, in other words, by values from a particular column). I can do it in the following way: gro
I think the issue is that there are two different first methods that share a name but behave differently: one is for groupby objects, and the other is for a Series/DataFrame (to do with time series).
To replicate the behaviour of the groupby first method over a DataFrame using agg, you could use iloc[0] (which gets the first row of each group, whether a DataFrame or a Series, by position):
grouped.agg(lambda x: x.iloc[0])
For example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 2], [3, 4]])
In [3]: g = df.groupby(0)
In [4]: g.first()
Out[4]:
   1
0
1  2
3  4
In [5]: g.agg(lambda x: x.iloc[0])
Out[5]:
   1
0
1  2
3  4
Analogously you can replicate last using iloc[-1].
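As a rough, self-contained sketch of the equivalence described above (the column names key and val and the sample values are made up for illustration, not taken from the question), the built-in first/last can be compared with the iloc-based lambdas:

```python
import pandas as pd

# Hypothetical sample data: two groups with two rows each, no missing values.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})
g = df.groupby("key")

# Built-in groupby methods.
first_builtin = g.first()
last_builtin = g.last()

# Positional equivalents via agg: iloc[0] is the first row, iloc[-1] the last.
first_manual = g.agg(lambda x: x.iloc[0])
last_manual = g.agg(lambda x: x.iloc[-1])

print(first_builtin["val"].tolist(), first_manual["val"].tolist())
print(last_builtin["val"].tolist(), last_manual["val"].tolist())
```

With no NaN values in the data, both approaches should pick the same rows.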
Note: This also works column-wise:
g.agg({1: lambda x: x.iloc[0]})
In older versions of pandas you could use the irow method (e.g. x.irow(0)); see previous edits.
A couple of updated notes:
This is better done using the nth groupby method, which is much faster in pandas >= 0.13:
g.nth(0) # first
g.nth(-1) # last
You have to take a little care, as the default behaviour of first and last is to skip NaN values, and IIRC for DataFrame groupbys it was broken pre-0.13. There's a dropna option for nth.
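To make the NaN caveat concrete, here is a small sketch on made-up data (the key and val columns are hypothetical). It assumes a reasonably recent pandas. As far as I know, the shape of what nth returns changed in pandas 2.0 (it now keeps the original rows and index rather than indexing by group key), so only the val column is compared here:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" starts with a missing value.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [np.nan, 2.0, 3.0]})
g = df.groupby("key")

# first() skips NaN and returns the first non-null value per group.
skip_nan = g.first()["val"].tolist()

# iloc[0] and nth(0) are purely positional, so the NaN is kept.
positional = g.agg(lambda x: x.iloc[0])["val"].tolist()
nth_default = g.nth(0)["val"].tolist()

# nth with dropna="any" drops rows containing NaN before picking the nth row.
nth_dropna = g.nth(0, dropna="any")["val"].tolist()

print(skip_nan, positional, nth_default, nth_dropna)
```

So if the groups can contain missing values, first() and nth(0) are not interchangeable unless dropna is set.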
You can pass the function names as strings rather than built-ins (though IIRC pandas spots that it's the sum builtin and applies np.sum):
grouped['D'].agg({'result1' : "sum", 'result2' : "mean"})
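A caveat for newer versions: as far as I know, passing a dict of {new_name: function} to a Series groupby was deprecated in pandas 0.20 and raises an error from 1.0 onwards. A sketch of the keyword-based named aggregation that replaced it (available since 0.25), using made-up data with the same grouped and 'D' names as above:

```python
import pandas as pd

# Hypothetical data; "A" is the grouping column, "D" the column being aggregated.
df = pd.DataFrame({"A": ["x", "x", "y"], "D": [1.0, 3.0, 5.0]})
grouped = df.groupby("A")

# Named aggregation: each keyword becomes an output column name.
result = grouped["D"].agg(result1="sum", result2="mean")
print(result)
```

The output column names come from the keywords, so this gives the same result1/result2 layout as the dict form.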