问题
I'm posting this because the topic just got brought up in another question/answer and the behavior isn't very well documented.
Consider the dataframe df
df = pd.DataFrame(dict(
A=list('xxxyyy'),
B=[np.nan, 1, 2, 3, 4, np.nan]
))
A B
0 x NaN
1 x 1.0
2 x 2.0
3 y 3.0
4 y 4.0
5 y NaN
I wanted to get the first and last rows of each group defined by column 'A'
.
I tried
df.groupby('A').B.agg(['first', 'last'])
first last
A
x 1.0 2.0
y 3.0 4.0
However, This doesn't give me the np.NaN
s that I expected.
How do I get the actual first and last values in each group?
回答1:
One option is to use the .nth
method:
>>> gb = df.groupby('A')
>>> gb.nth(0)
B
A
x NaN
y 3.0
>>> gb.nth(-1)
B
A
x 2.0
y NaN
>>>
However, I haven't found a way to aggregate them neatly. Of course, one can always use a pd.DataFrame
constructor:
>>> pd.DataFrame({'first':gb.B.nth(0), 'last':gb.B.nth(-1)})
first last
A
x NaN 2.0
y 3.0 NaN
Note: I explicitly used the gb.B
attribute, or else you have to use .squeeze
回答2:
As noted here by @unutbu:
The groupby.first and groupby.last methods return the first and last non-null values respectively.
To get the actual first and last values, do:
def h(x):
return x.values[0]
def t(x):
return x.values[-1]
df.groupby('A').B.agg([h, t])
h t
A
x NaN 2.0
y 3.0 NaN
来源:https://stackoverflow.com/questions/45744800/why-doesnt-first-and-last-in-a-groupby-give-me-first-and-last