问题
I have a dataframe that I group according to an id
-column. For each group I want to get the row (the whole row, not just the value) containing the max value. I am able to do this by first getting the max value for each group, then create a filter array and then apply the filter on the original dataframe. Like so,
import pandas as pd
# Dummy data
df = pd.DataFrame({'id' : [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4],
'other_value' : ['a', 'e', 'b', 'b', 'a', 'd', 'b', 'f' ,'a' ,'c', 'e', 'f'],
'value' : [1, 3, 5, 2, 5, 6, 2, 4, 6, 1, 7, 3]
})
# Get the max value in each group
df_max = df.groupby('id')['value'].max()
# Create row filter
row_filter = [df_max[i]==v for i, v in zip(df['id'], df['value'])]
# Filter
df_target = df[row_filter]
df_target
Out[58]:
id other_value value
2 1 b 5
5 2 d 6
7 3 f 4
10 4 e 7
This solution works, but somehow seems overly cumbersome. Does anybody know of a nicer way to do this. Preferably a oneliner. Regarding potential duplicates, I'll deal with those later :)
回答1:
Use DataFrameGroupBy.idxmax if need select only one max value:
df = df.loc[df.groupby('id')['value'].idxmax()]
print (df)
id other_value value
2 1 b 5
5 2 d 6
7 3 f 4
10 4 e 7
If multiple max values and want seelct all rows by max
values:
df = pd.DataFrame({'id' : [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4],
'other_value' : ['a', 'e', 'b', 'b', 'a', 'd', 'b', 'f' ,'a' ,'c', 'e', 'f'],
'value' : [1, 3, 5, 2, 5, 6, 2, 4, 6, 1, 7, 7]
})
print (df)
id other_value value
0 1 a 1
1 1 e 3
2 1 b 5
3 2 b 2
4 2 a 5
5 2 d 6
6 3 b 2
7 3 f 4
8 4 a 6
9 4 c 1
10 4 e 7
11 4 f 7
df = df[df.groupby('id')['value'].transform('max') == df['value']]
print (df)
id other_value value
2 1 b 5
5 2 d 6
7 3 f 4
10 4 e 7
11 4 f 7
来源:https://stackoverflow.com/questions/50815893/get-rows-with-largest-value-in-grouping