Pandas group by with multiple columns and max value

落爺英雄遲暮 submitted on 2020-12-13 07:19:05

Question


I am having trouble with a group by over multiple columns combined with a max value.

A   B   C   D   E   F   G   H

x   q   e   m   k   2   1   y
x   q   e   n   l   5   2   y
x   w   e   b   j   7   3   y
x   w   e   v   h   3   4   y

This query is correct and returns what I want:

SELECT A, B, C, D, E, MAX(F) FROM mytable group by A, B, C

Results

 x   q   e   n   l   5
 x   w   e   b   j   7

How can this be achieved in pandas?

I tried this:

df.groupby(['A', 'B', 'C'], as_index=False)['F'].max()

This translates to:

SELECT A, B, C, MAX(F) FROM mytable group by A, B, C

This does not work either:

df.groupby(['A', 'B', 'C'], as_index=False)['F','D','E'].max()

How can I also return columns D and E, as in the SQL query?


Answer 1:


It seems like you need:

groups = ['A', 'B', 'C']
selects = ['A', 'B', 'C','D', 'E','F']

df.groupby(groups, as_index=False).apply(lambda s: s.loc[s.F.idxmax(), selects]).reset_index(drop=True)

    A   B   C   D   E   F
0   x   q   e   n   l   5
1   x   w   e   b   j   7
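The same idea can be sketched without `apply`: take the index label of the max-F row in each group with `idxmax`, then select those rows with `loc`. The DataFrame below is rebuilt from the table in the question; the variable names `best_rows` and `result` are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "A": ["x", "x", "x", "x"],
    "B": ["q", "q", "w", "w"],
    "C": ["e", "e", "e", "e"],
    "D": ["m", "n", "b", "v"],
    "E": ["k", "l", "j", "h"],
    "F": [2, 5, 7, 3],
    "G": [1, 2, 3, 4],
    "H": ["y", "y", "y", "y"],
})

groups = ["A", "B", "C"]
selects = ["A", "B", "C", "D", "E", "F"]

# For each (A, B, C) group, idxmax gives the index label of the row
# holding the largest F
best_rows = df.groupby(groups)["F"].idxmax()

# Pull those whole rows, so D and E come from the same row as the max F
result = df.loc[best_rows, selects].reset_index(drop=True)
print(result)
```

This relies on the index labels of `df` being unique, since `loc` would otherwise return every row sharing a label.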



Answer 2:


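Try something like this (note that 'first' and 'last' pick D and E by position within each group, not from the row that holds the maximum F):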

df.groupby(['A', 'B', 'C'], as_index=False).agg({'D': 'first', 'E': 'last', 'F': 'max'})
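To see how this behaves on the sample data, here is a small check, followed by one possible alternative that keeps whole rows: sort by F in descending order and keep the first row of each group with `drop_duplicates`. The names `agg_result` and `row_result` are only illustrative, and the alternative is a sketch rather than the answerer's method.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "A": ["x", "x", "x", "x"],
    "B": ["q", "q", "w", "w"],
    "C": ["e", "e", "e", "e"],
    "D": ["m", "n", "b", "v"],
    "E": ["k", "l", "j", "h"],
    "F": [2, 5, 7, 3],
    "G": [1, 2, 3, 4],
    "H": ["y", "y", "y", "y"],
})

groups = ["A", "B", "C"]
selects = ["A", "B", "C", "D", "E", "F"]

# Each column is aggregated independently, so D, E and F in one output
# row can come from different input rows
agg_result = df.groupby(groups, as_index=False).agg(
    {"D": "first", "E": "last", "F": "max"}
)
print(agg_result)

# Alternative sketch: put the largest F first, keep one row per group,
# then restore group order
row_result = (
    df.sort_values("F", ascending=False)
      .drop_duplicates(subset=groups)
      .sort_values(groups)
      .reset_index(drop=True)[selects]
)
print(row_result)
```

On this data the `agg` version returns `m, l, 5` and `b, h, 7` for D, E, F, while the row-based version returns `n, l, 5` and `b, j, 7`, which matches the SQL result shown in the question.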


Source: https://stackoverflow.com/questions/52457014/pandas-group-by-with-multiple-columns-and-max-value
