Question:
I want to group my dataframe by two columns and then sort the aggregated results within the groups.
In [167]: df
Out[167]:
   count     job source
0      2   sales      A
1      4   sales      B
2      6   sales      C
3      3   sales      D
4      7   sales      E
5      5  market      A
6      3  market      B
7      2  market      C
8      4  market      D
9      1  market      E

In [168]: df.groupby(['job','source']).agg({'count':sum})
Out[168]:
               count
job    source
market A           5
       B           3
       C           2
       D           4
       E           1
sales  A           2
       B           4
       C           6
       D           3
       E           7
I would now like to sort the count column in descending order within each of the groups, and then take only the top three rows, to get something like:
               count
job    source
market A           5
       D           4
       B           3
sales  E           7
       C           6
       B           4
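The question does not show how df was built. A minimal sketch (not part of the original post) that reconstructs the sample frame from the output above, so the answers below can be run as written:

import pandas as pd

# Reconstruct the example DataFrame shown in Out[167].
df = pd.DataFrame({
    'count':  [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    'job':    ['sales'] * 5 + ['market'] * 5,
    'source': ['A', 'B', 'C', 'D', 'E'] * 2,
})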
Answer 1:
What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.
Starting from the result of the first groupby:
In [60]: df_agg = df.groupby(['job','source']).agg({'count':sum})
We group by the first level of the index:
In [63]: g = df_agg['count'].groupby(level=0, group_keys=False)
Then we want to sort each group in descending order and take the first three elements (in older pandas this method was called order; current pandas uses sort_values):
In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))
However, there is a shortcut function for this, nlargest:
In [65]: g.nlargest(3)
Out[65]:
job     source
market  A         5
        D         4
        B         3
sales   E         7
        C         6
        B         4
dtype: int64
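If a DataFrame like Out[168] is preferred over a Series, the result can be converted back. A small sketch on top of the answer above (not part of the original answer); the Series is named 'count', so to_frame() gives back a single 'count' column:

# Keep the job/source MultiIndex as a one-column frame,
# or flatten the index back into ordinary columns.
top3 = g.nlargest(3).to_frame()
top3_flat = top3.reset_index()   # columns: job, source, count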
Answer 2:
You could also just do it in one go, by doing the sort first and using head to take the first 3 of each group.
In [34]: df.sort_values(['job','count'], ascending=False).groupby('job').head(3)
Out[35]:
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B
Answer 3:
Here's another example of taking the top 3 in sorted order, and of sorting within the groups:
In [43]: import pandas as pd

In [44]: df = pd.DataFrame({"name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
    ...:                    "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
    ...:                    "count_2": [100, 150, 100, 25, 250, 300, 400, 500]})

In [45]: df
Out[45]:
   count_1  count_2  name
0        5      100   Foo
1       10      150   Foo
2       12      100  Baar
3       15       25   Foo
4       20      250  Baar
5       25      300   Foo
6       30      400  Baar
7       35      500  Baar

### Top 3 on sorted order:

In [46]: df.groupby(["name"])["count_1"].nlargest(3)
Out[46]:
name
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
dtype: int64

### Sorting within groups based on column "count_1":

In [48]: df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending=False)).reset_index(drop=True)
Out[48]:
   count_1  count_2  name
0       35      500  Baar
1       30      400  Baar
2       20      250  Baar
3       12      100  Baar
4       25      300   Foo
5       15       25   Foo
6       10      150   Foo
7        5      100   Foo
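The within-group sort in In [48] can also be written without apply by sorting on both keys at once. A sketch of that equivalent (not from the original answer), which reproduces the row order of Out[48]:

# Sort the group key ascending (so groups stay together) and count_1
# descending inside each group, then discard the old row labels.
df.sort_values(['name', 'count_1'], ascending=[True, False]).reset_index(drop=True)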
Answer 4:
If you don't need to sum a column, then use @tvashtar's answer. If you do need to sum, then you can use @joris' answer or this one which is very similar to it.
df.groupby(['job']).apply(lambda x: (x.groupby('source')
                                      .sum()
                                      .sort_values('count', ascending=False))
                                     .head(3))
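A flat variant of the same idea, avoiding the nested groupby inside apply, could look like the sketch below (not from the original answer; it returns plain columns rather than the nested index of the apply version):

# Aggregate first, then reuse the "sort then head" trick from the answer above.
top3 = (
    df.groupby(['job', 'source'], as_index=False)['count'].sum()
      .sort_values(['job', 'count'], ascending=[False, False])
      .groupby('job')
      .head(3)
)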