Grouping, splitting and picking top rows in a dataframe

Submitted by 折月煮酒 on 2021-01-04 05:58:54

Question

Problem

In the following dataframe df:

import random
import pandas as pd
random.seed(999)
sz = 50

qty = {'one': 1, 'two': 2, 'three': 3}

thing = (random.choice(['one', 'two', 'three']) for _ in range(sz))
order = (random.choice(['ascending', 'descending']) for _ in range(sz))
value = (random.randint(0, 100) for _ in range(sz))

df = pd.DataFrame({'thing': thing, 'order': order, 'value': value})

... I would like to:

  1. Group by thing
  2. Split each group by order
  3. Sort each split by value, ascending or descending as its order says
  4. Keep the top qty[thing] rows of each split
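These four steps map fairly directly onto a `groupby` over both columns. The following is a minimal sketch on a small made-up frame, not the question's data. It assumes each group should be sorted in the direction named by its `order` value and then cut to `qty[thing]` rows:

```python
import pandas as pd

qty = {'one': 1, 'two': 2, 'three': 3}

# Small hypothetical frame, for illustration only
df = pd.DataFrame({
    'thing': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
    'order': ['ascending', 'ascending', 'descending',
              'ascending', 'ascending', 'ascending', 'descending'],
    'value': [5, 2, 9, 7, 3, 8, 4],
})

parts = []
# Steps 1 and 2: group by thing and, within it, by order
for (thing, order), grp in df.groupby(['thing', 'order']):
    # Step 3: sort by value in the direction the group's order names
    ordered = grp.sort_values('value', ascending=(order == 'ascending'))
    # Step 4: keep the top qty[thing] rows of this group
    parts.append(ordered.head(qty[thing]))

out = pd.concat(parts, ignore_index=True)
print(out['value'].tolist())  # [2, 9, 3, 7, 4]
```

`head` returns however many rows exist, so a group smaller than its quota is kept whole rather than raising an error.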

Expected Result

    thing       order  value
0     one   ascending     17
1     one  descending      1
2     two   ascending     28
3     two   ascending     30
4     two  descending     13
5     two  descending     38
6   three   ascending      6
7   three   ascending     27
8   three   ascending     35
9   three  descending      4
10  three  descending      5
11  three  descending      6

I got this result by coding each case by hand:

one_a = df[(df.thing == 'one') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending='True').head(qty['one'])
one_d = df[(df.thing == 'one') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending='False').head(qty['one'])
two_a = df[(df.thing == 'two') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending='True').head(qty['two'])
two_d = df[(df.thing == 'two') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending='False').head(qty['two'])
three_a = df[(df.thing == 'three') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending='True').head(qty['three'])
three_d = df[(df.thing == 'three') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending='False').head(qty['three'])

print(pd.concat([one_a, one_d, two_a, two_d, three_a, three_d], ignore_index=True))
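The six hand-written selections differ only in thing and order, so they can be generated in a loop. Below is a sketch that rebuilds the question's data so it runs on its own. It differs from the hand-written version in one detail: `ascending` receives a real bool. The strings `'True'` and `'False'` are both non-empty and therefore truthy, so `ascending='False'` still sorts ascending. Newer pandas releases may also reject a string there with a ValueError. As a result, this loop keeps the largest values for the descending rows, which may not match the expected result above.

```python
import random
import pandas as pd

# Data built as in the question (lists instead of generators)
random.seed(999)
sz = 50
qty = {'one': 1, 'two': 2, 'three': 3}
thing = [random.choice(['one', 'two', 'three']) for _ in range(sz)]
order = [random.choice(['ascending', 'descending']) for _ in range(sz)]
value = [random.randint(0, 100) for _ in range(sz)]
df = pd.DataFrame({'thing': thing, 'order': order, 'value': value})

print(bool('False'))  # True: any non-empty string is truthy

pieces = []
for t in ['one', 'two', 'three']:
    for o in ['ascending', 'descending']:
        sub = df[(df.thing == t) & (df.order == o)]
        # A real bool: True for 'ascending', False for 'descending'
        pieces.append(sub.sort_values('value', ascending=(o == 'ascending')).head(qty[t]))

out = pd.concat(pieces, ignore_index=True)
print(out)
```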

Question

Is it possible to achieve this using groupby, sort_values and set_index?


Answer 1:

One difficulty is that the ascending and descending groups must be sorted in opposite directions. We can get around that by negating the descending values, so that a single ascending sort handles both:

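# Negate descending values so one ascending sort orders every group in its own direction
df.loc[df.order == 'descending', 'value'] *= -1

# Position of each row within its (thing, order) group after the sort, realigned to df's index
s = (df.sort_values('value')
       .groupby(['thing', 'order'])
       .cumcount()
       .reindex(df.index))

# Keep rows whose position is below the quota for their thing, then undo the negation
# (df itself still holds the negated values)
out = df[s < df['thing'].map(qty)].sort_values(['thing', 'order'])
out.loc[out.order == 'descending', 'value'] *= -1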
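Here is a self-contained variant of the same negate, sort and cumcount idea. It keeps the signed sort key in a separate Series (the name `key` is just illustrative), so `df` is never modified and no sign has to be restored afterwards. This is a sketch of the technique, not the answerer's exact code:

```python
import random
import pandas as pd

# Data built as in the question (lists instead of generators)
random.seed(999)
sz = 50
qty = {'one': 1, 'two': 2, 'three': 3}
thing = [random.choice(['one', 'two', 'three']) for _ in range(sz)]
order = [random.choice(['ascending', 'descending']) for _ in range(sz)]
value = [random.randint(0, 100) for _ in range(sz)]
df = pd.DataFrame({'thing': thing, 'order': order, 'value': value})

# Signed key: value for ascending rows, -value for descending rows
key = df['value'].where(df['order'].eq('ascending'), -df['value'])

# Position of each row within its (thing, order) group after sorting by the key
rank = (df.assign(key=key)
          .sort_values('key')
          .groupby(['thing', 'order'])
          .cumcount()
          .reindex(df.index))  # same labels and order as df, needed for the comparison below

# Keep rows whose position is below the quota for their thing
out = df[rank < df['thing'].map(qty)].sort_values(['thing', 'order'])
print(out)
```

Only the key is negated, so the `value` column in `out` keeps its original sign.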

Output:

    thing       order  value
14    one   ascending     17
27    one  descending      1
13  three   ascending      6
17  three   ascending     35
38  three   ascending     27
4   three  descending      5
23  three  descending      4
37  three  descending      6
21    two   ascending     28
42    two   ascending     30
6     two  descending     38
9     two  descending     13


Source: https://stackoverflow.com/questions/64864630/grouping-splitting-and-picking-top-rows-in-a-dataframe
