Question

Problem

In the following dataframe df:

import random
import pandas as pd
random.seed(999)
sz = 50

qty = {'one': 1, 'two': 2, 'three': 3}

thing = (random.choice(['one', 'two', 'three']) for _ in range(sz))
order = (random.choice(['ascending', 'descending']) for _ in range(sz))
value = (random.randint(0, 100) for _ in range(sz))

df = pd.DataFrame({'thing': thing, 'order': order, 'value': value})

... I would like to:

Group by thing
Split each group by order
Sort each (thing, order) subset by value, ascending or descending according to its order
Keep the top qty[thing] rows of each subset

Expected Result

    thing       order  value
0     one   ascending     17
1     one  descending      1
2     two   ascending     28
3     two   ascending     30
4     two  descending     13
5     two  descending     38
6   three   ascending      6
7   three   ascending     27
8   three   ascending     35
9   three  descending      4
10  three  descending      5
11  three  descending      6

I produced this result manually with:

# Note: ascending must be a bool. The strings 'True' and 'False' are both truthy,
# and newer pandas versions may reject them outright. With real booleans the
# descending subsets return their largest values, so those rows may differ from
# the table above.
one_a = df[(df.thing == 'one') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending=True).head(qty['one'])
one_d = df[(df.thing == 'one') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending=False).head(qty['one'])
two_a = df[(df.thing == 'two') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending=True).head(qty['two'])
two_d = df[(df.thing == 'two') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending=False).head(qty['two'])
three_a = df[(df.thing == 'three') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending=True).head(qty['three'])
three_d = df[(df.thing == 'three') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending=False).head(qty['three'])

print(pd.concat([one_a, one_d, two_a, two_d, three_a, three_d], ignore_index=True))
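The six near-identical lines above follow one pattern (filter, sort, take the head), so the manual approach can be sketched as a loop. This is only a restatement of the manual code, not a groupby solution. The frame `demo`, the trimmed-down `qty` and the helper name `top_rows_manual` are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Hypothetical, trimmed-down quantities and sample data (small enough to check by eye)
qty = {'one': 1, 'two': 2}

demo = pd.DataFrame({
    'thing': ['one', 'one', 'one', 'two', 'two', 'two', 'two', 'two', 'two'],
    'order': ['ascending', 'ascending', 'descending',
              'ascending', 'ascending', 'ascending',
              'descending', 'descending', 'descending'],
    'value': [5, 3, 7, 9, 2, 4, 1, 8, 6],
})


def top_rows_manual(frame, qty):
    """Filter, sort and take head(n) for every (thing, order) pair."""
    parts = []
    for thing, n in qty.items():
        for order in ('ascending', 'descending'):
            subset = frame[(frame['thing'] == thing) & (frame['order'] == order)]
            # ascending has to be a real bool, so derive it from the order label
            subset = subset.sort_values('value', ascending=(order == 'ascending'))
            parts.append(subset.head(n))
    return pd.concat(parts, ignore_index=True)


manual = top_rows_manual(demo, qty)
print(manual)
```

For the demo frame this keeps 3 (one, ascending), 7 (one, descending), 2 and 4 (two, ascending), and 8 and 6 (two, descending).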

Question

Is it possible to achieve this using groupby, sort_values and set_index?

Answer 1:

One difficulty is selecting the ascending and descending groups separately. We can work around that by negating the values in the descending rows, so that a single ascending sort handles both cases:

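# Negate the descending values so that one ascending sort serves both orders
df.loc[df.order=='descending','value'] *= -1

# Rank every row within its (thing, order) group, then restore the original row order
s=(df.sort_values('value').groupby(['thing','order'])
     .cumcount()
     .reindex(df.index)
  )

# Keep the rows whose rank is below the quantity for their thing
out = df[s<df['thing'].map(qty)].sort_values(['thing','order'])
# Undo the negation on the selected rows
out.loc[out.order=='descending', 'value'] *= -1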

Output:

    thing       order  value
14    one   ascending     17
27    one  descending      1
13  three   ascending      6
17  three   ascending     35
38  three   ascending     27
4   three  descending      5
23  three  descending      4
37  three  descending      6
21    two   ascending     28
42    two   ascending     30
6     two  descending     38
9     two  descending     13
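The same idea can be sketched without modifying the input frame, by keeping the negated values in a temporary column. This is a minimal sketch under assumptions: the helper name `top_rows`, the temporary column `key`, the trimmed-down `qty` and the small frame `demo` are all hypothetical and not from the answer above.

```python
import pandas as pd

# Hypothetical, trimmed-down quantities and sample data (small enough to check by eye)
qty = {'one': 1, 'two': 2}

demo = pd.DataFrame({
    'thing': ['one', 'one', 'one', 'two', 'two', 'two', 'two', 'two', 'two'],
    'order': ['ascending', 'ascending', 'descending',
              'ascending', 'ascending', 'ascending',
              'descending', 'descending', 'descending'],
    'value': [5, 3, 7, 9, 2, 4, 1, 8, 6],
})


def top_rows(frame, qty):
    """Keep the top qty[thing] rows of each (thing, order) group."""
    # Signed sort key: value for ascending rows, -value for descending rows
    sign = frame['order'].map({'ascending': 1, 'descending': -1})
    work = frame.assign(key=frame['value'] * sign)
    # Rank rows inside each group by the key, then realign to the original index
    rank = (work.sort_values('key')
                .groupby(['thing', 'order'])
                .cumcount()
                .reindex(work.index))
    # Rows whose thing is missing from qty map to NaN and are dropped
    keep = rank < work['thing'].map(qty)
    return (work[keep]
            .sort_values(['thing', 'order', 'key'])
            .drop(columns='key'))


out = top_rows(demo, qty)
print(out)
```

Because the negation lives only in the temporary `key` column, `demo` itself is left unchanged and no sign has to be restored afterwards. For the demo frame the kept values are 3, 7, 2, 4, 8 and 6, in that order.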

Source: https://stackoverflow.com/questions/64864630/grouping-splitting-and-picking-top-rows-in-a-dataframe

Tags

pandas

sorting

Grouping, splitting and picking top rows in a dataframe
