Problem
In the following dataframe df:
import random
import pandas as pd
random.seed(999)
sz = 50
qty = {'one': 1, 'two': 2, 'three': 3}
thing = (random.choice(['one', 'two', 'three']) for _ in range(sz))
order = (random.choice(['ascending', 'descending']) for _ in range(sz))
value = (random.randint(0, 100) for _ in range(sz))
df = pd.DataFrame({'thing': thing, 'order': order, 'value': value})
... I would like to:
- Group by thing
- Split each group by order
- Sort each part by value, in the direction given by its order (ascending or descending)
- Pick the top qty[thing] rows for that thing
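The four steps above can be sketched directly with `groupby` plus a per-group `sort_values` and `head`. `top_per_group` is a hypothetical helper name. The sketch assumes the column names used in the question and that `qty` maps each `thing` to the number of rows to keep. It sorts the descending groups in descending order, which is what the steps describe.

```python
import pandas as pd


def top_per_group(df: pd.DataFrame, qty: dict) -> pd.DataFrame:
    """Hypothetical helper: for every (thing, order) group, sort by 'value'
    in the direction named by 'order' and keep the first qty[thing] rows."""
    # Steps 1 and 2: group by thing and split by order in one groupby.
    groups = {key: g for key, g in df.groupby(['thing', 'order'])}
    parts = []
    # Walk the groups in qty's key order, ascending before descending,
    # so the rows come out grouped in the same layout as the expected result.
    for thing, n in qty.items():
        for order in ('ascending', 'descending'):
            group = groups.get((thing, order))
            if group is None:  # this (thing, order) pair has no rows
                continue
            # Step 3: sort in the direction the group's order names.
            ranked = group.sort_values('value', ascending=(order == 'ascending'))
            # Step 4: keep the top qty[thing] rows.
            parts.append(ranked.head(n))
    # Return an empty frame with the same columns if nothing matched.
    return pd.concat(parts, ignore_index=True) if parts else df.iloc[:0]
```

With the question's data this would be called as `top_per_group(df, qty)`. Looping over the groups in Python keeps each step visible, at the cost of speed when there are many groups.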
Expected Result
    thing       order  value
0     one   ascending     17
1     one  descending      1
2     two   ascending     28
3     two   ascending     30
4     two  descending     13
5     two  descending     38
6   three   ascending      6
7   three   ascending     27
8   three   ascending     35
9   three  descending      4
10  three  descending      5
11  three  descending      6
I produced this result manually with:
# Real booleans are needed here: ascending='False' is a non-empty string and therefore
# truthy, so older pandas sorted the "descending" groups ascending anyway (which appears
# to be why those rows in the expected result hold the smallest values, in ascending
# order), and recent pandas versions reject a string argument with a ValueError.
one_a = df[(df.thing == 'one') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending=True).head(qty['one'])
one_d = df[(df.thing == 'one') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending=False).head(qty['one'])
two_a = df[(df.thing == 'two') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending=True).head(qty['two'])
two_d = df[(df.thing == 'two') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending=False).head(qty['two'])
three_a = df[(df.thing == 'three') & (df.order == 'ascending')].reset_index(drop=True).sort_values('value', ascending=True).head(qty['three'])
three_d = df[(df.thing == 'three') & (df.order == 'descending')].reset_index(drop=True).sort_values('value', ascending=False).head(qty['three'])
print(pd.concat([one_a, one_d, two_a, two_d, three_a, three_d], ignore_index=True))
Question
Is it possible to achieve this using groupby, sort_values and set_index?
Answer 1:
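One difficulty is that the ascending and descending groups have to be sorted in opposite directions. We can get around that by negating the values of the descending rows, so that a single ascending sort ranks every group: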
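d = df.copy()  # work on a copy so df keeps its original values
d.loc[d.order == 'descending', 'value'] *= -1  # negate so one ascending sort ranks both orders
s = (d.sort_values('value').groupby(['thing', 'order'])
      .cumcount()        # 0, 1, 2, ... within each (thing, order) group, in sorted order
      .reindex(d.index)  # realign the ranks with the rows of d
    )
out = d[s < d['thing'].map(qty)].sort_values(['thing', 'order'])
out.loc[out.order == 'descending', 'value'] *= -1  # restore the original sign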
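A variant of the same idea can avoid changing the sign of `value` at all: build a signed sort key with `Series.where`, rank rows inside each group with `cumcount`, and compare the rank with the quota from `qty`. `top_per_group_vectorized` is a hypothetical name. The sketch assumes `df` has a unique index (which `reindex` needs) and a numeric `value` column.

```python
import pandas as pd


def top_per_group_vectorized(df: pd.DataFrame, qty: dict) -> pd.DataFrame:
    """Hypothetical helper: keep the first qty[thing] rows of every
    (thing, order) group, smallest values first for 'ascending' groups and
    largest first for 'descending' groups. Assumes a unique index."""
    # Signed sort key: value for ascending rows, -value for descending rows,
    # so a single ascending sort puts each group's "top" rows first.
    key = df['value'].where(df['order'] == 'ascending', -df['value'])
    # Row labels in key order; a stable sort keeps ties in their original order.
    sorted_idx = key.sort_values(kind='stable').index
    # Number the rows 0, 1, 2, ... within each (thing, order) group following
    # the sorted order, then realign those ranks to df's own row order.
    rank = (df.loc[sorted_idx]
              .groupby(['thing', 'order'])
              .cumcount()
              .reindex(df.index))
    # Keep a row if its rank is below its thing's quota; things missing from
    # qty map to NaN, and the comparison with NaN is False, so they are dropped.
    keep = rank < df['thing'].map(qty)
    return df[keep].sort_values(['thing', 'order'], kind='stable')
```

Because `df` itself is never modified, there is no sign to restore afterwards, and `value` keeps its original sign in the result.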
Output:
    thing       order  value
14    one   ascending     17
27    one  descending      1
13  three   ascending      6
17  three   ascending     35
38  three   ascending     27
4   three  descending      5
23  three  descending      4
37  three  descending      6
21    two   ascending     28
42    two   ascending     30
6     two  descending     38
9     two  descending     13
Source: https://stackoverflow.com/questions/64864630/grouping-splitting-and-picking-top-rows-in-a-dataframe