Question
I have this dataframe:
person_code type growth size ...
0 . 231 32 0.54 32
1 . 233 43 0.12 333
2 . 432 32 0.44 21
3 . 431 56 0.32 23
4 . 654 89 0.12 89
5 . 764 32 0.20 211
6 . 434 32 0.82 90
...
(This dataframe is pretty big, I made a simplification here)
I want to create one dataframe for each type containing the 3 persons with the highest "growth", ordered by it. I want to be able to call it by type. In this case, let's use type 32, so the output df should look something like this:
person_code type growth size ...
6 . 434 32 0.82 90
0 . 231 32 0.54 32
2 . 432 32 0.44 21
...
I understand that it would be something using groupby:
groups=dataframe.groupby('type')
But how could I call the groupby object with the rows where type is 32? And what would be the best way to select only the top 3 by growth?
Answer 1:
IIUC, you don't need a groupby; just use query to filter the dataframe, then nlargest:
df.query('type == 32').nlargest(3, 'growth')
To parameterize the type value, reference a Python variable with @ inside the query string:
in_type = 32
df.query('type == @in_type').nlargest(3, 'growth')
Output:
person_code type growth size
6 . 434 32 0.82 90
0 . 231 32 0.54 32
2 . 432 32 0.44 21
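For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample rows from the question and wraps the query/nlargest call in a small function. The helper name top_n_by_type and its n parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample rows copied from the question (the real dataframe is larger).
df = pd.DataFrame({
    "person_code": [231, 233, 432, 431, 654, 764, 434],
    "type": [32, 43, 32, 56, 89, 32, 32],
    "growth": [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    "size": [32, 333, 21, 23, 89, 211, 90],
})


def top_n_by_type(frame, in_type, n=3):
    """Hypothetical helper: the n rows of one type with the largest growth."""
    # @in_type refers to the local Python variable, as in the answer above.
    return frame.query("type == @in_type").nlargest(n, "growth")


top32 = top_n_by_type(df, 32)
print(top32)
```

nlargest already returns the rows sorted by growth in descending order, so no extra sort_values call should be needed.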
Or, if you want to use groupby, compute the top 3 for every type once and then use query to pull out only the type you need:
type_group_df = df.groupby('type', group_keys=False)\
.apply(pd.DataFrame.nlargest,n=3,columns='growth')
To call it, you can use:
type_group_df.query('type == 32')
If the type is a string, it would look like this:
type_group_df.query('type == "brazilian"')
However, if your column name starts with a special character, such as '#', use boolean indexing instead:
type_group_df[type_group_df['#type'] == 32]
Output:
person_code type growth size
6 . 434 32 0.82 90
0 . 231 32 0.54 32
2 . 432 32 0.44 21
Query another type (43):
type_group_df.query('type == 43')
Output:
person_code type growth size
1 . 233 43 0.12 333
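Since the question asks for one dataframe per type that can be called by type, another option is a plain dict keyed by the type value. This is only a sketch, assuming the sample rows from the question; the name top3_by_type is hypothetical.

```python
import pandas as pd

# Sample rows copied from the question (the real dataframe is larger).
df = pd.DataFrame({
    "person_code": [231, 233, 432, 431, 654, 764, 434],
    "type": [32, 43, 32, 56, 89, 32, 32],
    "growth": [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    "size": [32, 333, 21, 23, 89, 211, 90],
})

# Iterating over a groupby yields (key, sub-dataframe) pairs,
# so each type maps to its own top-3 dataframe.
top3_by_type = {
    type_value: group.nlargest(3, "growth")
    for type_value, group in df.groupby("type")
}

print(top3_by_type[32])
```

This holds every per-type result in memory at once, which may or may not be acceptable for a very large dataframe.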
Answer 2:
You can do this for all of the types at the same time:
df.groupby('type').apply(lambda dft: dft.nlargest(3, 'growth'))
returns
person_code type growth size
type
32 6 434 32 0.82 90
0 231 32 0.54 32
2 432 32 0.44 21
43 1 233 43 0.12 333
56 3 431 56 0.32 23
89 4 654 89 0.12 89
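Because type becomes the outer level of the resulting index, a single type can then be pulled out with .loc. A minimal sketch, assuming the sample rows from the question; the variable name top3 is made up for illustration, and recent pandas versions may emit a deprecation warning about apply operating on the grouping column.

```python
import pandas as pd

# Sample rows copied from the question (the real dataframe is larger).
df = pd.DataFrame({
    "person_code": [231, 233, 432, 431, 654, 764, 434],
    "type": [32, 43, 32, 56, 89, 32, 32],
    "growth": [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    "size": [32, 333, 21, 23, 89, 211, 90],
})

# Top 3 rows by growth for every type; the result is indexed by
# (type, original row label).
top3 = df.groupby("type").apply(lambda dft: dft.nlargest(3, "growth"))

# Selecting on the outer index level returns the rows for that type only.
top3_type32 = top3.loc[32]
print(top3_type32)
```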
Answer 3:
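Something like this? Sort by growth in descending order within each type, then take the first 3 rows of each group:
df.sort_values(['type','growth'], ascending=[True, False]).groupby('type').head(3)
Output:
person_code type growth size
6 434 32 0.82 90
0 231 32 0.54 32
2 432 32 0.44 21
1 233 43 0.12 333
3 431 56 0.32 23
4 654 89 0.12 89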
Answer 4:
Find the indices of the top 3 growth values for each group and feed the level-1 indices into .loc.
idx = df.groupby("type")["growth"].nlargest(3).index
# MultiIndex(levels=[[32, 43, 56, 89], [0, 1, 2, 3, 4, 6]],
# labels=[[0, 0, 0, 1, 2, 3], [5, 0, 2, 1, 3, 4]],
# names=['type', None])
dftop3 = df.loc[idx.get_level_values(1)]
person_code type growth size
6 434 32 0.82 90
0 231 32 0.54 32
2 432 32 0.44 21
1 233 43 0.12 333
3 431 56 0.32 23
4 654 89 0.12 89
dftop3[dftop3.type == 32]
person_code type growth size
6 434 32 0.82 90
0 231 32 0.54 32
2 432 32 0.44 21
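A self-contained version of the same idea, as a sketch on the sample rows from the question. One assumption worth stating: looking rows up by label with .loc only returns exactly the intended rows when the index labels are unique, so the sketch checks that first.

```python
import pandas as pd

# Sample rows copied from the question (the real dataframe is larger).
df = pd.DataFrame({
    "person_code": [231, 233, 432, 431, 654, 764, 434],
    "type": [32, 43, 32, 56, 89, 32, 32],
    "growth": [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    "size": [32, 333, 21, 23, 89, 211, 90],
})

# Label-based lookup below assumes each row label appears once.
assert df.index.is_unique

# MultiIndex of (type, original row label) for the top 3 growth values per type.
idx = df.groupby("type")["growth"].nlargest(3).index

# Level 1 holds the original row labels; use them to pull the full rows.
dftop3 = df.loc[idx.get_level_values(1)]

print(dftop3[dftop3["type"] == 32])
```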
Source: https://stackoverflow.com/questions/49101521/using-groupby-in-pandas-to-get-the-top-3-rows-by-column-value