In Pandas, how to sort one level of a MultiIndex based on the values of a column, while maintaining the grouping of the other level

Anonymous (unverified), submitted 2019-12-03 02:57:02

Question:

I'm taking a Data Mining course at university right now, but I'm a wee bit stuck on a multi-index sorting problem.

The actual data involves about 1 million movie reviews, which I'm trying to analyze by American zip code. To test out how to do what I want, I've been using a much smaller data set of 250 randomly generated ratings for 10 movies, with age groups in place of zip codes.

This is what I have right now: a multi-indexed DataFrame in Pandas with two levels, 'group' and 'title'.

                       rating
    group   title
    Adults  Alien    4.000000
            Argo     2.166667
            Ben-Hur  3.666667
            Gandhi   3.200000
            ...           ...
    Coeds   Alien    3.000000
            Argo     3.750000
            Ben-Hur  3.000000
            Gandhi   2.833333
            ...           ...
    Kids    Alien    2.500000
            Argo     2.750000
            Ben-Hur  3.000000
            Gandhi   3.200000
            ...           ...
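For anyone who wants to reproduce a frame of this shape, here is a rough sketch. The random data, the six extra titles, and the names `reviews` and `ratings` are assumptions made for illustration; they are not the original data set.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 250 random ratings (1 to 5) for 10 movies,
# each tagged with an age group. Only the first four titles come from the post.
rng = np.random.default_rng(0)
titles = ["Alien", "Argo", "Ben-Hur", "Gandhi", "Jaws",
          "Psycho", "Rocky", "Shrek", "Titanic", "Vertigo"]
groups = ["Adults", "Coeds", "Kids"]

reviews = pd.DataFrame({
    "group": rng.choice(groups, size=250),
    "title": rng.choice(titles, size=250),
    "rating": rng.integers(1, 6, size=250),
})

# The mean rating per (group, title) pair yields a two-level MultiIndex frame.
ratings = reviews.groupby(["group", "title"])[["rating"]].mean()
print(ratings.head())
```

Selecting `[["rating"]]` (double brackets) keeps the result a DataFrame with a single 'rating' column rather than a Series.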

What I'm aiming for is to sort the titles by their rating within each group (and only show the 5 or so most popular titles within each group).

So something like this (but I'm only going to show two titles in each group):

                       rating
    group   title
    Adults  Alien    4.000000
            Ben-Hur  3.666667
    Coeds   Argo     3.750000
            Alien    3.000000
    Kids    Gandhi   3.200000
            Ben-Hur  3.000000

Does anyone know how to do this? I've tried sort_order, sort_index, etc., and swapping the levels, but they mix up the groups too, so the result looks like this:

                       rating
    group   title
    Adults  Alien    4.000000
    Coeds   Argo     3.750000
    Adults  Ben-Hur  3.666667
    Kids    Gandhi   3.666667
    Coeds   Alien    3.000000
    Kids    Ben-Hur  3.000000

I'm looking for something like the question "Multi-Index Sorting in Pandas", but instead of sorting based on another level, I want to sort based on the values, as if that person had wanted to sort by their sales column.

Thanks!

Answer 1:

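You're looking for sort_values (named sort in the pandas version this answer was written for):

    In [11]: s = pd.Series([3, 1, 2], [[1, 1, 2], [1, 3, 1]])

    In [12]: s.sort_values(inplace=True)

    In [13]: s
    Out[13]:
    1  3    1
    2  1    2
    1  1    3
    dtype: int64

Note: with inplace=True this modifies s. To return a sorted copy instead, call sort_values() without it (this was order in older versions):

    In [14]: s.sort_values()
    Out[14]:
    1  3    1
    2  1    2
    1  1    3
    dtype: int64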

Update: I realised what you were actually asking, and I think this ought to be an option in sortlevel, but for now I think you have to reset_index, groupby and apply:

    In [21]: s.reset_index(name='s').groupby('level_0', group_keys=False).apply(lambda g: g.sort_values('s')).set_index(['level_0', 'level_1'])['s']
    Out[21]:
    level_0  level_1
    1        3          1
             1          3
    2        1          2
    Name: s, dtype: int64

Note: you can set the level names to [None, None] afterwards.
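Applied to the question's layout, the same idea might look like the sketch below. It is a variation rather than the exact pipeline above: it sorts on two keys with sort_values and then uses groupby(...).head(n) to keep the top rows per group. The frame is typed in by hand from the values in the question, and the names `ratings`, `top_n` and `top` are assumptions for illustration.

```python
import pandas as pd

# Small hand-made frame mirroring the question's layout (values taken from the post).
ratings = pd.DataFrame(
    {
        "group": ["Adults"] * 4 + ["Coeds"] * 4 + ["Kids"] * 4,
        "title": ["Alien", "Argo", "Ben-Hur", "Gandhi"] * 3,
        "rating": [4.000000, 2.166667, 3.666667, 3.200000,
                   3.000000, 3.750000, 3.000000, 2.833333,
                   2.500000, 2.750000, 3.000000, 3.200000],
    }
).set_index(["group", "title"])

top_n = 2  # illustrative; the question mentions roughly 5

top = (
    ratings.reset_index()
    # Sort by group first so each group stays together, then by rating, highest first.
    .sort_values(["group", "rating"], ascending=[True, False])
    # Keep the first top_n rows of each group; head() preserves the sorted row order.
    .groupby("group", sort=False)
    .head(top_n)
    # Restore the original two-level index.
    .set_index(["group", "title"])
)
print(top)
```

Because the sort uses 'group' as its first key, the groups never get interleaved, which is the problem the question ran into when sorting on the rating alone.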


