I'm taking a Data Mining course at university right now, but I'm a wee bit stuck on a multi-index sorting problem.
The actual data involves about 1 million movie reviews, which I'm trying to analyze by American zip code. To test out how to do what I want, I've been using a much smaller data set of 250 randomly generated ratings for 10 movies, with age groups in place of zip codes.
This is what I have right now: a MultiIndexed DataFrame in pandas with two levels, 'group' and 'title':

                      rating
    group  title
    Adults Alien    4.000000
           Argo     2.166667
           Ben-Hur  3.666667
           Gandhi   3.200000
           ...           ...
    Coeds  Alien    3.000000
           Argo     3.750000
           Ben-Hur  3.000000
           Gandhi   2.833333
           ...           ...
    Kids   Alien    2.500000
           Argo     2.750000
           Ben-Hur  3.000000
           Gandhi   3.200000
           ...           ...

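In case it helps anyone reproduce this, here is a minimal sketch that builds a frame of the same shape. The random ratings, the seed, the variable names (`raw`, `mean_ratings`), and six of the ten titles are made up for illustration; this is an assumption about the setup, not my actual test data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 250 random ratings (1-5) for 10 movies
# across 3 age groups. Only the first four titles appear in my real output.
rng = np.random.default_rng(0)
titles = ["Alien", "Argo", "Ben-Hur", "Gandhi", "Jaws",
          "Psycho", "Rocky", "Shrek", "Up", "Vertigo"]
groups = ["Adults", "Coeds", "Kids"]

raw = pd.DataFrame({
    "group": rng.choice(groups, size=250),
    "title": rng.choice(titles, size=250),
    "rating": rng.integers(1, 6, size=250),  # upper bound is exclusive
})

# Mean rating per (group, title). Selecting [["rating"]] keeps the result
# a DataFrame with a two-level MultiIndex named 'group' and 'title'.
mean_ratings = raw.groupby(["group", "title"])[["rating"]].mean()
print(mean_ratings.head(8))
```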
What I'm aiming for is to sort the titles by rating within each group, highest first, and show only the 5 or so most popular titles per group.
So something like this (but I'm only going to show two titles in each group):

                      rating
    group  title
    Adults Alien    4.000000
           Ben-Hur  3.666667
    Coeds  Argo     3.750000
           Alien    3.000000
    Kids   Gandhi   3.200000
           Ben-Hur  3.000000

Anyone know how to do this? I've tried sort_order, sort_index, etc., and swapping the levels, but they all mix up the groups too, so the result looks like this:

                      rating
    group  title
    Adults Alien    4.000000
    Coeds  Argo     3.750000
    Adults Ben-Hur  3.666667
    Kids   Gandhi   3.666667
    Coeds  Alien    3.000000
    Kids   Ben-Hur  3.000000

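To make the problem concrete, here is a small sketch of the kind of attempt that mixes the groups. It uses the twelve mean ratings I listed for the first four titles; `sort_values` is an assumption standing in for the sort calls I tried, and the names `df` and `mixed` are just for illustration.

```python
import pandas as pd

# The (group, title) means for the four titles shown in my output.
index = pd.MultiIndex.from_product(
    [["Adults", "Coeds", "Kids"], ["Alien", "Argo", "Ben-Hur", "Gandhi"]],
    names=["group", "title"],
)
ratings = [4.0, 2.166667, 3.666667, 3.2,   # Adults
           3.0, 3.75, 3.0, 2.833333,       # Coeds
           2.5, 2.75, 3.0, 3.2]            # Kids
df = pd.DataFrame({"rating": ratings}, index=index)

# Sorting on the values alone ignores the 'group' level entirely,
# so rows from different groups end up interleaved.
mixed = df.sort_values("rating", ascending=False)
print(mixed.head(6))
```

The ratings come out in descending order, but the 'group' level is no longer contiguous, which is exactly what I'm trying to avoid.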
I'm looking for something like the question "Multi-Index Sorting in Pandas", but instead of sorting on another index level, I want to sort on the values. It's as if that person had wanted to sort by his sales column.
Thanks!