Python pandas groupby aggregate on multiple columns, then pivot

Anonymous (unverified), posted 2019-12-03 01:27:01

Question:

In Python, I have a pandas DataFrame similar to the following:

    Item  | shop1 | shop2 | shop3 | Category
    ------------------------------------------
    Shoes | 45    | 50    | 53    | Clothes
    TV    | 200   | 300   | 250   | Technology
    Book  | 20    | 17    | 21    | Books
    phone | 300   | 350   | 400   | Technology

Where shop1, shop2 and shop3 are the costs of every item in different shops. Now, I need to return a DataFrame, after some data cleaning, like this one:

    Category (index) | size | sum | mean | std
    -------------------------------------------

where size is the number of items in each Category, and sum, mean and std are those functions applied across the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply, ...)?

Can someone help me out? I'm going crazy with this one...thank you!
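To try the answers below, the sample table from the question can be rebuilt roughly like this. It is only a sketch: the values are copied from the table, and the name `df` is simply what the answers assume.

```python
import pandas as pd

# Sample data copied from the table in the question.
# `df` is the variable name the answers assume.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

print(df)
```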

Answer 1:

option 1
use agg

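    agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')

    # stack() moves every shop price into one Series indexed by (Category, Item, shop).
    # The names are passed as keyword arguments (pandas 0.25 and later); passing the
    # renaming dict itself, agg(agg_funcs), raises SpecificationError on pandas 1.0 and later.
    df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(**agg_funcs)

                Size   Sum        Mean        Std
    Category
    Books          3    58   19.333333   2.081666
    Clothes        3   148   49.333333   4.041452
    Technology     6  1800  300.000000  70.710678

Note that Size here counts the stacked prices (items times shops), not the number of items in each Category.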

option 2
more for less
use describe

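    # On pandas 0.20 and later, describe() on a grouped Series already returns one row
    # per Category, so the trailing .unstack() from the original post is not needed.
    df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()

                count        mean        std    min    25%    50%    75%    max
    Category
    Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
    Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
    Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0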


Answer 2:

    df.groupby('Category').agg({
        'Item': 'size',
        'shop1': ['sum', 'mean', 'std'],
        'shop2': ['sum', 'mean', 'std'],
        'shop3': ['sum', 'mean', 'std'],
    })
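As a rough illustration of what the per-shop aggregation above returns, the sketch below runs it on the sample data and picks values out of the result. The names `shop_cols`, `spec` and `per_shop` are made up for this example, and the spec uses a list for every column (including `'Item'`) purely to keep the result columns uniformly two-level.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# Hypothetical helper names, used only in this sketch.
shop_cols = ["shop1", "shop2", "shop3"]
spec = {"Item": ["size"]}
spec.update({col: ["sum", "mean", "std"] for col in shop_cols})

per_shop = df.groupby("Category").agg(spec)

# The columns form a two-level MultiIndex: (original column, statistic).
print(per_shop["shop1"])            # all statistics for one shop
print(per_shop[("shop1", "mean")])  # one statistic for one shop
```

One thing this makes visible: Books and Clothes each contain a single item, so their per-shop std comes out as NaN (the sample standard deviation of one value is undefined). That is a reason to prefer the across-all-shops variant when a Category has few items.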

Or if you want it across all shops then:

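    df1 = (df.set_index(['Item', 'Category'])
             .stack()
             .reset_index()
             .rename(columns={'level_2': 'Shops', 0: 'costs'}))

    # After stacking there is one row per item per shop, so 'size' counts prices;
    # use 'nunique' instead of 'size' to count distinct items.
    df1.groupby('Category').agg({'Item': 'size', 'costs': ['sum', 'mean', 'std']})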


Answer 3:

If I understand correctly, you want to calculate aggregate metrics for all shops, not for each individually. To do that, you can first stack your dataframe and then group by Category:

    stacked = df.set_index(['Item', 'Category']).stack().reset_index()
    stacked.columns = ['Item', 'Category', 'Shop', 'Price']
    stacked.groupby('Category').agg({'Price': ['count', 'sum', 'mean', 'std']})

Which results in

               Price
               count   sum        mean        std
    Category
    Books          3    58   19.333333   2.081666
    Clothes        3   148   49.333333   4.041452
    Technology     6  1800  300.000000  70.710678
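If size should be the number of items per Category (as the question asks) rather than the number of prices, one possible variant is sketched below. It swaps in `melt` for the `set_index(...).stack()` step and uses named aggregation (pandas 0.25 and later) so the output columns are flat. The names `long_df` and `summary` are illustrative only, and this is not the approach from any of the answers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# Long format: one row per (Item, Category, Shop), with the cost in "Price".
long_df = df.melt(id_vars=["Item", "Category"], var_name="Shop", value_name="Price")

# Named aggregation: output_column=(input_column, function).
summary = long_df.groupby("Category").agg(
    size=("Item", "nunique"),  # distinct items, not number of prices
    sum=("Price", "sum"),
    mean=("Price", "mean"),
    std=("Price", "std"),
)

print(summary)
```

Because the column names `size`, `sum`, `mean` and `std` collide with DataFrame attributes and methods, read them with brackets or `.loc` (for example `summary.loc['Technology', 'size']`) rather than with dot access.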

