Question:
In Python, I have a pandas DataFrame similar to the following:
    Item  | shop1 | shop2 | shop3 | Category
    ------------------------------------------
    Shoes | 45    | 50    | 53    | Clothes
    TV    | 200   | 300   | 250   | Technology
    Book  | 20    | 17    | 21    | Books
    phone | 300   | 350   | 400   | Technology
Where shop1, shop2 and shop3 are the costs of every item in different shops. Now, I need to return a DataFrame, after some data cleaning, like this one:
    Category (index) | size | sum | mean | std
    ------------------------------------------
where size is the number of items in each Category, and sum, mean and std are the corresponding functions applied across the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply, ...)?
Can someone help me out? I'm going crazy with this one...thank you!
Answer 1:
Option 1: use agg
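    # Passing a renaming dict to agg on a Series groupby raises
    # SpecificationError in pandas >= 1.0, so the names are passed as
    # keyword arguments (named aggregation, pandas >= 0.25) instead.
    agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')
    df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(**agg_funcs)

                Size   Sum        Mean        Std
    Category
    Books          3    58   19.333333   2.081666
    Clothes        3   148   49.333333   4.041452
    Technology     6  1800  300.000000  70.710678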
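For anyone who wants to run option 1 end to end, here is a minimal sketch. It assumes the frame from the question is rebuilt by hand (the constructor call and the variable names `prices` and `result` are mine, not part of the original answer) and that pandas 0.25 or later is installed.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# Move the non-price columns into the index, then stack the three shop
# columns into one long Series of prices.
prices = df.set_index(["Category", "Item"]).stack()

# Group on the first index level (Category) and aggregate every price in it.
result = prices.groupby(level=0).agg(Size="size", Sum="sum", Mean="mean", Std="std")
print(result)
```

Note that Size counts prices, not items: Technology has 2 items but 6 stacked prices. If the number of items is what you need, something like `df.groupby('Category')['Item'].size()` may be closer.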
Option 2: more for less, use describe
    # In pandas >= 0.20, describe() on a groupby already returns one row per
    # group, so the trailing .unstack() needed by older versions is dropped.
    df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()

                count        mean        std    min    25%    50%    75%    max
    Category
    Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
    Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
    Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0
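Since describe returns more statistics than the question asks for, one possible follow-up is to keep only the columns of interest. This is a sketch under the assumption of a recent pandas, where the grouped describe result is a DataFrame with one column per statistic; the names `summary` and `subset` are illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

prices = df.set_index(["Category", "Item"]).stack()

# One row per Category, one column per descriptive statistic.
summary = prices.groupby(level=0).describe()

# Keep only the statistics the question asked about.
subset = summary[["count", "mean", "std"]]
print(subset)
```

describe has no sum column, so if the total is required, agg is probably the more direct route.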
Answer 2:
    df.groupby('Category').agg({
        'Item': 'size',
        'shop1': ['sum', 'mean', 'std'],
        'shop2': ['sum', 'mean', 'std'],
        'shop3': ['sum', 'mean', 'std'],
    })
Or if you want it across all shops then:
    df1 = (df.set_index(['Item', 'Category'])
             .stack()
             .reset_index()
             .rename(columns={'level_2': 'Shops', 0: 'costs'}))
    df1.groupby('Category').agg({'Item': 'size', 'costs': ['sum', 'mean', 'std']})
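The dict form above gives two-level column labels. If flat column names like the ones in the question are preferred, named aggregation (pandas 0.25 or later) is one way to get them. A rough sketch, where `long_df`, `flat` and the output column names are my own choices rather than anything from the answer:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# Long format: one row per (Item, Category, Shops) with a single costs column.
long_df = (
    df.set_index(["Item", "Category"])
      .stack()
      .rename_axis(["Item", "Category", "Shops"])  # name the stacked level
      .rename("costs")                             # name the value column
      .reset_index()
)

# Each keyword becomes an output column: name=(input column, function).
flat = long_df.groupby("Category").agg(
    size=("costs", "size"),
    sum=("costs", "sum"),
    mean=("costs", "mean"),
    std=("costs", "std"),
)
print(flat)
```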
Answer 3:
If I understand correctly, you want to calculate aggregate metrics for all shops, not for each individually. To do that, you can first stack your dataframe and then group by Category:
    stacked = df.set_index(['Item', 'Category']).stack().reset_index()
    stacked.columns = ['Item', 'Category', 'Shop', 'Price']
    stacked.groupby('Category').agg({'Price': ['count', 'sum', 'mean', 'std']})
Which results in
               Price
               count   sum        mean        std
    Category
    Books          3    58   19.333333   2.081666
    Clothes        3   148   49.333333   4.041452
    Technology     6  1800  300.000000  70.710678
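As an aside, the same long-format reshaping can probably be done with melt instead of set_index plus stack. This swaps in a different technique from the one in the answer, so treat it as a sketch; the names `long_df` and `stats` are illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# melt keeps Item and Category as identifiers and unpivots every other
# column (the three shops) into Shop/Price pairs.
long_df = df.melt(id_vars=["Item", "Category"], var_name="Shop", value_name="Price")

# Selecting the Price column first keeps the result columns single-level.
stats = long_df.groupby("Category")["Price"].agg(["size", "sum", "mean", "std"])
print(stats)
```

melt names the new columns in one call, which avoids the separate column renaming step after reset_index.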