Question
I'm trying to create a simple pivot table with subtotals, Excel-style, but I can't find a way to do it in pandas. I tried the solution Wes suggested in another subtotal-related question, but it doesn't give the expected results. Steps to reproduce:
Create the sample data:
import numpy as np
import pandas as pd

sample_data = {'customer': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B'],
               'product': ['astro', 'ball', 'car', 'astro', 'ball', 'car', 'astro', 'ball', 'car', 'astro', 'ball', 'car'],
               'week': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
               'qty': [10, 15, 20, 40, 20, 34, 300, 20, 304, 23, 45, 23]}
df = pd.DataFrame(sample_data)
Create the pivot table with margins. It only has the grand total, not a subtotal per customer (A, B):
piv = df.pivot_table(index=['customer','product'],columns='week',values='qty',margins=True,aggfunc=np.sum)
week                1    2  All
customer product
A        astro     10  300  310
         ball      15   20   35
         car       20  304  324
B        astro     40   23   63
         ball      20   45   65
         car       34   23   57
All               139  715  854
Then I tried the method Wes McKinney mentioned in another thread, using the stack function:
piv2 = df.pivot_table(index='customer',columns=['week','product'],values='qty',margins=True,aggfunc=np.sum)
piv2.stack('product')
The result has the format I want, but the "All" rows don't have the sums:
week                  1      2    All
customer product
A                   NaN    NaN  669.0
         astro     10.0  300.0    NaN
         ball      15.0   20.0    NaN
         car       20.0  304.0    NaN
B                   NaN    NaN  185.0
         astro     40.0   23.0    NaN
         ball      20.0   45.0    NaN
         car       34.0   23.0    NaN
All                 NaN    NaN  854.0
         astro     50.0  323.0    NaN
         ball      35.0   65.0    NaN
         car       54.0  327.0    NaN
How can I make it work as it would in Excel, with all the subtotals and totals filled in? What am I missing?
Just to note, I can make it work with for loops that filter by customer on each iteration and concatenate the pieces afterwards, but I hope there is a more direct solution. Thank you.
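Answer 1:
You can do it in one step, but you have to be strategic about the index names because of alphabetical sorting: the lowercase 'total' subtotal label sorts after the product names, and 'Total' sorts after customers A and B, whereas the default 'All' would land between them.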
piv = df.pivot_table(index=['customer', 'product'],
                     columns='week',
                     values='qty',
                     margins=True,
                     margins_name='Total',
                     aggfunc=np.sum)

# groupby(level=0).sum() replaces .sum(level=0), which was removed in pandas 2.0
(pd.concat([piv,
            piv.query('customer != "Total"')
               .groupby(level=0).sum()
               .assign(product='total')
               .set_index('product', append=True)])
   .sort_index())
Output:
week                1    2  Total
customer product
A        astro     10  300    310
         ball      15   20     35
         car       20  304    324
         total     45  624    669
B        astro     40   23     63
         ball      20   45     65
         car       34   23     57
         total     94   91    185
Total             139  715    854
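For reference, the same idea can be wrapped in a small helper that builds the subtotal and grand-total rows explicitly instead of relying on `margins=True` and `query`. This is only a minimal sketch: the function name `pivot_with_subtotals` and its parameters `subtotal_label` and `grand_label` are hypothetical names chosen for illustration, and it assumes the labels are picked so that they sort after the real product and customer names.

```python
import pandas as pd

sample_data = {
    "customer": ["A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"],
    "product": ["astro", "ball", "car"] * 4,
    "week": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "qty": [10, 15, 20, 40, 20, 34, 300, 20, 304, 23, 45, 23],
}
df = pd.DataFrame(sample_data)


def pivot_with_subtotals(frame, subtotal_label="total", grand_label="Total"):
    """Hypothetical helper: customer/product pivot with per-customer subtotals."""
    # Detail rows: one row per (customer, product), one column per week.
    detail = frame.pivot_table(index=["customer", "product"], columns="week",
                               values="qty", aggfunc="sum")
    # Row totals across the weeks.
    detail[grand_label] = detail.sum(axis=1)

    # One subtotal row per customer, re-indexed as (customer, subtotal_label).
    sub = detail.groupby(level="customer").sum()
    sub.index = pd.MultiIndex.from_product([sub.index, [subtotal_label]],
                                           names=detail.index.names)

    # Single grand-total row, indexed as (grand_label, "").
    grand = detail.sum().to_frame().T
    grand.index = pd.MultiIndex.from_tuples([(grand_label, "")],
                                            names=detail.index.names)

    # sort_index interleaves the subtotals; this relies on the labels
    # sorting after the product and customer names (see assumption above).
    return pd.concat([detail, sub, grand]).sort_index()


result = pivot_with_subtotals(df)
print(result)
```

Building the rows by hand keeps the subtotal logic visible, at the cost of a few more lines than the chained version.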
Answer 2:
@Scott Boston's answer is perfect and elegant. For reference, if you pivot on just the customers and pd.concat() the two results, you get the following:
piv = df.pivot_table(index=['customer','product'],columns='week',values='qty',margins=True,aggfunc=np.sum)
piv3 = df.pivot_table(index=['customer'],columns='week',values='qty',margins=True,aggfunc=np.sum)
piv4 = pd.concat([piv, piv3], axis=0)
piv4
week          1    2  All
(A, astro)   10  300  310
(A, ball)    15   20   35
(A, car)     20  304  324
(B, astro)   40   23   63
(B, ball)    20   45   65
(B, car)     34   23   57
(All, )     139  715  854
A            45  624  669
B            94   91  185
All         139  715  854
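The concatenated result above mixes tuple labels with plain customer labels, so the subtotals end up at the bottom rather than under each customer. One possible way to interleave them, sketched below, is to give each subtotal row a two-level label and concatenate the pieces in the desired order. The "Subtotal" label and the `blocks` list are assumptions made for this illustration. Because the order comes from how the list is built, no sorting is needed, which matters here since the default "All" label would sort between "A" and "B".

```python
import pandas as pd

sample_data = {
    "customer": ["A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"],
    "product": ["astro", "ball", "car"] * 4,
    "week": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "qty": [10, 15, 20, 40, 20, 34, 300, 20, 304, 23, 45, 23],
}
df = pd.DataFrame(sample_data)

# Same two pivots as in the answer, with the string form of the aggregation.
piv = df.pivot_table(index=["customer", "product"], columns="week",
                     values="qty", margins=True, aggfunc="sum")
piv3 = df.pivot_table(index="customer", columns="week",
                      values="qty", margins=True, aggfunc="sum")

# Customer label of every row in the detailed pivot.
row_customer = piv.index.get_level_values("customer")

blocks = []
for customer in piv3.index.drop("All"):
    # Detail rows for this customer.
    blocks.append(piv[row_customer == customer])
    # Matching subtotal row, relabelled so it fits the two-level index.
    subtotal = piv3.loc[[customer]]
    subtotal.index = pd.MultiIndex.from_tuples([(customer, "Subtotal")],
                                               names=piv.index.names)
    blocks.append(subtotal)

# Grand total row from the detailed pivot goes last.
blocks.append(piv[row_customer == "All"])

piv4 = pd.concat(blocks)
print(piv4)
```

This is close to the loop-and-concat approach mentioned in the question, except that the per-customer sums come from the second pivot instead of being recomputed inside the loop.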
Source: https://stackoverflow.com/questions/62605470/pandas-pivot-table-subtotals-with-multi-index