I\'m going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Py
The original data is:
>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
>>> df
Out:
Graduate Undergraduate
Straight A's 60 240
Not 440 3760
You can only use df.T
to achieve recreating this table:
>>> df_new = df.T
>>> df_new
Out:
Straight A's Not
Graduate 60 440
Undergraduate 240 3760
After computing the Total
by row and columns:
>>> df_new.loc['Total',:]= df_new.sum(axis=0)
>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
>>> df_new
Out:
Straight A's Not Total
Graduate 60.0 440.0 500.0
Undergraduate 240.0 3760.0 4000.0
Total 300.0 4200.0 4500.0
From the original data using crosstab
, if just base on your input, you just need melt
before crosstab
s=df.reset_index().melt('index')
pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
Out[33]:
variable Graduate Undergraduate All
index
Not 440 3760 4200
Straight A's 60 240 300
All 500 4000 4500
Toy data
df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]})
# before `agg`, I think your input is the result after `groupby`
df
Out[37]:
c1 c2 c3
0 1 2 1
1 2 2 2
2 2 3 3
3 3 3 4
4 4 3 5
pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
=True)
Out[38]:
c2 2 3 All
c1
1 1.0 NaN 1
2 2.0 3.0 5
3 NaN 4.0 4
4 NaN 5.0 5
All 3.0 12.0 15
The point of this answer is to provide an in line and not an in place solution.
I use append
to stack a Series
or DataFrame
vertically. It also creates a copy
so that I can continue to chain.
I use assign
to add a column. However, the DataFrame
I'm working on is in the in between nether space. So I use a lambda
in the assign
argument which tells Pandas
to apply it to the calling DataFrame
.
df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))
Graduate Undergraduate Total
Not 440 3760 4200
Straight A's 60 240 300
Total 500 4000 4500
Uses drop
with errors='ignore'
to get rid of potentially pre-existing Total
rows and columns.
Also, still in line.
def tc(d):
return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))
df.pipe(tc).T.pipe(tc).T
Graduate Undergraduate Total
Not 440 3760 4200
Straight A's 60 240 300
Total 500 4000 4500
Or in two steps, using the .sum()
function as you suggested (which might be a bit more readable as well):
import pandas as pd
df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})
#Total sum per column:
df.loc['Total',:]= df.sum(axis=0)
#Total sum per row:
df.loc[:,'Total'] = df.sum(axis=1)
Output:
Graduate Undergraduate Total
Not 440 3760 4200
Straight A's 60 240 300
Total 500 4000 4500