Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?
df = pd.DataFrame({\'A\' : [\
Another option is to use div rather than apply:
In [11]: res = pd.crosstab(df.A, df.B)
Divide by the sum over the index:
In [12]: res.sum(axis=1)
Out[12]:
A
one 12
three 6
two 6
dtype: int64
Similar to above, you need to do something about integer division (I use astype('float')):
In [13]: res.astype('float').div(res.sum(axis=1), axis=0)
Out[13]:
B A B C
A
one 0.333333 0.333333 0.333333
three 0.333333 0.333333 0.333333
two 0.333333 0.333333 0.333333