Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?
df = pd.DataFrame({\'A\' : [\
From Pandas 0.18.1 onwards, there's a normalize option:
In [1]: pd.crosstab(df.A,df.B, normalize='index')
Out[1]:
B A B C
A
one 0.333333 0.333333 0.333333
three 0.333333 0.333333 0.333333
two 0.333333 0.333333 0.333333
Where you can normalise across either all, index (rows), or columns.
More details are available in the documentation.