Question
I'm trying to work out how to use the groupby function in pandas to calculate the proportion of values per year for a given Yes/No criterion.
For example, I have a dataframe called names:
    Name  Number  Year   Sex Criteria
0  name1     789  1998  Male        N
1  name1     688  1999  Male        N
2  name1     639  2000  Male        N
3  name2     551  1998  Male        Y
4  name2     499  1999  Male        Y
I can use
namesgrouped = names.groupby(["Sex", "Year", "Criteria"]).sum()
to get:
                    Number
Sex  Year Criteria
Male 1998 N          14507
          Y           2308
     1999 N          14119
          Y           2331
and so on. I would like the Number column to show each Criteria value as a percentage of the total for that sex and year, so instead of N = 14507 and Y = 2308 for 1998 above I'd have N = 86.27% and Y = 13.73%.
Can anyone advise how to do this?
Answer 1:
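This question is a direct extension of the suggested duplicate. Borrowing from the accepted answer there, this will work (the output below is computed on the five sample rows from the question and is expressed as fractions rather than percentages):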
In [46]: namesgrouped.groupby(level=[0, 1]).apply(lambda g: g / g.sum())
Out[46]:
                      Number
Sex  Year Criteria
Male 1998 N         0.588806
          Y         0.411194
     1999 N         0.579612
          Y         0.420388
     2000 N         1.000000
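Depending on the pandas version, groupby(...).apply may prepend the group keys to the result, which would repeat the Sex and Year levels in the index. The sketch below is one way to keep the original three-level index, assuming the five sample rows from the question; passing group_keys=False and selecting the Number column explicitly are choices made here for illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Select Number explicitly so the text column Name is not aggregated.
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# group_keys=False (an assumption for newer pandas) asks apply not to
# add the Sex/Year group keys on top of the existing index.
fractions = namesgrouped.groupby(level=["Sex", "Year"], group_keys=False).apply(
    lambda g: g / g.sum()
)
print(fractions)
```

If the index still shows duplicated levels in your environment, the transform-based approach avoids the issue, since transform returns a result aligned with its input.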
Edit: a transform operation might be faster than apply:
namesgrouped / namesgrouped.groupby(level=[0, 1]).transform('sum')
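To get percentages in the form the question asks for, the transform result can be scaled by 100. This is a minimal end-to-end sketch using the five sample rows from the question; the variable names totals and percent, and the rounding to two decimals, are assumptions added for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Sum Number per (Sex, Year, Criteria); selecting the column explicitly
# keeps the text column Name out of the aggregation.
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# Total per (Sex, Year), broadcast back to every row of that group.
totals = namesgrouped.groupby(level=["Sex", "Year"]).transform("sum")

# Share of each Criteria value within its (Sex, Year) group, in percent.
percent = (namesgrouped / totals * 100).round(2)
print(percent)
```

Because transform returns an object with the same index as namesgrouped, the division lines up row by row without any extra alignment step.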
Source: https://stackoverflow.com/questions/36987829/how-to-use-groupby-in-pandas-to-calculate-a-percentage-proportion-total-based