This is obviously simple, but as a numpy newbie I'm getting stuck.
I have a CSV file that contains 3 columns: the State, the Office ID, and the Sales for that office.
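The snippets below assume a DataFrame along these lines; the construction and the random numbers are my own stand-in for the CSV, with the column names state, office_id and sales taken from the code that follows:

import numpy as np
import pandas as pd

# Stand-in for the CSV described above; values are random, column names assumed.
np.random.seed(0)
df = pd.DataFrame({
    "state": ["CA", "WA", "CO", "AZ"] * 3,
    "office_id": list(range(1, 7)) * 2,
    "sales": np.random.randint(100_000, 999_999, size=12),
})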
For conciseness I'd use the SeriesGroupBy:
In [11]: c = df.groupby(['state', 'office_id'])['sales'].sum().rename("count")
In [12]: c
Out[12]:
state  office_id
AZ     2            925105
       4            592852
       6            362198
CA     1            819164
       3            743055
       5            292885
CO     1            525994
       3            338378
       5            490335
WA     2            623380
       4            441560
       6            451428
Name: count, dtype: int64
In [13]: c / c.groupby(level=0).sum()
Out[13]:
state  office_id
AZ     2            0.492037
       4            0.315321
       6            0.192643
CA     1            0.441573
       3            0.400546
       5            0.157881
CO     1            0.388271
       3            0.249779
       5            0.361949
WA     2            0.411101
       4            0.291196
       6            0.297703
Name: count, dtype: float64
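The division in In [13] works because pandas aligns the two Series on the shared state index level: c carries a (state, office_id) MultiIndex, while the grouped sum is indexed by state alone, so each office total is divided by its state total. Wrapped up as actual percentages, a sketch building on the stand-in df above (not part of the original answer):

# Each office's sales as a percentage of its state total
# (sketch; assumes the stand-in `df` from above).
pct = (
    df.groupby(["state", "office_id"])["sales"].sum()
      .pipe(lambda s: 100 * s / s.groupby(level=0).sum())
      .rename("pct_of_state")
)
print(pct)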
For multiple groups you have to use transform (using Radical's df):
In [21]: c = df.groupby(["Group 1","Group 2","Final Group"])["Numbers I want as percents"].sum().rename("count")
In [22]: c / c.groupby(level=[0, 1]).transform("sum")
Out[22]:
Group 1  Group 2  Final Group
AAHQ     BOSC     OWON           0.331006
                  TLAM           0.668994
         MQVF     BWSI           0.288961
                  FXZM           0.711039
         ODWV     NFCH           0.262395
...
Name: count, dtype: float64
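transform("sum") returns a Series with the same three-level index as c, so the division lines up row by row: each row is divided by its (Group 1, Group 2) subtotal, whereas a plain .sum() would collapse the index to two levels. A minimal sketch with made-up numbers (the column names and group labels come from the answer above; the rows and values are hypothetical):

import pandas as pd

# Hypothetical rows; only the column names and group labels match the answer above.
df2 = pd.DataFrame({
    "Group 1": ["AAHQ"] * 6,
    "Group 2": ["BOSC", "BOSC", "BOSC", "MQVF", "MQVF", "MQVF"],
    "Final Group": ["OWON", "OWON", "TLAM", "BWSI", "FXZM", "FXZM"],
    "Numbers I want as percents": [10, 20, 30, 40, 5, 15],
})

c = df2.groupby(["Group 1", "Group 2", "Final Group"])["Numbers I want as percents"].sum().rename("count")

# transform("sum") keeps the full index, so each row is divided by its
# (Group 1, Group 2) subtotal.
print(c / c.groupby(level=[0, 1]).transform("sum"))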
This seems to be more performant than the other answers (just under twice the speed of Radical's answer, ~0.08s for me).
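If you want to reproduce the comparison on your own data, a rough timing sketch could look like this (my own scaffolding with synthetic data, not the benchmark behind the ~0.08s figure; IPython's %timeit is the more usual tool):

import timeit

import numpy as np
import pandas as pd

# Synthetic data purely for timing; sizes and values are arbitrary.
np.random.seed(0)
df_big = pd.DataFrame({
    "state": np.random.choice(["AZ", "CA", "CO", "WA"], size=100_000),
    "office_id": np.random.randint(1, 7, size=100_000),
    "sales": np.random.randint(100_000, 999_999, size=100_000),
})

def pct_by_state(frame):
    c = frame.groupby(["state", "office_id"])["sales"].sum()
    return c / c.groupby(level=0).sum()

print(timeit.timeit(lambda: pct_by_state(df_big), number=100) / 100, "seconds per call")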