This is obviously simple, but as a numpy newbe I\'m getting stuck.
I have a CSV file that contains 3 columns, the State, the Office ID, and the Sales for that office
I realize there are already good answers here.
I nevertheless would like to contribute my own, because I feel for an elementary, simple question like this, there should be a short solution that is understandable at a glance.
It should also work in a way that I can add the percentages as a new column, leaving the rest of the dataframe untouched. Last but not least, it should generalize in an obvious way to the case in which there is more than one grouping level (e.g., state and country instead of only state).
The following snippet fulfills these criteria:
df['sales_ratio'] = df.groupby(['state'])['sales'].transform(lambda x: x/x.sum())
Note that if you're still using Python 2, you'll have to replace the x in the denominator of the lambda term by float(x).