Pandas percentage of total with groupby

没有蜡笔的小新 2020-11-22 06:41

This is obviously simple, but as a numpy newbie I'm getting stuck.

I have a CSV file that contains three columns: the state, the office ID, and the sales for that office.

14 answers
  •  迷失自我
    2020-11-22 07:21

    (This solution is inspired by this article: https://pbpython.com/pandas_transform.html)

    I find the following solution, which uses transformation, to be the simplest (and probably the fastest):

    Transformation: While aggregation must return a reduced version of the data, transformation can return some transformed version of the full data to recombine. For such a transformation, the output is the same shape as the input.
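
    To make the shape difference concrete, here is a minimal sketch on a small made-up frame (the values are illustrative only and are not taken from the question's CSV). It compares what an aggregation and a transformation return for the same groupby:

```python
import pandas as pd

# Small made-up frame, used only to compare output shapes.
df = pd.DataFrame({
    "state": ["AZ", "AZ", "CA"],
    "sales": [10, 30, 20],
})

# Aggregation: one value per group, indexed by the group key.
agg = df.groupby("state")["sales"].sum()

# Transformation: one value per original row, aligned to df's index,
# so each row carries the total of its own group.
per_row = df.groupby("state")["sales"].transform("sum")

print(agg.shape)         # (2,)
print(per_row.tolist())  # [40, 40, 20]
```

    Because the transformed Series has the same index as the original frame, it can be used directly in row-wise arithmetic against the original column.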

    So, using transformation, the solution is a one-liner:

    df['%'] = 100 * df['sales'] / df.groupby('state')['sales'].transform('sum')
    

    And if you print:

    print(df.sort_values(['state', 'office_id']).reset_index(drop=True))
    
       state  office_id   sales          %
    0     AZ          2  195197   9.844309
    1     AZ          4  877890  44.274352
    2     AZ          6  909754  45.881339
    3     CA          1  614752  50.415708
    4     CA          3  395340  32.421767
    5     CA          5  209274  17.162525
    6     CO          1  549430  42.659629
    7     CO          3  457514  35.522956
    8     CO          5  280995  21.817415
    9     WA          2  828238  35.696929
    10    WA          4  719366  31.004563
    11    WA          6  772590  33.298509
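
    For anyone who wants to run this end to end, the sketch below rebuilds the frame from the state, office_id and sales values printed above (assuming those are the complete data; the original CSV is not shown here), applies the same one-liner, and checks that the percentages within each state add up to 100:

```python
import pandas as pd

# Frame rebuilt from the values shown in the printed output above.
df = pd.DataFrame({
    "state": ["AZ"] * 3 + ["CA"] * 3 + ["CO"] * 3 + ["WA"] * 3,
    "office_id": [2, 4, 6, 1, 3, 5, 1, 3, 5, 2, 4, 6],
    "sales": [195197, 877890, 909754,
              614752, 395340, 209274,
              549430, 457514, 280995,
              828238, 719366, 772590],
})

# Each office's sales as a percentage of its state's total sales.
df["%"] = 100 * df["sales"] / df.groupby("state")["sales"].transform("sum")

# Sanity check: the percentages within each state should sum to 100
# (up to floating-point error).
state_totals = df.groupby("state")["%"].sum()
print(state_totals.round(6))
```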
    
