Pandas percentage of total with groupby

没有蜡笔的小新 2020-11-22 06:41

This is obviously simple, but as a numpy newbie I'm getting stuck.

I have a CSV file that contains three columns: the State, the Office ID, and the Sales for that office.

14 Answers
  •  天涯浪人
    2020-11-22 07:20

    For conciseness I'd use the SeriesGroupBy:

    In [11]: c = df.groupby(['state', 'office_id'])['sales'].sum().rename("count")
    
    In [12]: c
    Out[12]:
    state  office_id
    AZ     2            925105
           4            592852
           6            362198
    CA     1            819164
           3            743055
           5            292885
    CO     1            525994
           3            338378
           5            490335
    WA     2            623380
           4            441560
           6            451428
    Name: count, dtype: int64
    
    In [13]: c / c.groupby(level=0).sum()
    Out[13]:
    state  office_id
    AZ     2            0.492037
           4            0.315321
           6            0.192643
    CA     1            0.441573
           3            0.400546
           5            0.157881
    CO     1            0.388271
           3            0.249779
           5            0.361949
    WA     2            0.411101
           4            0.291196
           6            0.297703
    Name: count, dtype: float64
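
For anyone who wants to run this without the original CSV, here is a minimal self-contained sketch of the same idea. The data is made up for illustration, and the names `office_sales`, `state_sales` and `share` are not from the answer. It relies on pandas aligning a single-level index against the level of the same name in a MultiIndex when two Series are divided.

```python
import pandas as pd

# Hypothetical sample data: several sales rows per (state, office_id) pair.
df = pd.DataFrame({
    "state": ["AZ", "AZ", "AZ", "CA", "CA", "CA"],
    "office_id": [2, 2, 4, 1, 3, 3],
    "sales": [100, 300, 600, 250, 500, 250],
})

# Total sales per office, indexed by (state, office_id).
office_sales = df.groupby(["state", "office_id"])["sales"].sum()

# Total sales per state, indexed by state only.
state_sales = office_sales.groupby(level="state").sum()

# Division aligns the state index with the "state" level of the MultiIndex,
# so each office total is divided by the total of its own state.
share = office_sales / state_sales
print(share)
```

With this data each state sums to 1000, so the AZ offices come out at 0.4 and 0.6 and the CA offices at 0.25 and 0.75. Multiply by 100 if percentages are wanted rather than fractions.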
    

    For multiple groups you have to use transform (using Radical's df):

    In [21]: c = df.groupby(["Group 1","Group 2","Final Group"])["Numbers I want as percents"].sum().rename("count")
    
    In [22]: c / c.groupby(level=[0, 1]).transform("sum")
    Out[22]:
    Group 1  Group 2  Final Group
    AAHQ     BOSC     OWON           0.331006
                      TLAM           0.668994
             MQVF     BWSI           0.288961
                      FXZM           0.711039
             ODWV     NFCH           0.262395
    ...
    Name: count, dtype: float64
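
Radical's DataFrame is not reproduced on this page, so the following is only a sketch with made-up three-level data and placeholder column names (`group_1`, `group_2`, `final_group`, `value`). It shows why `transform` is convenient here: the result keeps the same index as the Series being divided, so no index alignment across levels is needed.

```python
import pandas as pd

# Hypothetical three-level data; column names are placeholders.
df = pd.DataFrame({
    "group_1": ["A", "A", "A", "A", "B", "B"],
    "group_2": ["x", "x", "y", "y", "x", "x"],
    "final_group": ["p", "q", "p", "q", "p", "q"],
    "value": [1, 3, 2, 2, 5, 15],
})

# Sum per innermost group, indexed by all three levels.
totals = df.groupby(["group_1", "group_2", "final_group"])["value"].sum()

# transform("sum") returns a Series with the same index as `totals`, where
# each entry holds the sum of its (group_1, group_2) parent group.
parent_totals = totals.groupby(level=["group_1", "group_2"]).transform("sum")

# Both sides share one index, so this is a plain element-wise division.
share = totals / parent_totals
print(share)
```

Here the (A, x) group sums to 4, so its two members get 0.25 and 0.75.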
    

    This seems to be slightly faster than the other answers (just under twice as fast as Radical's answer; for me ~0.08s).
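
Timings like this depend on the machine, the pandas version and the data, so it is worth measuring locally. The sketch below does not reproduce the comparison with Radical's answer, which is not shown here. It only times the two expressions used in this answer on randomly generated data. The data size and the function names are assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 100,000 random sales rows.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "state": rng.choice(["AZ", "CA", "CO", "WA"], size=n),
    "office_id": rng.integers(1, 50, size=n),
    "sales": rng.integers(1, 1_000, size=n),
})


def by_aligned_division(frame):
    # Divide by the per-state sums and let pandas align on the state level.
    c = frame.groupby(["state", "office_id"])["sales"].sum()
    return c / c.groupby(level=0).sum()


def by_transform(frame):
    # Divide by per-state sums broadcast back to the full index.
    c = frame.groupby(["state", "office_id"])["sales"].sum()
    return c / c.groupby(level=0).transform("sum")


# Check that both approaches agree before comparing their speed.
max_diff = (by_aligned_division(df) - by_transform(df)).abs().max()
assert max_diff < 1e-12

for func in (by_aligned_division, by_transform):
    seconds = timeit.timeit(lambda: func(df), number=10) / 10
    print(f"{func.__name__}: {seconds:.4f} s per call")
```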
