Pandas percentage of total with groupby

没有蜡笔的小新 2020-11-22 06:41

This is obviously simple, but as a numpy newbie I'm getting stuck.

I have a CSV file that contains three columns: the state, the office ID, and the sales for that office.

14 Answers
  •  没有蜡笔的小新
    2020-11-22 07:19

    The most elegant way to find percentages across columns or index is to use pd.crosstab.

    Sample Data

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                       'office_id': list(range(1, 7)) * 2,
                       'sales': [np.random.randint(100000, 999999) for _ in range(12)]})
    

    One run produced the following DataFrame (the sales figures are random, so yours will differ):

    print(df)

        state  office_id   sales
    0      CA          1  764505
    1      WA          2  313980
    2      CO          3  558645
    3      AZ          4  883433
    4      CA          5  301244
    5      WA          6  752009
    6      CO          1  457208
    7      AZ          2  259657
    8      CA          3  584471
    9      WA          4  122358
    10     CO          5  721845
    11     AZ          6  136928
    

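    Specify the index, the columns, and the values to aggregate. The normalize keyword controls the denominator: normalize='index' divides each value by its row total, normalize='columns' divides by its column total, and normalize='all' divides by the grand total. The results are fractions between 0 and 1.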

    # normalize='index' returns row fractions; '{:.2%}' multiplies by 100 for display.
    # On pandas 2.1+, DataFrame.map replaces the deprecated applymap.
    result = pd.crosstab(index=df['state'],
                         columns=df['office_id'],
                         values=df['sales'],
                         aggfunc='sum',
                         normalize='index').applymap('{:.2%}'.format)

    print(result)

    office_id       1       2       3       4       5       6
    state
    AZ          0.00%  20.29%   0.00%  69.02%   0.00%  10.70%
    CA         46.33%   0.00%  35.42%   0.00%  18.25%   0.00%
    CO         26.31%   0.00%  32.15%   0.00%  41.54%   0.00%
    WA          0.00%  26.42%   0.00%  10.30%   0.00%  63.28%
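
    Since the question asks about groupby, it may help to see the same per-state shares computed that way. The following is a minimal sketch, not part of the original answer: the four-row dataset is hypothetical and was chosen so the shares are easy to check by hand, and the names office_sales, state_totals, share and ct are illustrative.

```python
import pandas as pd

# Hypothetical fixed data (not from the original post) so the result is reproducible.
df = pd.DataFrame({
    'state': ['CA', 'CA', 'WA', 'WA'],
    'office_id': [1, 2, 1, 2],
    'sales': [300, 100, 50, 150],
})

# Sum sales per (state, office), then divide by each state's total.
office_sales = df.groupby(['state', 'office_id'])['sales'].sum()
state_totals = office_sales.groupby(level='state').transform('sum')
share = office_sales / state_totals

# The same row-normalized fractions from crosstab, for comparison.
ct = pd.crosstab(index=df['state'], columns=df['office_id'],
                 values=df['sales'], aggfunc='sum', normalize='index')

# Format the fractions as percentages for display only.
print(share.map('{:.2%}'.format))
```

    Here share is a Series indexed by (state, office_id), and share.unstack() has the same layout as ct. Keeping the numeric fractions and formatting only when printing avoids turning the values into strings too early.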
    
