Simple cross-tabulation in pandas

忘掉有多难 2021-02-04 00:01

I stumbled across pandas and it looks ideal for simple calculations that I'd like to do. I have a SAS background and was thinking it'd replace proc freq -- it looks like it'l

2 answers
  •  广开言路
    2021-02-04 01:05

    v0.21 answer

    Use pivot_table with the index parameter:

    df.pivot_table(index='category', aggfunc=[len, sum])
    
               len   sum
             value value
    category            
    AB           2   300
    AC           1   150
    AD           1   500
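
For anyone who wants to reproduce the table above, here is a minimal self-contained sketch. It rebuilds the four-row frame shown further down in this answer. As an assumption on my part, it passes the string names "count" and "sum" instead of the len and sum callables, since newer pandas releases may warn about built-in callables. Note that "count" skips missing values while len counts every row, so the two only agree when there are no NAs.

```python
import pandas as pd

# Sample data copied from the answer's own example frame.
df = pd.DataFrame({
    "category": ["AB", "AB", "AC", "AD"],
    "value": [100, 200, 150, 500],
})

# One row per category, with a count and a total for each data column.
# "count" and "sum" are string aggregator names (an assumption; the
# answer itself passes the callables len and sum).
table = df.pivot_table(index="category", aggfunc=["count", "sum"])

print(table)
```

The columns of the result are a two-level index, so a single column is addressed with a tuple such as ("sum", "value").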
    

    <= v0.12

    It is possible to do this using pivot_table for those interested:

    In [8]: df
    Out[8]: 
      category  value
    0       AB    100
    1       AB    200
    2       AC    150
    3       AD    500
    
    In [9]: df.pivot_table(rows='category', aggfunc=[len, np.sum])
    Out[9]: 
                len    sum
              value  value
    category              
    AB            2    300
    AC            1    150
    AD            1    500
    

    Note that the result's columns are hierarchically indexed. If you had multiple data columns, you would get a result like this:

    In [12]: df
    Out[12]: 
      category  value  value2
    0       AB    100       5
    1       AB    200       5
    2       AC    150       5
    3       AD    500       5
    
    In [13]: df.pivot_table(rows='category', aggfunc=[len, np.sum])
    Out[13]: 
                len            sum        
              value  value2  value  value2
    category                              
    AB            2       2    300      10
    AC            1       1    150       5
    AD            1       1    500       5
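
A short sketch of how the hierarchically indexed columns can be handled, assuming a recent pandas where the keyword is index rather than rows. The string aggregators and the names sums, ab_total and flat are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

# Two data columns, as in the second example frame of the answer.
df = pd.DataFrame({
    "category": ["AB", "AB", "AC", "AD"],
    "value": [100, 200, 150, 500],
    "value2": [5, 5, 5, 5],
})

table = df.pivot_table(index="category", aggfunc=["count", "sum"])

# Selecting the top level returns a plain DataFrame of the sums only.
sums = table["sum"]

# A single cell is addressed with a (function, column) tuple.
ab_total = table.loc["AB", ("sum", "value")]

# Flatten the two levels into single names such as "sum_value".
flat = table.copy()
flat.columns = ["_".join(col) for col in flat.columns]

print(flat)
```

Flattening is only a convenience for export or further joins; the two-level form is easier to slice by aggregation.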
    

    The main reason to use np.sum rather than the Python built-in sum (__builtin__.sum) is that np.sum gives you NA-handling. pandas could probably intercept the Python built-in; I will make a note about that now.
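
The NA-handling difference can be seen on a plain Series. This is only an illustration at the Series level, with made-up values, and does not show what pivot_table does internally in any particular pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry.
values = pd.Series([100.0, np.nan, 200.0])

# The built-in iterates element by element, so the NaN propagates.
builtin_total = sum(values)

# np.sum on a Series defers to Series.sum, which skips NaN by default.
numpy_total = np.sum(values)

print(builtin_total, numpy_total)
```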
