Pandas: grouping a column on a value and creating new column headings

没有蜡笔的小新 2020-12-20 03:37

I am trying to re-arrange a pandas dataframe.

Presently I have:

id  mins  param
1   10    0.15
1   11    0.16
1   12    0.17
2   10    0.20
2   11    0.21
2   12    0.22


        
1 Answer
  • 2020-12-20 04:18

    Use pivot, or set_index followed by unstack, and finish with add_prefix to rename the new columns:

    df = df.pivot(index='id', columns='mins', values='param').add_prefix('param_')
    print (df)
    mins  param_10  param_11  param_12
    id                                
    1         0.15      0.16      0.17
    2         0.20      0.21      0.22
    

    df = df.set_index(['id','mins'])['param'].unstack().add_prefix('param_')
    print (df)
    mins  param_10  param_11  param_12
    id                                
    1         0.15      0.16      0.17
    2         0.20      0.21      0.22
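
Both routes can be checked side by side in a small self-contained sketch. The input frame here is an assumption reconstructed from the output shown above, and the names wide_pivot and wide_unstack are illustrative only:

```python
import pandas as pd

# Sample data reconstructed from the output above (one row per id/mins pair).
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "mins":  [10, 11, 12, 10, 11, 12],
    "param": [0.15, 0.16, 0.17, 0.20, 0.21, 0.22],
})

# Route 1: pivot straight from long to wide.
wide_pivot = df.pivot(index="id", columns="mins", values="param").add_prefix("param_")

# Route 2: move id and mins into the index, then unstack the inner level (mins).
wide_unstack = df.set_index(["id", "mins"])["param"].unstack().add_prefix("param_")

print(wide_pivot)
print(wide_pivot.equals(wide_unstack))
```

Assigning to new names instead of overwriting df keeps the long-format frame available, so both alternatives can be run one after the other.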
    

    If you get this error:

    ValueError: Index contains duplicate entries, cannot reshape

    it means some id/mins pairs occur more than once, so the values have to be aggregated. Use pivot_table, or groupby with an aggregate function such as mean (sum, median, ...) followed by unstack:

    print (df)
       id  mins  param
    0   1    10   0.15 <- id mins dupe
    1   1    10   0.50 <- id mins dupe
    2   1    11   0.16
    3   1    12   0.17
    4   2    10   0.20
    5   2    11   0.21
    6   2    12   0.22
    
    df = (df.pivot_table(index='id', columns='mins', values='param', aggfunc='mean')
            .add_prefix('param_'))
    print (df)
    mins  param_10  param_11  param_12
    id                                
    1        0.325      0.16      0.17 <- (0.15+0.5)/2 = 0.325
    2        0.200      0.21      0.22
    

    df = df.groupby(['id','mins'])['param'].mean().unstack().add_prefix('param_')
    print (df)
    mins  param_10  param_11  param_12
    id                                
    1        0.325      0.16      0.17 <- (0.15+0.5)/2 = 0.325
    2        0.200      0.21      0.22
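
As a rough sketch of the whole duplicate case, the snippet below first lists the offending rows, shows that plain pivot fails, and then compares the two aggregating routes. The duplicated() check and the variable names (dupes, by_pivot_table, by_groupby) are additions for illustration and not part of the original answer:

```python
import pandas as pd

# Same data as above, with the (id=1, mins=10) pair duplicated.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "mins":  [10, 10, 11, 12, 10, 11, 12],
    "param": [0.15, 0.50, 0.16, 0.17, 0.20, 0.21, 0.22],
})

# Rows whose (id, mins) key occurs more than once; keep=False marks every copy.
dupes = df[df.duplicated(["id", "mins"], keep=False)]
print(dupes)

# Plain pivot cannot decide which of the duplicate values to keep.
try:
    df.pivot(index="id", columns="mins", values="param")
except ValueError as err:
    print("pivot failed:", err)

# Both aggregating routes average the duplicates before reshaping.
by_pivot_table = (
    df.pivot_table(index="id", columns="mins", values="param", aggfunc="mean")
      .add_prefix("param_")
)
by_groupby = df.groupby(["id", "mins"])["param"].mean().unstack().add_prefix("param_")
print(by_pivot_table)
```

Whether mean is the right aggregate depends on the data; sum, median, first or last may fit better if the duplicates carry a different meaning.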
    

    To clean up the result, set the columns name to None with rename_axis and use reset_index to turn the id index back into a regular column:

    df = df.rename_axis(None, axis=1).reset_index()
    print (df)
       id  param_10  param_11  param_12
    0   1     0.325      0.16      0.17
    1   2     0.200      0.21      0.22
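
The effect of the clean-up step is easiest to see on the axis names. A minimal sketch, assuming a wide frame shaped like the pivoted result above (it is built by hand here so the snippet runs on its own; wide and flat are illustrative names):

```python
import pandas as pd

# Wide frame as it looks right after pivoting: index named "id", columns axis named "mins".
wide = pd.DataFrame(
    {"param_10": [0.325, 0.200], "param_11": [0.16, 0.21], "param_12": [0.17, 0.22]},
    index=pd.Index([1, 2], name="id"),
)
wide.columns.name = "mins"

# Drop the "mins" label from the columns axis, then turn the id index into a column.
flat = wide.rename_axis(None, axis=1).reset_index()

print(flat.columns.name)
print(list(flat.columns))
```

Without rename_axis, reset_index would still work, but the leftover "mins" label would keep showing up in the header row when the frame is printed.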
    

    EDIT:

    Solution with multiple value columns, starting again from the original long-format df:

    df['param1'] = df['param'] / 4
    print (df)
       id  mins  param  param1
    0   1    10   0.15  0.0375
    1   1    10   0.50  0.1250
    2   1    11   0.16  0.0400
    3   1    12   0.17  0.0425
    4   2    10   0.20  0.0500
    5   2    11   0.21  0.0525
    6   2    12   0.22  0.0550
    
    df = df.pivot_table(index='id', columns='mins', values=['param', 'param1'], aggfunc='mean')
    df.columns = ['_'.join((x[0], str(x[1]))) for x in df.columns]
    print (df)
        param_10  param_11  param_12  param1_10  param1_11  param1_12
    id                                                               
    1      0.325      0.16      0.17    0.08125     0.0400     0.0425
    2      0.200      0.21      0.22    0.05000     0.0525     0.0550
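
The column flattening can also be written with tuple unpacking, which may read more easily than indexing x[0] and x[1]. This is a sketch of the same idea on the same sample data, not a different method; the name wide is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "mins":  [10, 10, 11, 12, 10, 11, 12],
    "param": [0.15, 0.50, 0.16, 0.17, 0.20, 0.21, 0.22],
})
df["param1"] = df["param"] / 4

# With a list of value columns, pivot_table returns MultiIndex columns.
wide = df.pivot_table(index="id", columns="mins", values=["param", "param1"], aggfunc="mean")

# Each column label is a tuple such as ("param", 10); join the two parts with "_".
wide.columns = [f"{name}_{mins}" for name, mins in wide.columns]
print(list(wide.columns))
```

The f-string converts the integer mins value to text, which is the job str(x[1]) does in the join-based version.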
    