Pandas groupby and aggregation output should include all the original columns (including the ones not aggregated on)

孤城傲影 2020-12-03 11:21

I have the following data frame and want to:

  • Group records by month
  • Sum QTY_SOLD and NET_AMT of each unique
2 Answers
  • 2020-12-03 11:47

    agg with a dict of functions

    Create a dict of functions and pass it to agg. You'll also need as_index=False to prevent the group columns from becoming the index in your output.

    f = {'NET_AMT': 'sum', 'QTY_SOLD': 'sum', 'UPC_DSC': 'first'}
    df.groupby(['month', 'UPC_ID'], as_index=False).agg(f)
    
         month  UPC_ID UPC_DSC  NET_AMT  QTY_SOLD
    0  2017.02     111   desc1       10         2
    1  2017.02     222   desc2       15         3
    2  2017.02     333   desc3        4         1
    3  2017.03     111   desc1       25         5
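
    For reference, a minimal runnable sketch of the dict approach. The sample rows and the way month is derived from D_DATE are assumptions made for illustration, since the original frame is not shown in full.

```python
import pandas as pd

# Hypothetical sample data; the asker's real frame is not shown in full.
df = pd.DataFrame({
    "D_DATE": pd.to_datetime(["2017-02-26", "2017-02-27", "2017-02-26",
                              "2017-02-26", "2017-03-01", "2017-03-02"]),
    "UPC_ID": [111, 111, 222, 333, 111, 111],
    "UPC_DSC": ["desc1", "desc1", "desc2", "desc3", "desc1", "desc1"],
    "QTY_SOLD": [1, 1, 3, 1, 2, 3],
    "NET_AMT": [5, 5, 15, 4, 10, 15],
})

# Assumed way of building the month key (e.g. "2017.02").
df["month"] = df["D_DATE"].dt.strftime("%Y.%m")

# One aggregation per column: sum the numbers, keep the first description.
f = {"NET_AMT": "sum", "QTY_SOLD": "sum", "UPC_DSC": "first"}
out = df.groupby(["month", "UPC_ID"], as_index=False).agg(f)
print(out)
```

    Columns left out of the dict (here D_DATE) do not appear in the result, so every column you want to keep needs an entry.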
    

    Blanket sum

    Just call sum without any column names. This handles the numeric columns. For UPC_DSC, you'll need to handle it separately.

    g = df.groupby(['month', 'UPC_ID'])
    i = g.sum(numeric_only=True)   # sum only the numeric columns
    j = g[['UPC_DSC']].first()     # first description in each group
    
    pd.concat([i, j], axis=1).reset_index()
    
         month  UPC_ID  QTY_SOLD  NET_AMT UPC_DSC
    0  2017.02     111         2       10   desc1
    1  2017.02     222         3       15   desc2
    2  2017.02     333         1        4   desc3
    3  2017.03     111         5       25   desc1
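
    A runnable sketch of the blanket-sum variant, again on hypothetical sample data with an assumed month key. numeric_only=True is passed explicitly here because recent pandas versions no longer silently drop non-numeric columns such as a datetime D_DATE.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "D_DATE": pd.to_datetime(["2017-02-26", "2017-02-27", "2017-02-26",
                              "2017-02-26", "2017-03-01", "2017-03-02"]),
    "UPC_ID": [111, 111, 222, 333, 111, 111],
    "UPC_DSC": ["desc1", "desc1", "desc2", "desc3", "desc1", "desc1"],
    "QTY_SOLD": [1, 1, 3, 1, 2, 3],
    "NET_AMT": [5, 5, 15, 4, 10, 15],
})
df["month"] = df["D_DATE"].dt.strftime("%Y.%m")  # assumed month key

g = df.groupby(["month", "UPC_ID"])
sums = g.sum(numeric_only=True)      # QTY_SOLD and NET_AMT only
first_desc = g[["UPC_DSC"]].first()  # one description per group

# Both pieces share the (month, UPC_ID) index, so they align column-wise.
out = pd.concat([sums, first_desc], axis=1).reset_index()
print(out)
```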
    
  • 2020-12-03 12:02

    I have been thinking about this for a long time; thanks for the question, which pushed me to work it out. Use agg with an if...else:

    df.groupby(['month', 'UPC_ID'], as_index=False).agg(lambda x: x.sum() if x.dtype == 'int64' else x.head(1))
    Out[1221]: 
       month  UPC_ID UPC_DSC     D_DATE  QTY_SOLD  NET_AMT
    0      2     111   desc1 2017-02-26         2       10
    1      2     222   desc2 2017-02-26         3       15
    2      2     333   desc3 2017-02-26         1        4
    3      3     111   desc1 2017-03-01         5       25
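
    The int64 check above skips float columns. A sketch of a slightly more general variant, assuming pandas.api.types.is_numeric_dtype as the test and iloc[0] to take the first value of each non-numeric column. The sample rows and the integer month key are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "D_DATE": pd.to_datetime(["2017-02-26", "2017-02-27", "2017-02-26",
                              "2017-02-26", "2017-03-01", "2017-03-02"]),
    "UPC_ID": [111, 111, 222, 333, 111, 111],
    "UPC_DSC": ["desc1", "desc1", "desc2", "desc3", "desc1", "desc1"],
    "QTY_SOLD": [1, 1, 3, 1, 2, 3],
    "NET_AMT": [5, 5, 15, 4, 10, 15],
})
df["month"] = df["D_DATE"].dt.month  # assumed integer month key


def sum_or_first(col):
    """Sum numeric columns; keep the first value of anything else."""
    return col.sum() if is_numeric_dtype(col) else col.iloc[0]


out = df.groupby(["month", "UPC_ID"], as_index=False).agg(sum_or_first)
print(out)
```

    Keeping the first value is only safe for columns that are constant within a group, such as UPC_DSC here. For D_DATE it just reports the first date seen in each group.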
    