Grouping data by value ranges

再見小時候 2021-01-30 18:29

I have a CSV file that lists parts on order. Its columns include days late, qty, and commodity.

I need to group the data by ranges of days late and by commodity, with a sum of the qty for each group.

3 Answers
  •  萌比男神i
    2021-01-30 19:14

    You can derive a new column from the Days Late column of your DataFrame with the map or apply methods, as follows. First, let's create some sample data.

    import numpy
    import pandas

    # Sample data: eight rows, each with an ID and a random Days Late value
    df = pandas.DataFrame({'ID': 'foo,bar,foo,bar,foo,bar,foo,foo'.split(','),
                           'Days Late': numpy.random.randn(8)*20+30})
    
       Days Late   ID
    0  30.746244  foo
    1  16.234267  bar
    2  14.771567  foo
    3  33.211626  bar
    4   3.497118  foo
    5  52.482879  bar
    6  11.695231  foo
    7  47.350269  foo
    

    Create a helper function to transform the data of the Days Late column and add a column called Code.

    def days_late_xform(dl):
        if dl > 56: return 'Red'
        elif 35 < dl <= 56: return 'Amber'
        elif 14 < dl <= 35: return 'Yellow'
        elif 0 < dl <= 14: return 'White'
        else: return 'None'
    
    df["Code"] = df['Days Late'].map(days_late_xform)
    
       Days Late   ID    Code
    0  30.746244  foo  Yellow
    1  16.234267  bar  Yellow
    2  14.771567  foo  Yellow
    3  33.211626  bar  Yellow
    4   3.497118  foo   White
    5  52.482879  bar   Amber
    6  11.695231  foo   White
    7  47.350269  foo   Amber
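
    As an alternative to a hand-written helper, the same banding can probably be expressed with pandas.cut, which assigns each value to a labelled interval. The sketch below is only illustrative: the sample values are made up to exercise each band and its edges, and the bin edges are copied from the thresholds in days_late_xform. One difference to be aware of is that values at or below 0 come back as NaN rather than the string 'None'.

```python
import pandas

# Hypothetical sample values, chosen to hit every band and the bin edges.
df = pandas.DataFrame(
    {'Days Late': [60.0, 56.0, 40.0, 35.0, 20.0, 14.0, 5.0, 0.0, -3.0]})

# Edges mirror the helper's thresholds. Intervals are closed on the right,
# so (14, 35] is 'Yellow' and (35, 56] is 'Amber'.
bins = [0, 14, 35, 56, float('inf')]
labels = ['White', 'Yellow', 'Amber', 'Red']

# Values at or below 0 fall outside every interval and become NaN.
df['Code'] = pandas.cut(df['Days Late'], bins=bins, labels=labels)
print(df)
```

    The resulting Code column is categorical, so it can be passed to groupby in the same way as the mapped column.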
    

    Lastly, you can use groupby to aggregate by the ID and Code columns, and get the counts of the groups as follows:

    g = df.groupby(["ID","Code"]).size()
    print(g)
    
    ID   Code
    bar  Amber     1
         Yellow    2
    foo  Amber     1
         White     2     
         Yellow    2
    
    df2 = g.unstack()
    print(df2)
    
    Code  Amber  White  Yellow
    ID
    bar       1    NaN       2
    foo       1      2       2
    
