Grouping data by value ranges

再見小時候 2021-01-30 18:29

I have a csv file that shows parts on order. The columns include days late, qty and commodity.

I need to group the data by days late and commodity with a sum of the qty.

3 answers

萌比男神i (OP)

2021-01-30 19:14

You can derive a new column from your Days Late column with the Series map or apply methods, as follows. Let's first create some sample data.

import numpy
import pandas

df = pandas.DataFrame({'ID': 'foo,bar,foo,bar,foo,bar,foo,foo'.split(','),
                       'Days Late': numpy.random.randn(8)*20+30})

   Days Late   ID
0  30.746244  foo
1  16.234267  bar
2  14.771567  foo
3  33.211626  bar
4   3.497118  foo
5  52.482879  bar
6  11.695231  foo
7  47.350269  foo

Create a helper function to transform the data of the Days Late column and add a column called Code.

def days_late_xform(dl):
    if dl > 56: return 'Red'
    elif 35 < dl <= 56: return 'Amber'
    elif 14 < dl <= 35: return 'Yellow'
    elif 0 < dl <= 14: return 'White'
    else: return 'None'

df["Code"] = df['Days Late'].map(days_late_xform)

   Days Late   ID    Code
0  30.746244  foo  Yellow
1  16.234267  bar  Yellow
2  14.771567  foo  Yellow
3  33.211626  bar  Yellow
4   3.497118  foo   White
5  52.482879  bar   Amber
6  11.695231  foo   White
7  47.350269  foo   Amber
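
As an alternative to the helper function, the same bins could probably be expressed with pandas.cut. This is only a sketch: the bin edges and labels are copied from days_late_xform above, and the sample values are hard-coded from the table so the result is repeatable. Note that pandas.cut leaves missing values as NaN, whereas the helper returns the string 'None' for them.

```python
import numpy as np
import pandas as pd

# Sample data hard-coded from the table above so the result is repeatable.
df = pd.DataFrame({
    "ID": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "Days Late": [30.746244, 16.234267, 14.771567, 33.211626,
                  3.497118, 52.482879, 11.695231, 47.350269],
})

# Edges mirror days_late_xform; intervals are closed on the right by default,
# i.e. (-inf, 0], (0, 14], (14, 35], (35, 56], (56, inf).
bins = [-np.inf, 0, 14, 35, 56, np.inf]
labels = ["None", "White", "Yellow", "Amber", "Red"]

df["Code"] = pd.cut(df["Days Late"], bins=bins, labels=labels)
print(df)
```

The resulting Code column has a categorical dtype, so the labels keep their order from 'None' up to 'Red' when sorted.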

Lastly, you can use groupby to aggregate by the ID and Code columns, and get the counts of the groups as follows:

g = df.groupby(["ID","Code"]).size()
print(g)

ID   Code
bar  Amber     1
     Yellow    2
foo  Amber     1
     White     2
     Yellow    2

df2 = g.unstack()
print(df2)

Code  Amber  White  Yellow
ID
bar       1    NaN       2
foo       1      2       2
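
The question asks for a sum of the qty rather than a count, so here is a rough sketch of that variant. The column names Commodity, Days Late and Qty and the sample rows are assumptions made up for illustration; the real CSV headers may differ.

```python
import pandas as pd

def days_late_xform(dl):
    # Same thresholds as the helper defined earlier in this answer.
    if dl > 56: return 'Red'
    elif 35 < dl <= 56: return 'Amber'
    elif 14 < dl <= 35: return 'Yellow'
    elif 0 < dl <= 14: return 'White'
    else: return 'None'

# Hypothetical sample rows; in practice these would come from
# pd.read_csv on the parts-on-order file.
orders = pd.DataFrame({
    "Commodity": ["steel", "steel", "copper", "copper", "steel"],
    "Days Late": [5, 20, 40, 60, 10],
    "Qty": [10, 4, 7, 1, 3],
})

orders["Code"] = orders["Days Late"].map(days_late_xform)

# Sum Qty within each (Commodity, Code) group instead of counting rows.
totals = orders.groupby(["Commodity", "Code"])["Qty"].sum()

# One row per commodity, one column per code; missing combinations become 0.
table = totals.unstack(fill_value=0)
print(table)
```

Passing fill_value=0 to unstack avoids the NaN cells seen in the count table above when a commodity has no rows for a given code.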
