Pandas - group by consecutive ranges


太阳男子 2021-01-04 20:54

I have a dataframe with the following structure - Start, End and Height.

Some properties of the dataframe:

A row in the dataframe always starts from wh

2 answers

無奈伤痛 (original poster)

2021-01-04 21:15

One way to do that:

import pandas as pd

df = pd.DataFrame([[1, 3, 10], [4, 10, 7], [11, 17, 6], [18, 26, 12],
                   [27, 30, 15], [31, 40, 6], [41, 42, 6]],
                  columns=['start', 'end', 'height'])

Use pd.cut to bin the heights into ranges:

df['groups'] = pd.cut(df.height, [-1, 0, 5, 10, 15, 1000])
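To see how the bin edges are applied, here is a small sketch that runs pd.cut on the heights alone. The variable names (heights, bins, binned) are only for illustration; the bin edges are the ones used above.

```python
import pandas as pd

heights = pd.Series([10, 7, 6, 12, 15, 6, 6])
bins = [-1, 0, 5, 10, 15, 1000]

# By default pd.cut builds right-closed intervals, so 10 lands in (5, 10]
# and 15 lands in (10, 15].
binned = pd.cut(heights, bins)

# The result is categorical; .cat.codes gives the position of each bin
# in the ordered list of intervals.
print(binned.cat.codes.tolist())
```

Heights that fall in the same interval get the same code, which is what the next step compares row by row.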

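Find the break points, i.e. the rows whose bin differs from the previous row's bin, and number the consecutive runs with a cumulative sum:

df['categories'] = (df.groups != df.groups.shift()).cumsum()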

Then df is (the first row always counts as a break, so the numbering starts at 1):

"""
   start  end  height    groups  categories
0      1    3      10   (5, 10]           1
1      4   10       7   (5, 10]           1
2     11   17       6   (5, 10]           1
3     18   26      12  (10, 15]           2
4     27   30      15  (10, 15]           2
5     31   40       6   (5, 10]           3
6     41   42       6   (5, 10]           3
"""

Define which value to keep from each column:

f = {'start': ['first'], 'end': ['last'], 'groups': ['first']}

And use the groupby.agg function:

df.groupby('categories').agg(f)
"""
           start  end    groups
           first last     first
categories                     
1              1   17   (5, 10]
2             18   30  (10, 15]
3             31   42   (5, 10]
"""
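The shift-and-cumsum idiom may be easier to follow on a plain Series. This is a minimal sketch with made-up values; is_break and run_id are illustrative names.

```python
import pandas as pd

s = pd.Series([6, 6, 12, 12, 6])

# True wherever a value differs from the one before it. The first row is
# always True because shift() leaves a missing value in front of it.
is_break = s != s.shift()

# The running sum of the break flags gives each consecutive run its own id.
run_id = is_break.cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 3]
```

Note that the last 6 gets a new id rather than rejoining the first run, which is the difference from a plain groupby on the value.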
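If the two-level column header is inconvenient, named aggregation is a possible alternative. The sketch below repeats the steps above end to end and assumes pandas 0.25 or later, where the keyword form of agg is available; merged is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3, 10], [4, 10, 7], [11, 17, 6], [18, 26, 12],
     [27, 30, 15], [31, 40, 6], [41, 42, 6]],
    columns=['start', 'end', 'height'],
)

# Bin the heights, then number the consecutive runs of identical bins.
df['groups'] = pd.cut(df.height, [-1, 0, 5, 10, 15, 1000])
df['categories'] = (df.groups != df.groups.shift()).cumsum()

# Named aggregation: output_column=(input_column, function).
# This yields flat column names instead of a MultiIndex.
merged = df.groupby('categories').agg(
    start=('start', 'first'),
    end=('end', 'last'),
    groups=('groups', 'first'),
)
print(merged)
```

Each row of merged then covers one consecutive range: the start of its first row, the end of its last row, and the height bin they share.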
