I have a dataframe with the following structure - Start, End and Height.
Some properties of the dataframe:
One way to do that:
import pandas as pd

df = pd.DataFrame([[1, 3, 10], [4, 10, 7], [11, 17, 6], [18, 26, 12],
                   [27, 30, 15], [31, 40, 6], [41, 42, 6]],
                  columns=['start', 'end', 'height'])
Use pd.cut to bin the heights into groups:
df['groups']=pd.cut(df.height,[-1,0,5,10,15,1000])
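To make the binning step concrete, here is a minimal sketch of what pd.cut does with these bin edges. The `heights` series is a made-up sample, not the full dataframe. Intervals are right-closed by default, so a height of exactly 10 falls in (5, 10] and 15 falls in (10, 15].

```python
import pandas as pd

# Hypothetical sample of heights, taken from the first rows of the example.
heights = pd.Series([10, 7, 6, 12, 15])

# Same bin edges as above; each interval is open on the left, closed on the right.
binned = pd.cut(heights, [-1, 0, 5, 10, 15, 1000])

# .cat.codes gives the position of each value's bin among the five intervals.
codes = binned.cat.codes.tolist()
print(codes)  # [2, 2, 2, 3, 3]
```

The last edge (1000) is only a catch-all upper bound; any value above it would come back as NaN.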
Find the break points, i.e. the rows whose group differs from the previous row's, and number the resulting runs:
df['categories']=(df.groups!=df.groups.shift()).cumsum()
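The shift/cumsum idiom can be easier to follow on a tiny input. The sketch below uses a made-up series of integer bin IDs (`bin_ids` is a hypothetical name) rather than the interval column.

```python
import pandas as pd

# Hypothetical bin IDs: two 7s, then two 9s, then a 7 again.
bin_ids = pd.Series([7, 7, 9, 9, 7])

# True wherever a value differs from the one directly above it.
# The first row is compared against NaN, so it counts as a change.
changed = bin_ids != bin_ids.shift()

# A running count of changes gives every consecutive run its own ID.
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 3]
```

Note that the final 7 gets a new ID instead of rejoining the first run, which is exactly what is wanted for merging adjacent rows only. The absolute numbering does not matter, only that consecutive equal rows share an ID.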
Then df is:
"""
   start  end  height    groups  categories
0      1    3      10   (5, 10]           0
1      4   10       7   (5, 10]           0
2     11   17       6   (5, 10]           0
3     18   26      12  (10, 15]           1
4     27   30      15  (10, 15]           1
5     31   40       6   (5, 10]           2
6     41   42       6   (5, 10]           2
"""
Define which value to keep from each column when a run of rows is merged:
f = {'start':['first'],'end':['last'], 'groups':['first']}
And use the groupby.agg function:
df.groupby('categories').agg(f)
"""
              groups  end start
               first last first
categories
0            (5, 10]   17     1
1           (10, 15]   30    18
2            (5, 10]   42    31
"""
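For reference, the whole approach can be put together in one runnable sketch. It is a variant of the steps above, not a replacement. It compares the integer bin codes instead of the interval values, and it uses named aggregation to get flat column names instead of the two-level header. The names `codes`, `run_id` and `merged` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3, 10], [4, 10, 7], [11, 17, 6], [18, 26, 12],
     [27, 30, 15], [31, 40, 6], [41, 42, 6]],
    columns=["start", "end", "height"],
)

# Bin the heights with the same edges as above.
df["groups"] = pd.cut(df["height"], [-1, 0, 5, 10, 15, 1000])

# Compare integer bin codes row by row and number the consecutive runs.
codes = df["groups"].cat.codes
run_id = (codes != codes.shift()).cumsum()

# Named aggregation: first start, last end and the bin label of each run.
merged = (
    df.groupby(run_id)
      .agg(start=("start", "first"),
           end=("end", "last"),
           groups=("groups", "first"))
      .reset_index(drop=True)
)
print(merged)
```

This yields three merged rows spanning 1 to 17, 18 to 30 and 31 to 42, matching the result shown above.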