Pandas - group by consecutive ranges

太阳男子 2021-01-04 20:54

I have a DataFrame with three columns: Start, End and Height.

Some properties of the dataframe:

  • A row in the dataframe always starts from wh
2 answers
  •  無奈伤痛
    2021-01-04 21:15

    One way to do that:

    import pandas as pd
    df = pd.DataFrame([[1, 3, 10], [4, 10, 7], [11, 17, 6], [18, 26, 12],
                       [27, 30, 15], [31, 40, 6], [41, 42, 6]],
                      columns=['start', 'end', 'height'])
    

    Use pd.cut to bin the heights into groups:

    # bins are closed on the right: (-1, 0], (0, 5], (5, 10], (10, 15], (15, 1000]
    df['groups'] = pd.cut(df.height, [-1, 0, 5, 10, 15, 1000])
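    A quick check of what pd.cut does with these edges, assuming the default right=True: every interval includes its right edge, so a height of exactly 15 falls in (10, 15] and a height of 5 would fall in (0, 5]. The -1 lower edge presumably exists so that a height of 0 still lands in a bin. The sample values below are simply the answer's height column.

```python
import pandas as pd

heights = pd.Series([10, 7, 6, 12, 15, 6, 6])  # the height column from the answer
bins = [-1, 0, 5, 10, 15, 1000]                # same edges as the answer

# right=True is the default, so every interval has the form (left, right]
labels = pd.cut(heights, bins)
print(labels.astype(str).tolist())

# values sitting exactly on an edge go into the interval that ends there
edges = pd.cut(pd.Series([0, 5, 15]), bins)
print(edges.astype(str).tolist())  # ['(-1, 0]', '(0, 5]', '(10, 15]']
```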
    

    Find the break points (rows whose bin differs from the previous row's) and number the runs:

    # a new run starts wherever the bin changes; "- 1" makes the run ids start at 0
    df['categories'] = (df.groups != df.groups.shift()).cumsum() - 1
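    The shift/cumsum idiom is what turns "same bin as the previous row" into a run number. A minimal sketch on a plain Series (the letters are made-up stand-ins for bin labels) shows two things: the first row always starts a run, because it is compared with NaN and != treats that as different, and a label that comes back later, like (5, 10] in rows 5 and 6 above, gets a new id instead of being merged with its first run. That second point is why grouping directly on the bin column would not work.

```python
import pandas as pd

# Stand-in labels where 'a' occurs in two separate runs.
s = pd.Series(['a', 'a', 'b', 'b', 'a'])

# Row 0 is compared with NaN, which counts as "different", so every run start is True.
starts = s != s.shift()
run_id = starts.cumsum()  # running count of run starts, beginning at 1

print(starts.tolist())        # [True, False, True, False, True]
print(run_id.tolist())        # [1, 1, 2, 2, 3]
print((run_id - 1).tolist())  # [0, 0, 1, 1, 2]

# Grouping on the label itself would see only 2 groups; the run ids see 3.
print(s.nunique(), run_id.nunique())  # 2 3
```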
    

    df now looks like this:

    """
       start  end  height    groups  categories
    0      1    3      10   (5, 10]           0
    1      4   10       7   (5, 10]           0
    2     11   17       6   (5, 10]           0
    3     18   26      12  (10, 15]           1
    4     27   30      15  (10, 15]           1
    5     31   40       6   (5, 10]           2
    6     41   42       6   (5, 10]           2
    """
    

    Define what to keep for each run: the first start, the last end and the bin:

    f = {'start':['first'],'end':['last'], 'groups':['first']}
    

    Then pass that spec to groupby(...).agg:

    df.groupby('categories').agg(f)
    """
                  groups  end start
                   first last first
    categories                     
    0            (5, 10]   17     1
    1           (10, 15]   30    18
    2            (5, 10]   42    31
    """
    
