Grouping of R dataframe by connected values

臣服心动 2021-01-13 02:07

I couldn't find a solution to this common grouping problem in R:

This is my original dataset:

ID  State
1   A
2   A
3   B
4   B
5   B
6   A
7   A
8            


        
4 answers
  •  慢半拍i (original poster)
    2021-01-13 02:37

    An idea with data.table:

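    library(data.table)

    # Sample data; fread() parses the whitespace-separated text into a data.table
    dt <- fread("ID  State
    1   A
    2   A
    3   B
    4   B
    5   B
    6   A
    7   A
    8   A
    9   C
    10  C")

    # Give each run of consecutive identical State values its own id
    dt[, rle := rleid(State)]
    # First and last ID of every run
    dt2 <- dt[, list(min = min(ID), max = max(ID)), by = c("rle", "State")]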
    

    which gives:

       rle State min max
    1:   1     A   1   2
    2:   2     B   3   5
    3:   3     A   6   8
    4:   4     C   9  10
    

    The idea is to label each run of consecutive identical State values with rleid, and then take the min and max of ID within each (rle, State) group.

    You can remove the rle column with:
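
To make the run-id idea concrete, here is a minimal base R sketch of the same computation without data.table. It assumes the sample data from this answer; the names `df`, `run_id` and `runs` are made up for illustration. A new run starts wherever State differs from the previous row, and the cumulative count of those starts is the run id.

```r
# Same sample data as in the answer above
df <- data.frame(
  ID    = 1:10,
  State = c("A", "A", "B", "B", "B", "A", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# TRUE marks the first row of each run; cumsum() turns the marks into run ids
run_id <- cumsum(c(TRUE, df$State[-1] != df$State[-nrow(df)]))

# Collapse each run to its State and its smallest and largest ID
runs <- data.frame(
  State = as.vector(tapply(df$State, run_id, function(x) x[1])),
  min   = as.vector(tapply(df$ID, run_id, min)),
  max   = as.vector(tapply(df$ID, run_id, max)),
  stringsAsFactors = FALSE
)
print(runs)
```

Grouping on State alone would merge rows 1-2 and 6-8 into a single A group, which is why the run id has to be part of the grouping key.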

    dt2[,rle:=NULL]
    

    Chained:

    dt2 <- dt[, list(min = min(ID), max = max(ID)), by = c("rle", "State")][, rle := NULL]
    

    You can shorten the code further by calling rleid directly inside by. The grouping column it creates is named rleid, so it is dropped the same way:

    dt2 <- dt[, .(min=min(ID),max=max(ID)), by=.(State, rleid(State))][, rleid:=NULL]
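
As a quick sanity check, the sketch below builds the same sample data and compares the two-step form with the short form. It assumes data.table is installed; the object names and the explicit column name `run` are hypothetical choices for illustration, not part of the original answer.

```r
library(data.table)

dt <- data.table(
  ID    = 1:10,
  State = c("A", "A", "B", "B", "B", "A", "A", "A", "C", "C")
)

# Long form: store the run id as a column, group by it, then drop it.
# copy() keeps := from modifying dt itself.
long_form <- copy(dt)[, rle := rleid(State)][
  , .(min = min(ID), max = max(ID)), by = c("rle", "State")][
  , rle := NULL]

# Short form: compute the run id inside `by`, with an explicit name
short_form <- dt[, .(min = min(ID), max = max(ID)),
                 by = .(State, run = rleid(State))][, run := NULL]

# For contrast: grouping by State alone merges the two separate A runs
by_state <- dt[, .(min = min(ID), max = max(ID)), by = State]

print(long_form)
print(short_form)
print(by_state)
```

Naming the grouping expression explicitly avoids relying on the automatically generated column name when the helper column is dropped afterwards.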
    
