Group Data in R for consecutive rows

故里飘歌 2020-12-10 18:13

If there's not a quick 1-3 liner for this in R, I'll definitely just use linux sort and a short python program using groupby, so don't bend over

3 answers
  •  鱼传尺愫
    2020-12-10 18:57

    First we combine ID and weight. The quick-and-dirty way is using paste:

    df_in$id_weight <- paste(df_in$ID, df_in$weight, sep='_')
    df_in
       ID weight start_day end_day id_weight
    1   1    150         1       4     1_150
    2   1    150         4       7     1_150
    3   1    151         7      10     1_151
    4   1    150        10      11     1_150
    5   1    150        11      30     1_150
    6   2    170         5      10     2_170
    7   2    170        10      15     2_170
    8   2    170        15      20     2_170
    9   2    171        20      25     2_171
    10  2    171        25      30     2_171
    

    A safer way is to use interaction or group_indices; see "Combine values in 4 columns to a single unique value".
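
    As a rough sketch of the interaction variant, assuming the same ID and weight columns as in the table above (the data frame is rebuilt here only so the snippet runs on its own). The factor is converted back to character because rle() does not accept factors:

```r
# Assumed sample data, copied from the table shown above.
df_in <- data.frame(
  ID     = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  weight = c(150, 150, 151, 150, 150, 170, 170, 170, 171, 171)
)

# interaction() returns a factor with one level per ID/weight combination;
# drop = TRUE keeps only the combinations that actually occur.
key <- interaction(df_in$ID, df_in$weight, sep = "_", drop = TRUE)

# Store the key as plain character so it can be passed to rle() later.
df_in$id_weight <- as.character(key)
```

    On this data the resulting id_weight column has the same values as the paste version.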

    We can number consecutive runs of identical id_weight values using rle (run-length encoding).

    rlel <- rle(df_in$id_weight)$lengths
    df_in$group <- rep(seq_along(rlel), times = rlel)
    df_in
       ID weight start_day end_day id_weight group
    1   1    150         1       4     1_150     1
    2   1    150         4       7     1_150     1
    3   1    151         7      10     1_151     2
    4   1    150        10      11     1_150     3
    5   1    150        11      30     1_150     3
    6   2    170         5      10     2_170     4
    7   2    170        10      15     2_170     4
    8   2    170        15      20     2_170     4
    9   2    171        20      25     2_171     5
    10  2    171        25      30     2_171     5
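
    If you would rather not build a combined key at all, the same group numbers can be obtained by flagging every row where ID or weight differs from the previous row and taking a running sum. This is only a sketch: it assumes at least one row, no NA values, and the sample data from above (rebuilt here so the snippet is self-contained).

```r
# Assumed sample data, copied from the table shown above.
df_in <- data.frame(
  ID     = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  weight = c(150, 150, 151, 150, 150, 170, 170, 170, 171, 171)
)

n <- nrow(df_in)

# TRUE where a new run starts: always the first row, then every row
# whose ID or weight differs from the row directly before it.
changed <- c(TRUE,
             df_in$ID[-1] != df_in$ID[-n] |
             df_in$weight[-1] != df_in$weight[-n])

# The running count of run starts is the consecutive group number.
df_in$group <- cumsum(changed)
```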
    

    Now with the convenient group number we can summarize by group.

    library(dplyr)

    df_in %>%
      group_by(group) %>%
      summarize(id_weight = id_weight[1],  # key is constant within a group
                start_day = min(start_day),
                end_day = max(end_day))
    # A tibble: 5 x 4
      group id_weight start_day end_day
                   
    1     1 1_150             1       7
    2     2 1_151             7      10
    3     3 1_150            10      30
    4     4 2_170             5      20
    5     5 2_171            20      30
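
    For anyone without dplyr installed, roughly the same summary can be sketched in base R with rle and tapply. The data frame below is an assumed reconstruction of the sample data, and df_out is just an illustrative name:

```r
# Assumed sample data, copied from the table shown above.
df_in <- data.frame(
  ID        = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  weight    = c(150, 150, 151, 150, 150, 170, 170, 170, 171, 171),
  start_day = c(1, 4, 7, 10, 11, 5, 10, 15, 20, 25),
  end_day   = c(4, 7, 10, 11, 30, 10, 15, 20, 25, 30)
)

# Run-length encode the combined key and number the runs.
runs <- rle(paste(df_in$ID, df_in$weight, sep = "_"))
df_in$group <- rep(seq_along(runs$lengths), times = runs$lengths)

# One row per run: the key, the earliest start and the latest end.
df_out <- data.frame(
  group     = seq_along(runs$lengths),
  id_weight = runs$values,
  start_day = as.vector(tapply(df_in$start_day, df_in$group, min)),
  end_day   = as.vector(tapply(df_in$end_day, df_in$group, max))
)
```

    On the sample data, df_out holds the same five rows as the tibble above.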
    
