R: How to group_by, split, or subset by row values

Posted by 拜拜、爱过 on 2021-02-17 07:08:07

Question


This is a follow-up to my previous question, "R, how to group by row value? Split?".

The input data frame has changed to:

library(stringr)  # provides str_c()

id = str_c("x", 1:22)
val = c(rep("NO1", 2), "START", rep("yes1", 2), "STOP", "NO",
        "START", "NO1", "START", rep("yes2", 3), "STOP", "NO1",
        "START", rep("NO3", 3), "STOP", "NO1", "STOP")
data = data.frame(id, val)

The expected output is a data frame whose val column is as follows:

val = c("START", rep("yes1", 2), "STOP", 
        "START","NO1", "START", rep("yes2", 3), "STOP",
        "START", rep("NO3",3), "STOP", "NO1", "STOP")

Answer 1:


Put simply, once every entry that is neither START nor STOP is removed, a START is a valid start point if it is the first START or is immediately preceded by a STOP. Similarly, a STOP is a valid end point if it is the last STOP or is immediately followed by a START. Consider this function:

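valid_anchors <- function(x) {
  # Keep only the START/STOP entries, remembering their original positions
  are_anchors <- x %in% c("START", "STOP")
  id <- seq_along(x)[are_anchors]
  x <- x[are_anchors]
  # A START is valid if no anchor precedes it or the previous anchor is a STOP
  start_pos <- which(x == "START" & c("", head(x, -1L)) %in% c("", "STOP"))
  # A STOP is valid if no anchor follows it or the next anchor is a START
  stop_pos <- which(x == "STOP" & c(tail(x, -1L), "") %in% c("", "START"))
  # Return the original positions of the valid starts and stops
  list(id[start_pos], id[stop_pos])
}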
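
To see how the rule handles nested anchors, here is a minimal sketch of the same intermediate steps on a small made-up vector. The vector `x` and the names `pos`, `anchors`, `prev_anchor`, and `next_anchor` are illustrative assumptions and do not appear in the question.

```r
# Toy input (hypothetical): a START/STOP pair nested inside another pair
x <- c("NO", "START", "a", "START", "b", "STOP", "c", "STOP", "NO")

are_anchors <- x %in% c("START", "STOP")
pos <- which(are_anchors)        # original positions of the anchors: 2 4 6 8
anchors <- x[are_anchors]        # "START" "START" "STOP" "STOP"

# Neighbouring anchors, padded with "" at the two ends
prev_anchor <- c("", head(anchors, -1L))
next_anchor <- c(tail(anchors, -1L), "")

# Only the outermost START and STOP survive the two tests
start_pos <- pos[anchors == "START" & prev_anchor %in% c("", "STOP")]
stop_pos  <- pos[anchors == "STOP"  & next_anchor %in% c("", "START")]

print(start_pos)  # 2
print(stop_pos)   # 8
```

The inner START at position 4 is dropped because the anchor before it is another START, and the inner STOP at position 6 is dropped because the anchor after it is another STOP, so the whole span from 2 to 8 is kept as one block.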

Then subset the rows the same way as in the answer to your previous question:

ind <- valid_anchors(data$val)

data[sort(unique(unlist(mapply(`:`, ind[[1]], ind[[2]])))), ]

Output

    id   val
3   x3 START
4   x4  yes1
5   x5  yes1
6   x6  STOP
8   x8 START
9   x9   NO1
10 x10 START
11 x11  yes2
12 x12  yes2
13 x13  yes2
14 x14  STOP
16 x16 START
17 x17   NO3
18 x18   NO3
19 x19   NO3
20 x20  STOP
21 x21   NO1
22 x22  STOP
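
The subsetting line packs several steps into one expression. Unpacked, it might look like the sketch below, which assumes the anchor positions that `valid_anchors` yields for this data (3, 8, 16 for the starts and 6, 14, 22 for the stops, consistent with the output above). The names `starts`, `stops`, `ranges`, and `rows` are illustrative, and `Map` is swapped in for `mapply` here only because it always returns a list.

```r
# Anchor positions assumed from the example data in the question
starts <- c(3L, 8L, 16L)
stops  <- c(6L, 14L, 22L)

# Pair each start with its stop and expand to a run of row indices
ranges <- Map(`:`, starts, stops)

# Flatten, drop any duplicates, and restore row order
rows <- sort(unique(unlist(ranges)))

print(rows)
```

Indexing with `data[rows, ]` then gives the 18 rows listed in the output. This pairing relies on there being as many valid starts as valid stops; input that begins with a STOP or ends with a START would need extra handling.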


Source: https://stackoverflow.com/questions/64619727/r-how-to-group-by-split-or-subset-by-row-values
