Identify consecutive sequences based on a given variable

走了就别回头了 2020-12-10 16:14

I am literally stuck on this. The df1 has the following variables:

  1. serial = Group of people

  2. id1 = th

2 answers
  •  野趣味 (original poster)
    2020-12-10 16:57

    You can use dplyr's lead() and lag() functions.

    I tried it on my side and here is the result:

    library(dplyr)
    
    df %>% 
        select(serial, contains("day", ignore.case = FALSE)) %>% 
        group_by(serial) %>% 
        tidyr::gather(day, val, -serial) %>% 
        # convert to binary: 1 if the day has a positive value, else 0
        mutate(occur = ifelse(val > 0, 1, 0)) %>%
        # if a day and both of its neighbours are positive, store 1 + the running count; else 0
        mutate(cums = ifelse(lead(occur) > 0 & lag(occur) > 0 & occur > 0,
                             occur + cumsum(occur), 0)) %>%
        summarise(occurance = max(cums, na.rm = TRUE), duration = sum(val))
    
    # A tibble: 3 x 3
      serial occurance duration
                
    1     10         6       18
    2     12         7       11
    3    123         0       12
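
    As a cross-check on the idea above, here is a minimal base-R sketch that finds the longest run of consecutive positive days with rle(). It is not the code from this answer: longest_run, the toy data frame and its day1..day4 columns are all made up for illustration, and it assumes one row per serial with one column per day.

    ```r
    # Hypothetical helper (not part of the original answer): length of the
    # longest run of consecutive days with a positive value.
    longest_run <- function(x) {
      flags <- x > 0 & !is.na(x)              # TRUE where the day is positive
      runs <- rle(flags)                      # run-length encode the flags
      positive <- runs$lengths[runs$values]   # keep only the runs of TRUE
      if (length(positive) == 0) 0L else max(positive)
    }

    # Made-up example data: one row per person, one column per day.
    toy <- data.frame(
      serial = c(1, 2, 3),
      day1 = c(2, 0, 1),
      day2 = c(3, 1, 0),
      day3 = c(1, 0, 1),
      day4 = c(0, 4, 0)
    )

    # Case-sensitive match on "day", mirroring contains("day", ignore.case = FALSE).
    day_cols <- grep("day", names(toy), fixed = TRUE)

    toy$longest <- apply(toy[, day_cols], 1, longest_run)   # longest streak per row
    toy$duration <- rowSums(toy[, day_cols])                # total over all days

    print(toy[, c("serial", "longest", "duration")])
    # longest is 3, 1, 1 and duration is 6, 5, 2 for the three rows
    ```

    rle() reports each streak separately, so the helper can be changed to return the number of streaks or their start positions if that is what you actually need.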
    
