R - Identify a sequence of row elements by groups in a dataframe

执念已碎 2021-01-06 00:20

Consider the following sample dataframe:

> df
   id name time
1   1    b   10
2   1    b   12
3   1    a    0
4   2    a    5
5   2    b   11
6   2    a           


        
3 answers
  •  南方客 (OP)
    2021-01-06 00:44

    You can use ifelse inside filter, together with lag and lead, to keep only the rows that form a pair, and then tidyr::spread to reshape to wide form:

    library(tidyverse)
    
    df %>% arrange(id, time) %>% group_by(id) %>% 
        filter(ifelse(name == 'b',    # if name is b...
                      lag(name) == 'a',    # is the previous name a?
                      lead(name) == 'b')) %>%    # else if name is not b, is next name b?
        ungroup() %>% mutate(i = rep(seq(n() / 2), each = 2)) %>%    # create indices to spread by
        spread(name, time) %>% select(a, b)    # spread to wide and clean up
    
    ## # A tibble: 3 × 2
    ##       a     b
    ## 1     3    10
    ## 2     5     7
    ## 3     9    11
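For readers without the tidyverse, here is a base-R sketch of the same "an a immediately followed by a b within each id" pairing. The data frame is hypothetical (the question's table is cut off), and comparing each row with the next one through shifted vectors is a swapped-in stand-in for lag()/lead(), not the answer's own code.

```r
# Hypothetical sample data: the question's own table is cut off
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  name = c("b", "b", "a", "a", "b", "a"),
  time = c(10, 12, 0, 5, 11, 20),
  stringsAsFactors = FALSE
)

# Sort by id, then time, like arrange(id, time)
df <- df[order(df$id, df$time), ]

# Shift name and id up by one row: the base-R analogue of lead()
next_name <- c(df$name[-1], NA)
next_id   <- c(df$id[-1], NA)

# A pair starts at an "a" whose next row, in the same id, is a "b";
# which() drops the NA produced at the last row
starts <- which(df$name == "a" & next_name == "b" & df$id == next_id)

pairs <- data.frame(
  id = df$id[starts],
  a  = df$time[starts],
  b  = df$time[starts + 1]
)
pairs
# id 1 gives a = 0, b = 10; id 2 gives a = 5, b = 11
```

Keeping the id check in the condition stops a pair from spanning two groups, which is what group_by(id) provides in the dplyr version.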
    

    Based on the comment below, here's a version that uses gregexpr to find the start index of every match of the pattern. It is more complicated, but it scales more easily to longer patterns like "aabb":

    df %>% group_by(pattern = 'aabb', id) %>%    # add pattern as column, group
        arrange(time) %>%
        # collapse each group to a string for name and a list column for time
        summarise(name = paste(name, collapse = ''), time = list(time)) %>% 
        # group and add list-column of start indices for each match
        rowwise() %>% mutate(i = gregexpr(pattern, name)) %>% 
        unnest(i) %>%    # expand; tidyr >= 1.0 keeps the other list columns by default
        filter(i != -1) %>%    # chop out rows with no match from gregexpr
        rowwise() %>%    # regroup
        # subset with sequence from index through pattern length 
        mutate(time = list(time[i + 0:(nchar(pattern) - 1)]), 
               pattern = strsplit(pattern, '')) %>%    # expand pattern to list column
        rownames_to_column('match') %>%    # add rownames as match index column
        unnest(c(pattern, time)) %>%    # expand matches in parallel (tidyr >= 1.0 syntax)
        # paste sequence onto each letter (important for spreading if repeated letters)
        group_by(match) %>% mutate(pattern = paste0(pattern, seq(n()))) %>% 
        spread(pattern, time)    # spread to wide form
    
    ## Source: local data frame [1 x 8]
    ## Groups: match [1]
    ## 
    ##   match    id  name     i    a1    a2    b3    b4
    ## 1     1     1 aabba     1     0     3    10    12
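A base-R sketch of the same gregexpr idea: collapse each id's names (sorted by time) into one string, find where "aabb" starts, and pull out the matching times. The data is hypothetical (id 1 contains one "aabb" run, id 2 none), and the split()/lapply() loop is a swapped-in replacement for the summarise/unnest/spread pipeline; only the letter-plus-position column naming follows the answer.

```r
# Hypothetical sample data (the question's table is cut off): id 1 contains
# one "aabb" run, id 2 contains none
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2),
  name = c("a", "a", "b", "b", "a", "a", "b", "b"),
  time = c(0, 3, 10, 12, 15, 1, 4, 6),
  stringsAsFactors = FALSE
)
pattern <- "aabb"
len <- nchar(pattern)
# Letter plus position, as in the answer's output: "a1" "a2" "b3" "b4"
cols <- paste0(strsplit(pattern, "")[[1]], seq_len(len))

df <- df[order(df$id, df$time), ]

per_id <- lapply(split(df, df$id), function(g) {
  s <- paste(g$name, collapse = "")    # e.g. "aabba" for id 1
  starts <- gregexpr(pattern, s)[[1]]  # start of each match, or -1 if none
  starts <- starts[starts != -1]
  if (length(starts) == 0) return(NULL)
  # len x k matrix: column j holds the times of the j-th match
  times <- vapply(starts, function(i) g$time[i + 0:(len - 1)], numeric(len))
  out <- data.frame(id = g$id[1], i = starts, t(times))
  names(out)[-(1:2)] <- cols
  out
})

matches <- do.call(rbind, per_id)  # NULL entries (ids with no match) are dropped
rownames(matches) <- NULL
matches
# one row: id 1, i 1, a1 0, a2 3, b3 10, b4 12
```

gregexpr() reports non-overlapping matches only, and collapsing the names to a string assumes every name is a single character.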
    

    Note that spread sorts the new columns alphabetically, so if the pattern isn't in alphabetical order the resulting columns won't be ordered by their indices. Since the indices are preserved in the column names, though, you can reorder with something like select(1:4, order(parse_number(names(.)[-1:-4])) + 4).
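The reordering can be sketched in base R on a hypothetical wide result for the pattern "baab", whose key columns come out of spread() alphabetically as a2, a3, b1, b4. The values are made up, and sub() is used here as a base-R stand-in for readr::parse_number(); the point is that order() of the numeric suffixes gives the column permutation.

```r
# Hypothetical wide result for the pattern "baab": spread() sorts the new
# key columns alphabetically, so positions 1..4 arrive as a2, a3, b1, b4
wide <- data.frame(match = "1", id = 1, name = "baab", i = 1L,
                   a2 = 5, a3 = 7, b1 = 2, b4 = 9)

key <- names(wide)[-(1:4)]
# Strip the leading letters to recover each column's position in the pattern
pos <- as.integer(sub("^[^0-9]+", "", key))
# order(pos) is the permutation that puts the columns back in pattern order
wide <- wide[, c(1:4, order(pos) + 4)]
names(wide)
# "match" "id" "name" "i" "b1" "a2" "a3" "b4"
```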
