R - Identify a sequence of row elements by groups in a dataframe

执念已碎 2021-01-06 00:20

Consider the following sample dataframe:

> df
   id name time
1   1    b   10
2   1    b   12
3   1    a    0
4   2    a    5
5   2    b   11
6   2    a


      
      
        
3 answers

南方客 (original poster) 2021-01-06 00:44

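Within each id, you can use ifelse inside filter, together with lag and lead, to keep only an "a" that is immediately followed by a "b" (and that "b"), and then use tidyr::spread to reshape to wide: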

library(tidyverse)

df %>% 
    arrange(id, time) %>% 
    group_by(id) %>% 
    filter(ifelse(name == 'b',    # if name is b...
                  lag(name) == 'a',    # ...is the previous name a?
                  lead(name) == 'b')) %>%    # otherwise, is the next name b?
    ungroup() %>% 
    mutate(i = rep(seq(n() / 2), each = 2)) %>%    # create indices to spread by
    spread(name, time) %>%    # spread to wide
    select(a, b)    # clean up

## # A tibble: 3 × 2
##       a     b
## 1     3    10
## 2     5     7
## 3     9    11
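
To see which rows the lag/lead condition keeps and how the pair index feeds spread, here is a minimal sketch on a small hypothetical data frame. The name `toy` and all of its values are invented for illustration and are not the question's data; it assumes a single id whose rows are already sorted by time.

```r
library(dplyr)
library(tidyr)

# Hypothetical data for one id, already ordered by time
toy <- tibble(
  name = c("a", "a", "b", "b", "a", "b"),
  time = c(0, 3, 10, 12, 15, 20)
)

kept <- toy %>%
  filter(ifelse(name == "b",
                lag(name) == "a",     # keep a "b" only if the previous row is "a"
                lead(name) == "b"))   # keep an "a" only if the next row is "b"

# Rows where lag()/lead() is NA (first or last row) yield NA and are dropped by filter()
wide <- kept %>%
  mutate(i = rep(seq(n() / 2), each = 2)) %>%  # 1, 1, 2, 2: one index per a/b pair
  spread(name, time) %>%
  select(a, b)

wide
```

The filter keeps rows 2, 3, 5 and 6 of `toy`, so `wide` has one row per adjacent a/b pair. Without the index `i`, spread would have no way to tell the two pairs apart.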




Based on the comment below, here's a version that uses gregexpr to find the start index of each match of a pattern. It is more complicated, but it scales more easily to longer patterns like "aabb":

df %>% group_by(pattern = 'aabb', id) %>%    # add pattern as column, group
    arrange(time) %>%
    # collapse each group to a string for name and a list column for time
    summarise(name = paste(name, collapse = ''), time = list(time)) %>% 
    # group and add list-column of start indices for each match
    rowwise() %>% mutate(i = gregexpr(pattern, name)) %>% 
    unnest(i, .drop = FALSE) %>%    # expand, keeping other list columns
    filter(i != -1) %>%    # chop out rows with no match from gregexpr
    rowwise() %>%    # regroup
    # subset with sequence from index through pattern length 
    mutate(time = list(time[i + 0:(nchar(pattern) - 1)]), 
           pattern = strsplit(pattern, '')) %>%    # expand pattern to list column
    rownames_to_column('match') %>%    # add rownames as match index column
    unnest(pattern, time) %>%    # expand matches in parallel
    # paste sequence onto each letter (important for spreading if repeated letters)
    group_by(match) %>% mutate(pattern = paste0(pattern, seq(n()))) %>% 
    spread(pattern, time)    # spread to wide form

## Source: local data frame [1 x 8]
## Groups: match [1]
## 
##   match    id  name     i    a1    a2    b3    b4
## 1     1     1 aabba     1     0     3    10    12


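Note that if the pattern doesn't happen to be in alphabetical order, the resulting columns will not be ordered by their indices. Since the indices are preserved in the column names, though, you can reorder with something like select(1:4, order(parse_number(names(.)[-1:-4])) + 4). The order() call is needed because select expects the column positions to pick, in sequence, rather than the index carried by each column.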
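
As a rough illustration of that reordering, the sketch below builds a hypothetical wide result for the pattern "baab" (the data frame `wide` and its values are invented) and sorts the pattern columns by the index embedded in their names. It wraps parse_number in order() so that the result is a set of column positions.

```r
library(dplyr)
library(readr)

# Hypothetical wide result for the pattern "baab": the pattern columns
# are in alphabetical order (a2, a3, b1, b4), not in pattern order
wide <- tibble(match = "1", id = 1, name = "baab", i = 1L,
               a2 = 3, a3 = 10, b1 = 0, b4 = 12)

# Index embedded in each pattern column name: 2, 3, 1, 4
idx <- parse_number(names(wide)[-(1:4)])

# order(idx) gives the positions that sort those columns; + 4 skips the first four
reordered <- wide %>% select(1:4, order(idx) + 4)
names(reordered)
```

Selecting by `idx + 4` directly would pick a3, b1, a2, b4 here, which is why the positions are passed through order() first.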
    
             
                                                        
            
            
              
                