Set column value using the first next row in the same group that meets a condition

前端未结

关注

 3  909

予麋鹿 2021-02-08 10:26

I am new to R and this is my first question on stackoverflow.

I am trying

to assign by reference to a new column
for each row
using the v

3条回答

無奈伤痛 (楼主)

2021-02-08 10:50

Here is a quick and dirty way which doesn't require much thinking on your part, and captures the first viable option in the subset and leaves an NA if non exists.

the do(f(.)) call evaluates the predefined function f on each subset of dt defined by the group_by statement. I would go translate that simple script into Rcpp for serious use.

library(dplyr)
f <- function(x){
  x <- x %>% mutate(founddate = as.Date(NA))

  for(i in 1:nrow(x)){
    y <- x[i, "date_down"]
    x[i, "founddate"] <-(x[-c(1:i),] %>% filter(code == "p", date_up > y) %>% select(date_up))[1, ]
  }

  return(x)
}

dt %>% group_by(id) %>% do(f(.))

# A tibble: 12 x 5
# Groups:   id [6]
      id code  date_down  date_up    founddate 
                   
 1     1 p     2019-01-01 2019-01-02 NA        
 2     1 f     2019-01-02 2019-01-03 NA        
 3     2 f     2019-01-02 2019-01-02 NA        
 4     2 p     2019-01-03 NA         NA        
 5     3 p     2019-01-04 NA         NA        
 6     4   2019-01-05 2019-01-05 NA        
 7     5 f     2019-01-07 2019-01-08 2019-01-08
 8     5 p     2019-01-07 2019-01-08 2019-01-09
 9     5 p     2019-01-09 2019-01-09 NA        
10     6 f     2019-01-10 2019-01-10 2019-01-11
11     6 p     2019-01-10 2019-01-10 2019-01-11
12     6 p     2019-01-10 2019-01-11 NA

Your Comment about terrible performance is unsurprising. I would personal message this if I knew how, but below is a Rcpp::cppFunction to do the same thing.

Rcpp::cppFunction('DataFrame fC(DataFrame x) {
                    int i, j;
                    int n = x.nrows();
                    CharacterVector code = x["code"];
                    DateVector date_up = x["date_up"];
                    DateVector date_down = x["date_down"];
                    DateVector founddate = rep(NA_REAL, n);

                    for(i = 0; i < n; i++){
                      for(j = i + 1; j < n; j++){
                        if(code(j) == "p"){
                          if(date_up(j) > date_down(i)){
                            founddate(i) = date_up(j);
                            break;
                          } else{
                            continue;
                          }
                        } else{
                          continue;
                        }
                      }
                    }
                    x.push_back(founddate, "founddate");
                    return x;
                    }')

dt %>% group_by(id) %>% do(fC(.))

0 讨论(0)

查看其它3个回答