I am new to R and this is my first question on stackoverflow.
I am trying
Here is a quick and dirty way which doesn't require much thinking on your part, and captures the first viable option in the subset and leaves an NA if non exists.
the do(f(.)) call evaluates the predefined function f on each subset of dt defined by the group_by statement. I would go translate that simple script into Rcpp for serious use.
library(dplyr)
f <- function(x){
x <- x %>% mutate(founddate = as.Date(NA))
for(i in 1:nrow(x)){
y <- x[i, "date_down"]
x[i, "founddate"] <-(x[-c(1:i),] %>% filter(code == "p", date_up > y) %>% select(date_up))[1, ]
}
return(x)
}
dt %>% group_by(id) %>% do(f(.))
# A tibble: 12 x 5
# Groups: id [6]
id code date_down date_up founddate
1 1 p 2019-01-01 2019-01-02 NA
2 1 f 2019-01-02 2019-01-03 NA
3 2 f 2019-01-02 2019-01-02 NA
4 2 p 2019-01-03 NA NA
5 3 p 2019-01-04 NA NA
6 4 2019-01-05 2019-01-05 NA
7 5 f 2019-01-07 2019-01-08 2019-01-08
8 5 p 2019-01-07 2019-01-08 2019-01-09
9 5 p 2019-01-09 2019-01-09 NA
10 6 f 2019-01-10 2019-01-10 2019-01-11
11 6 p 2019-01-10 2019-01-10 2019-01-11
12 6 p 2019-01-10 2019-01-11 NA
Your Comment about terrible performance is unsurprising. I would personal message this if I knew how, but below is a Rcpp::cppFunction to do the same thing.
Rcpp::cppFunction('DataFrame fC(DataFrame x) {
int i, j;
int n = x.nrows();
CharacterVector code = x["code"];
DateVector date_up = x["date_up"];
DateVector date_down = x["date_down"];
DateVector founddate = rep(NA_REAL, n);
for(i = 0; i < n; i++){
for(j = i + 1; j < n; j++){
if(code(j) == "p"){
if(date_up(j) > date_down(i)){
founddate(i) = date_up(j);
break;
} else{
continue;
}
} else{
continue;
}
}
}
x.push_back(founddate, "founddate");
return x;
}')
dt %>% group_by(id) %>% do(fC(.))