To generalize this question, I believe it could also be rephrased as: creating a rolling, temporally sensitive factor variable.
Assuming that I understood it right, here's a data.table way using the foverlaps() function.
Create dt and set its key as shown below:
library(data.table)
# p, g and d are the player_id, games and date vectors from the question
dt <- data.table(player_id = p, games = g, date = d, end_date = d)
setkey(dt, player_id, date, end_date)
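hybrid_index <- function(dt, roll_days) {
  # shift each start date back by roll_days to get the look-back window
  ivals <- copy(dt)[, date := date - roll_days]
  # := on the key column `date` alters the key of ivals, so pass by.x explicitly
  olaps <- foverlaps(ivals, dt, by.x = key(dt), type = "any", which = TRUE)
  # TRUE where a row inside the window has a different game than the current row
  olaps[, val := dt$games[xid] != dt$games[yid]]
  # row indices of dt whose window contains at least one different game
  olaps[, any(val), by = xid][(V1), xid]
}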
We create a dummy data.table ivals (for intervals), and for each row we specify the start and end dates of its window. Note that by keeping end_date identical to dt$end_date, every row is guaranteed at least one match, namely itself (this is deliberate). This gives you the non-NA version you asked for.
[With some minor changes here, you can get the NA version, but I'll leave that to you (assuming this answer is right).]
With that, we simply find which ranges from ivals overlap with dt, for each player_id. We get the matching indices. From there it's straightforward: if the games inside a row's window are not all the same, hybrid_index returns the corresponding row index of dt, and we replace type at those indices with "hybrid".
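To make the "matching indices" step concrete, here is a minimal sketch that runs only the overlap join. The vectors p, g and d are an assumption: they are reconstructed from the results printed in this answer, not taken from the question itself.

```r
library(data.table)

# Assumed sample data, reconstructed from the printed results in this answer
p <- rep(c(1L, 2L, 6L), each = 3L)
g <- c("A", "B", "B", "A", "B", "A", "A", "B", "B")
d <- as.Date("2014-10-01") + 0:8

dt <- data.table(player_id = p, games = g, date = d, end_date = d)
setkey(dt, player_id, date, end_date)

# Look-back windows with roll_days = 1: [date - 1, end_date] for every row
ivals <- copy(dt)[, date := date - 1L]

# which = TRUE returns index pairs: xid is a row of ivals, yid a row of dt
olaps <- foverlaps(ivals, dt, by.x = key(dt), type = "any", which = TRUE)

# Pairs for player 1 (rows 1 to 3 of dt)
olaps[xid <= 3L]
```

For player 1 the pairs should be (1,1), (2,1), (2,2), (3,2) and (3,3): every row matches itself, and rows 2 and 3 also match the previous day. Row 2 is the only one of these whose matched games differ (B against A), which is why it ends up as "hybrid" when roll days is 1.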
# roll days = 1L
dt[, type := games][hybrid_index(dt, 1L), type := "hybrid"]
#    player_id games       date   end_date   type
# 1:         1     A 2014-10-01 2014-10-01      A
# 2:         1     B 2014-10-02 2014-10-02 hybrid
# 3:         1     B 2014-10-03 2014-10-03      B
# 4:         2     A 2014-10-04 2014-10-04      A
# 5:         2     B 2014-10-05 2014-10-05 hybrid
# 6:         2     A 2014-10-06 2014-10-06 hybrid
# 7:         6     A 2014-10-07 2014-10-07      A
# 8:         6     B 2014-10-08 2014-10-08 hybrid
# 9:         6     B 2014-10-09 2014-10-09      B
# roll days = 2L
dt[, type := games][hybrid_index(dt, 2L), type := "hybrid"]
#    player_id games       date   end_date   type
# 1:         1     A 2014-10-01 2014-10-01      A
# 2:         1     B 2014-10-02 2014-10-02 hybrid
# 3:         1     B 2014-10-03 2014-10-03 hybrid
# 4:         2     A 2014-10-04 2014-10-04      A
# 5:         2     B 2014-10-05 2014-10-05 hybrid
# 6:         2     A 2014-10-06 2014-10-06 hybrid
# 7:         6     A 2014-10-07 2014-10-07      A
# 8:         6     B 2014-10-08 2014-10-08 hybrid
# 9:         6     B 2014-10-09 2014-10-09 hybrid
To illustrate the idea clearly, I've created a function and copied dt inside the function. But you can avoid that by adding the dates in ivals directly to dt and making use of the by.x and by.y arguments in foverlaps(). Please look at ?foverlaps.
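A rough sketch of that copy-free variant might look like the following. The column name start_date is hypothetical, and the sample data is again an assumption reconstructed from the results printed above.

```r
library(data.table)

# Assumed sample data, reconstructed from the printed results in this answer
dt <- data.table(
  player_id = rep(c(1L, 2L, 6L), each = 3L),
  games     = c("A", "B", "B", "A", "B", "A", "A", "B", "B"),
  date      = as.Date("2014-10-01") + 0:8
)
dt[, end_date := date]
setkey(dt, player_id, date, end_date)

roll_days <- 1L
# Hypothetical helper column: start of each row's look-back window
dt[, start_date := date - roll_days]

# Overlap dt with itself: windows [start_date, end_date] on the x side,
# single-day intervals [date, end_date] (the key of dt) on the y side
olaps <- foverlaps(dt, dt,
                   by.x = c("player_id", "start_date", "end_date"),
                   by.y = key(dt),
                   type = "any", which = TRUE)

# Flag rows whose window contains a game different from their own
olaps[, differs := dt$games[xid] != dt$games[yid]]
hyb <- olaps[, .(any_diff = any(differs)), by = xid][(any_diff), xid]

dt[, type := games][hyb, type := "hybrid"]
dt[, start_date := NULL]  # drop the helper column again
```

With roll_days set to 1L this should give the same type column as the first result above. The trade-off is one extra temporary column in dt instead of a full copy of the table.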