问题
I am trying to solve something that for me is a problem since a few days.
Here an example of my data.frame, which I hope will work with my real one.
df <- read.table(text = 'ID Day Count
33012 9526 4
35004 9526 4
37006 9526 4
37008 9526 4
21009 1913 3
24005 1913 3
25009 1913 3
22317 2286 2
37612 2286 2
25009 14329 1
48007 9525 0
88662 9524 0
1845 9524 0
8872 2285 0
49002 1912 0
1664 1911 0', header = TRUE)
I need to add a new column (new_col
) to my data.frame which contains values from 1 to 4. These new_col
values have to include, each one, day (x) day (x -1) and day (x -2), where x = 9526, 1913, 2286, 14329 (column Day
).
My output should be the following:
ID Day Count new_col
33012 9526 4 1
35004 9526 4 1
37006 9526 4 1
37008 9526 4 1
21009 1913 3 2
24005 1913 3 2
25009 1913 3 2
22317 2286 2 3
37612 2286 2 3
25009 14329 1 4
48007 9525 0 1
88662 9524 0 1
1845 9524 0 1
8872 2285 0 3
49002 1912 0 2
1664 1911 0 2
The data.frame ordered by new_col
will be then:
ID Day Count new_col
33012 9526 4 1
35004 9526 4 1
37006 9526 4 1
37008 9526 4 1
48007 9525 0 1
88662 9524 0 1
1845 9524 0 1
21009 1913 3 2
24005 1913 3 2
25009 1913 3 2
49002 1912 0 2
1664 1911 0 2
22317 2286 2 3
37612 2286 2 3
8872 2285 0 3
25009 14329 1 4
My real data.frame is more complex than the example (i.e. more columns and more values in the Count
column, therefore be patient if I will update the question.
Any suggestion will be really helpful.
回答1:
I'm not sure I totally understand your question, but it seems like you could use cut()
to achieve this, as follows:
x <- c(1913, 2286, 9526, 14329)
df$new_col <- cut(df$Day, c(-Inf, x, Inf))
df$new_col <- as.numeric(factor(df$new_col, levels=unique(df$new_col)))
回答2:
Here is a non scalable but easy to understand solution using dplyr
package, we can use case_when
to recode the Day based on the condition:
library(dplyr)
df %>% mutate(new_col = case_when(abs(df$Day - 9526) <= 2 ~ 1,
abs(df$Day - 1913) <= 2 ~ 2,
abs(df$Day - 2286)<= 2 ~ 3,
abs(df$Day - 14329) <= 2 ~ 4)) %>%
arrange(new_col)
# ID Day Count new_col
# 1 33012 9526 4 1
# 2 35004 9526 4 1
# 3 37006 9526 4 1
# 4 37008 9526 4 1
# 5 48007 9525 0 1
# 6 88662 9524 0 1
# 7 1845 9524 0 1
# 8 21009 1913 3 2
# 9 24005 1913 3 2
# 10 25009 1913 3 2
# 11 49002 1912 0 2
# 12 1664 1911 0 2
# 13 22317 2286 2 3
# 14 37612 2286 2 3
# 15 8872 2285 0 3
# 16 25009 14329 1 4
A more scalable approach would be to use foverlaps
from data.table
package, where we prepare a look up table and then join back with the original table and use within
type join to make sure the days are in the range specified in the look up table, for better explanation about foverlaps
library(data.table)
# prepare the look up table
x <- c(9526, 1913, 2286, 14329)
dt1 <- data.table(start = x - 2, end = x, new_col = 1:4)
setkey(dt1)
dt1
# start end new_col
# 1: 1911 1913 2
# 2: 2284 2286 3
# 3: 9524 9526 1
# 4: 14327 14329 4
# prepare the original table
dt = copy(setDT(df))
dt[, Day2 := Day]
# do a foverlaps
foverlaps(dt, dt1, by.x = c("Day", "Day2"), by.y = c("start", "end"), type = "within", mult = "all", nomatch = 0L)[, .(ID, Day, Count, new_col)][order(new_col)]
# ID Day Count new_col
# 1 33012 9526 4 1
# 2 35004 9526 4 1
# 3 37006 9526 4 1
# 4 37008 9526 4 1
# 5 48007 9525 0 1
# 6 88662 9524 0 1
# 7 1845 9524 0 1
# 8 21009 1913 3 2
# 9 24005 1913 3 2
# 10 25009 1913 3 2
# 11 49002 1912 0 2
# 12 1664 1911 0 2
# 13 22317 2286 2 3
# 14 37612 2286 2 3
# 15 8872 2285 0 3
# 16 25009 14329 1 4
来源:https://stackoverflow.com/questions/38660467/add-column-to-dataframe-depending-on-specific-row-values