Can the mutate be used when the mutation is conditional (depending on the values of certain column values)?
This example helps showing what I mean.
s
dplyr now has a function case_when
that offers a vectorised if. The syntax is a little strange compared to mosaic:::derivedFactor
as you cannot access variables in the standard dplyr way, and need to declare the mode of NA, but it is considerably faster than mosaic:::derivedFactor
.
df %>%
mutate(g = case_when(a %in% c(2,5,7) | (a==1 & b==4) ~ 2L,
a %in% c(0,1,3,4) | c == 4 ~ 3L,
TRUE~as.integer(NA)))
EDIT: If you're using dplyr::case_when()
from before version 0.7.0 of the package, then you need to precede variable names with '.$
' (e.g. write .$a == 1
inside case_when
).
Benchmark: For the benchmark (reusing functions from Arun 's post) and reducing sample size:
require(data.table)
require(mosaic)
require(dplyr)
require(microbenchmark)
set.seed(42) # To recreate the dataframe
DT <- setDT(lapply(1:6, function(x) sample(7, 10000, TRUE)))
setnames(DT, letters[1:6])
DF <- as.data.frame(DT)
DPLYR_case_when <- function(DF) {
DF %>%
mutate(g = case_when(a %in% c(2,5,7) | (a==1 & b==4) ~ 2L,
a %in% c(0,1,3,4) | c==4 ~ 3L,
TRUE~as.integer(NA)))
}
DT_fun <- function(DT) {
DT[(a %in% c(0,1,3,4) | c == 4), g := 3L]
DT[a %in% c(2,5,7) | (a==1 & b==4), g := 2L]
}
DPLYR_fun <- function(DF) {
mutate(DF, g = ifelse(a %in% c(2,5,7) | (a==1 & b==4), 2L,
ifelse(a %in% c(0,1,3,4) | c==4, 3L, NA_integer_)))
}
mosa_fun <- function(DF) {
mutate(DF, g = derivedFactor(
"2" = (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)),
"3" = (a == 0 | a == 1 | a == 4 | a == 3 | c == 4),
.method = "first",
.default = NA
))
}
perf_results <- microbenchmark(
dt_fun <- DT_fun(copy(DT)),
dplyr_ifelse <- DPLYR_fun(copy(DF)),
dplyr_case_when <- DPLYR_case_when(copy(DF)),
mosa <- mosa_fun(copy(DF)),
times = 100L
)
This gives:
print(perf_results)
Unit: milliseconds
expr min lq mean median uq max neval
dt_fun 1.391402 1.560751 1.658337 1.651201 1.716851 2.383801 100
dplyr_ifelse 1.172601 1.230351 1.331538 1.294851 1.390351 1.995701 100
dplyr_case_when 1.648201 1.768002 1.860968 1.844101 1.958801 2.207001 100
mosa 255.591301 281.158350 291.391586 286.549802 292.101601 545.880702 100