Question
The Problem
I wrote a function to use data.table programmatically. The function is as follows:
transformVariables4 <- function(df_1n_data,
                                c_1n_variablesToTransform,
                                c_1n_newVariableNames,
                                f_01_functionToTransform,
                                ...) {
  for (i in 1:length(c_1n_variablesToTransform)) {
    df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
  }
  return(df_1n_data)
}
The function works fine for this scenario:
df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)
But it fails for the scenario below:
df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)
It throws this error:
Error in .subset2(x, i, exact = exact) : no such index at level 1
The only difference between the two scenarios is that the data in the second scenario contains more columns.
I'm trying to figure out what the problem might be and fix it. It is taking a bit of time. If there is some other way in which I can make this work quickly it'd be great :)
What I figured out so far
I tried debugging it. Below is part of the traceback output:
Error in .subset2(x, i, exact = exact) : no such index at level 1
12 (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x,
i, exact = exact))(x, ..., exact = exact)
11 `[[.data.frame`(df_1n_data, c_1n_variablesToTransform[i])
10 df_1n_data[[c_1n_variablesToTransform[i]]]
9 eval(expr, envir, enclos)
8 eval(jsub, SDenv, parent.frame())
7 `[.data.table`(df_1n_data, , `:=`(c(c_1n_newVariableNames[i]),
list(forceAndCall(n = 1, FUN = f_01_functionToTransform,
df_1n_data[[c_1n_variablesToTransform[i]]], ...)))) at abcd.R#75
6 df_1n_data[, `:=`(c(c_1n_newVariableNames[i]), list(forceAndCall(n = 1,
FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]],
...)))] at abcd.R#75
5 transformVariables4(df_1n_data = data.table(df), c_1n_variablesToTransform = "e",
c_1n_newVariableNames = "new", f_01_functionToTransform = sum,
na.rm = TRUE) at abcd.R#90
The line of code at the top (indicated by 12) is in the source code of the function [[.data.frame. The value of i in that line of code for the first scenario is
"e"
but for the second scenario it is
c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_)
which is making .subset2(x, i, exact = exact) fail. The next step is to figure out the cause of this behavior.
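The failing index can be reproduced in isolation. This is a minimal sketch, assuming a plain data.frame; the names small and bad_index are illustrative and not part of the original code. A single valid name selects a column, while a vector of several NA names is treated as a recursive index and raises an error, which should resemble the message in the traceback above.

```r
# Illustrative data, not the original data set
small <- data.frame(e = 1, f = 2)

# A single valid name selects the column
good <- small[["e"]]

# Several NA names at once are interpreted as a recursive index and fail
bad_index <- rep(NA_character_, 10)
result <- tryCatch(
  small[[bad_index]],
  error = function(err) conditionMessage(err)  # capture the message instead of stopping
)
print(result)
```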
Update
Figured out the cause of this behavior. It is because the i on the RHS of := in
df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
matches a column name in the data. The next step is to figure out why exactly this happened and what the right way to go about this is.
Update
Thanks to Roland for helping me understand why exactly this happened and what the right way to go about this is.
The i issue is a scoping issue. data.table uses the first i on its search path, which is the i column in the data. That column holds 5s, so c_1n_variablesToTransform[i] indexes past the end of a length-one vector and yields all NAs, which in turn causes .subset2 to fail. The right way to do what I set out to do is to use the second function from Roland's solution.
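The scoping behavior described above can be seen in a small sketch. The names dt and vars are hypothetical stand-ins (vars plays the role of c_1n_variablesToTransform), and the data is made up for illustration. The same expression, vars[i], gives different results outside and inside the j argument because a column named i takes precedence there.

```r
library(data.table)

dt <- data.table(e = 1:3, i = 5L)  # a column named i, as in the failing scenario
vars <- "e"                        # hypothetical stand-in for c_1n_variablesToTransform

i <- 1L                  # loop-style index defined outside the data.table
outside <- vars[i]       # evaluated outside j: i is the local variable
inside <- dt[, vars[i]]  # evaluated inside j: i is the column of 5s

print(outside)
print(inside)
```

Because vars has only one element, indexing it with 5 falls off the end, so inside is a vector of NA values with one entry per row.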
Answer 1:
I would rewrite it like this:
transformVariables4 <- function(df_1n_data,
                                c_1n_variablesToTransform,
                                c_1n_newVariableNames,
                                f_01_functionToTransform,
                                ...) {
  for (i in seq_along(c_1n_variablesToTransform)) {
    # look up the name outside j, so a column named i cannot shadow the loop index
    var <- c_1n_variablesToTransform[i]
    df_1n_data[, (c_1n_newVariableNames[i]) := f_01_functionToTransform(get(var), ...)]
  }
  df_1n_data[]
}
library(data.table)
df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)
Of course, this would be more idiomatic:
transformVariables4 <- function(df_1n_data,
                                c_1n_variablesToTransform,
                                c_1n_newVariableNames,
                                f_01_functionToTransform,
                                ...) {
  df_1n_data[, (c_1n_newVariableNames) := lapply(.SD, f_01_functionToTransform, ...),
             .SDcols = c_1n_variablesToTransform]
  df_1n_data[]
}
I would also use shorter parameter names to improve readability.
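As a rough illustration of both points, here is a sketch of the .SDcols approach with shorter parameter names, applied to several columns at once. The names transform_vars, dt, vars, new_names and fun are hypothetical choices for this example, and the data is made up.

```r
library(data.table)

# Hypothetical shorter-named variant of the .SDcols approach
transform_vars <- function(dt, vars, new_names, fun, ...) {
  # .SDcols takes the column names as strings, so no loop index is needed
  dt[, (new_names) := lapply(.SD, fun, ...), .SDcols = vars]
  dt[]
}

# Made-up data that includes columns named i and j
dt <- data.table(e = c(1, NA, 3), i = 5, j = 6)
res <- transform_vars(dt,
                      vars = c("e", "i"),
                      new_names = c("e_sum", "i_sum"),
                      fun = sum,
                      na.rm = TRUE)
print(res)
```

Since := modifies by reference, dt itself also gains the new columns; pass copy(dt) if the input should stay unchanged.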
Source: https://stackoverflow.com/questions/37480543/no-such-index-at-level-1-error-for-a-specific-scenario-when-trying-to-use-data