“no such index at level 1” error for a specific scenario when trying to use data.table programmatically

Question

The Problem

I wrote a function that uses data.table programmatically. The function is as follows:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  for (i in 1:length(c_1n_variablesToTransform)) {
    df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
  }

  return(df_1n_data)
}

The function works fine in this scenario:

library(data.table)

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

But it fails in the following scenario:

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

It throws this error:

Error in .subset2(x, i, exact = exact) : no such index at level 1

The only difference I can see between the two scenarios is that the data in the second one contains more columns.

I'm still trying to work out what the problem is and how to fix it, and it is taking a while. If there is some other way to make this work quickly, that would be great :)

What I have figured out so far

I tried debugging it. Below is part of the traceback output:

Error in .subset2(x, i, exact = exact) : no such index at level 1 

12 (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, 
    i, exact = exact))(x, ..., exact = exact) 

11 `[[.data.frame`(df_1n_data, c_1n_variablesToTransform[i]) 

10 df_1n_data[[c_1n_variablesToTransform[i]]] 

9 eval(expr, envir, enclos) 

8 eval(jsub, SDenv, parent.frame()) 

7 `[.data.table`(df_1n_data, , `:=`(c(c_1n_newVariableNames[i]), 
    list(forceAndCall(n = 1, FUN = f_01_functionToTransform, 
        df_1n_data[[c_1n_variablesToTransform[i]]], ...)))) at abcd.R#75

6 df_1n_data[, `:=`(c(c_1n_newVariableNames[i]), list(forceAndCall(n = 1, 
    FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], 
    ...)))] at abcd.R#75

5 transformVariables4(df_1n_data = data.table(df), c_1n_variablesToTransform = "e", 
    c_1n_newVariableNames = "new", f_01_functionToTransform = sum, 
    na.rm = TRUE) at abcd.R#90

The line of code at the top (indicated by 12) is in the source code of the function [[.data.frame. The value of i in that line of code for the first scenario is

"e"

but for the second scenario it is

c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_)

which makes .subset2(x, i, exact = exact) fail. The next step is to figure out the cause of this behavior.

Update

I figured out the cause of this behavior. The i on the RHS of := in

df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]

matches a column name in the data. The next step is to figure out why exactly this happens and what the right way to do this is.
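The name clash can be reproduced in isolation. This is only a minimal sketch on a hypothetical three-row table (not the data from the question): inside j, the symbol i resolves to the column of that name rather than to a variable called i in the calling scope.

```r
library(data.table)

# Hypothetical toy table; the column named i is the part that matters
dt <- data.table(e = c(1, 1, 1), i = c(5, 5, 5))

# A variable in the calling scope, playing the role of the loop index
i <- 1L

# j is evaluated with the table's columns in scope first,
# so i here is the column, not the variable defined above
seen_inside_j <- dt[, i]
seen_outside  <- i

print(seen_inside_j)
print(seen_outside)
```

The same lookup order applies to any other variable name that happens to coincide with a column name.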

Update

Thanks to Roland for helping me understand why exactly this happened and what the right way to do this is.

The i issue is a scoping issue. Inside [.data.table, the first i on the search path is the i column of the data, not the loop variable. That column holds 5 in every row, so c_1n_variablesToTransform[i] indexes a length-one vector at position 5 ten times and returns ten NA values, which in turn causes .subset2 to fail. The right way to do what I set out to do is to use the second function from Roland's solution.
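The origin of the ten NA values can be checked in base R alone. The sketch below assumes, as in the second scenario, that the i column holds 5 in all ten rows; the names vars, col_i and picked are made up for illustration.

```r
# The vector of variable names passed to the function has length one
vars <- "e"

# What the symbol i resolves to inside j: the ten values of the i column
col_i <- rep(5, 10)

# Position 5 does not exist in a length-one vector, so every lookup gives NA
picked <- vars[col_i]
print(picked)

# Passing that vector to [[ is the call that fails in the traceback.
# tryCatch keeps the script running and captures the message, if any.
df <- data.frame(e = rep(1, 10))
result <- tryCatch(df[[picked]], error = function(err) conditionMessage(err))
print(result)
```

In the first scenario there is no column named i, so the loop variable is found instead and the lookup returns "e".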

Answer 1:

I would rewrite it like this:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  for (i in seq_along(c_1n_variablesToTransform)) {
    var <- c_1n_variablesToTransform[i] # resolve the name here, outside data.table's scope, where i is the loop index
    df_1n_data[, (c_1n_newVariableNames[i]) := f_01_functionToTransform(get(var), ...)]
  }

  df_1n_data[]
}

library(data.table)

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

Of course, this would be more idiomatic:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  df_1n_data[, (c_1n_newVariableNames) := lapply(.SD, f_01_functionToTransform, ...), 
              .SDcols = c_1n_variablesToTransform]

  df_1n_data[]
}

I would also use shorter parameter names to improve readability.
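As a rough illustration of both suggestions, here is the .SD version with shorter parameter names, applied to two columns at once. The names transform_vars, dt, vars, new_names and fun are hypothetical, as is the toy data, which deliberately includes a column called i.

```r
library(data.table)

# Same logic as the idiomatic version above, with shorter (made-up) names
transform_vars <- function(dt, vars, new_names, fun, ...) {
  # No loop index appears inside j, so a column named i cannot shadow anything
  dt[, (new_names) := lapply(.SD, fun, ...), .SDcols = vars]
  dt[]
}

# Toy data with a column named i and a missing value
dt <- data.table(e = 1:3, i = c(5, NA, 7))

out <- transform_vars(
  dt,
  vars = c("e", "i"),
  new_names = c("e_sum", "i_sum"),
  fun = sum,
  na.rm = TRUE
)
print(out)
```

Note that := adds the columns by reference, so the table passed in is modified as well as returned.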

Source: https://stackoverflow.com/questions/37480543/no-such-index-at-level-1-error-for-a-specific-scenario-when-trying-to-use-data

Tags

data.table

subset