“no such index at level 1” error for a specific scenario when trying to use data.table programmatically

Posted by 我的梦境 on 2020-01-14 16:32:58

Question


The Problem

I wrote a function to use data.table programmatically. The function is as follows:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  for (i in 1:length(c_1n_variablesToTransform)) {
    df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
  }

  return(df_1n_data)
}

The function works fine for this scenario:

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

But it fails for the following scenario:

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

It throws this error:

Error in .subset2(x, i, exact = exact) : no such index at level 1

The only difference between the two scenarios is that in the second scenario the data contains more columns.

I'm trying to figure out what the problem might be and fix it, but it is taking a while. If there is some other way to make this work quickly, that would be great :)


What I figured out so far

I tried debugging it. Below is a part of the traceback output

Error in .subset2(x, i, exact = exact) : no such index at level 1 

12 (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, 
    i, exact = exact))(x, ..., exact = exact) 

11 `[[.data.frame`(df_1n_data, c_1n_variablesToTransform[i]) 

10 df_1n_data[[c_1n_variablesToTransform[i]]] 

9 eval(expr, envir, enclos) 

8 eval(jsub, SDenv, parent.frame()) 

7 `[.data.table`(df_1n_data, , `:=`(c(c_1n_newVariableNames[i]), 
    list(forceAndCall(n = 1, FUN = f_01_functionToTransform, 
        df_1n_data[[c_1n_variablesToTransform[i]]], ...)))) at abcd.R#75

6 df_1n_data[, `:=`(c(c_1n_newVariableNames[i]), list(forceAndCall(n = 1, 
    FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], 
    ...)))] at abcd.R#75

5 transformVariables4(df_1n_data = data.table(df), c_1n_variablesToTransform = "e", 
    c_1n_newVariableNames = "new", f_01_functionToTransform = sum, 
    na.rm = TRUE) at abcd.R#90

The call at the top of the traceback (frame 12) is in the source code of the function [[.data.frame. The value of i in that call for the first scenario is

"e"

but for the second scenario it is

c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_)

which makes .subset2(x, i, exact = exact) fail. The next step is to figure out the cause of this behavior.

Update

I figured out the cause of this behavior. The i on the RHS of := in

df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(n = 1, FUN = f_01_functionToTransform, df_1n_data[[c_1n_variablesToTransform[i]]], ...))]

matches a column name in the data. The next step is to figure out why exactly this happens and what the right way to do this is.

Update

Thanks, Roland, for helping me understand why exactly this happened and what the right way to do this is.

The i issue is a scoping issue. Inside [.data.table, j is evaluated with the table's columns first on the search path, so i resolves to the i column in the data rather than to the loop counter. That column is all 5s, so c_1n_variablesToTransform[i] indexes past the end of a length-one vector and returns all NAs, which in turn causes .subset2 to fail. The right way to do what I set out to do is to use the second function from Roland's solution.
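The clash can be reproduced outside the function. The following is a minimal sketch, assuming a hypothetical one-row table dt with a column named i and a local lookup vector vars; neither name comes from the original code.

```r
library(data.table)

# Hypothetical toy table: the column `i` has the same name as the local counter.
dt   <- data.table(e = 1, i = 5L)
vars <- "e"
i    <- 1L

# Inside j, `i` resolves to the column (value 5), not the local variable (value 1),
# so vars[i] is vars[5], which is NA_character_ for a length-one vector.
shadowed <- dt[, vars[i]]

# Resolving the name outside of j avoids the clash
# (as long as no column is called `var`).
var   <- vars[i]
fixed <- dt[, get(var)]
```

Here shadowed is NA while fixed holds the value of column e, which mirrors the difference between the failing function and the rewritten one.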


Answer 1:


I would rewrite it like this:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  for (i in seq_along(c_1n_variablesToTransform)) {
    var <- c_1n_variablesToTransform[i] #to force evaluation
    df_1n_data[, (c_1n_newVariableNames[i]) := f_01_functionToTransform(get(var), ...)]
  }

  df_1n_data[]
}

library(data.table)

df <- data.frame(abcd = (as.Date("1991-12-22") + 1:10), e = 1, f = 2, g = 3, h = 4, i = 5, j = 6, k = 7, l = 8)
df4 <- transformVariables4(
  df_1n_data = data.table(df),
  c_1n_variablesToTransform = "e",
  c_1n_newVariableNames = "new",
  f_01_functionToTransform = sum,
  na.rm = TRUE
)

Of course, more idiomatic would be this:

transformVariables4              <- function(df_1n_data,
                                             c_1n_variablesToTransform,
                                             c_1n_newVariableNames,
                                             f_01_functionToTransform,
                                             ...) {

  df_1n_data[, (c_1n_newVariableNames) := lapply(.SD, f_01_functionToTransform, ...), 
              .SDcols = c_1n_variablesToTransform]

  df_1n_data[]
}

I would also use shorter parameter names to improve readability.
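As a rough sketch of what that could look like, here is the .SD version with shorter names applied to two columns at once. The names transform_vars, dt, vars, new_names and fun are illustrative choices, not part of the original code, and the small data frame is made up for the example.

```r
library(data.table)

# Same approach as the .SD version above, with shorter (hypothetical) names.
transform_vars <- function(dt, vars, new_names, fun, ...) {
  # .SDcols restricts .SD to the requested columns, so no column name
  # can shadow a local variable used to look them up.
  dt[, (new_names) := lapply(.SD, fun, ...), .SDcols = vars]
  dt[]
}

# Made-up data that includes a column named `i`, which broke the loop version.
df  <- data.frame(e = 1:3, f = c(2, NA, 4), i = 5)
res <- transform_vars(data.table(df), c("e", "f"), c("e_sum", "f_sum"),
                      sum, na.rm = TRUE)
```

Because the column names are passed through .SDcols and never indexed inside j, the presence of a column called i makes no difference here.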



Source: https://stackoverflow.com/questions/37480543/no-such-index-at-level-1-error-for-a-specific-scenario-when-trying-to-use-data
