Looping multiple listed data frames into a single function

守給你的承諾、 submitted on 2019-12-08 05:42:28

Question


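I am trying to run the function varipart() from the ade4 package. For each call, I want to use the data frame at the same position in each list (the first data frame of spec.list, env.list and spat.list together, then the second of each, and so on) as the different arguments of the same function. I need to do this for each set of data frames.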

########### DATA BELOW
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(2, 1, 2), y2 = c(5, 6, 4))
spec.list <- list(d1, d2, d3)

d1 <- data.frame(y1 = c(20, 87, 39), y2 = c(46, 51, 8))
d2 <- data.frame(y1 = c(30, 21, 12), y2 = c(61, 51, 33))
d3 <- data.frame(y1 = c(2, 11, 14), y2 = c(52, 16, 1))
env.list <- list(d1, d2, d3)

d1 <- data.frame(y1 = c(0.15, 0.1, 0.9), y2 = c(0.46, 0.51, 0.82))
d2 <- data.frame(y1 = c(0.13, 0.31, 0.9), y2 = c(0.11, 0.51, 0.38))
d3 <- data.frame(y1 = c(0.52, 0.11, 0.14), y2 = c(0.52, 0.36, 0.11))
spat.list <- list(d1, d2, d3)
###############
# I have tried these approaches
library(parallel)
library(ade4)

output_varpart <- mclapply(spec.list, function(x){
  varipart(x, env.list, spat.list, type = "parametric")
})

output_varpart <- mclapply(x, function(x){
  varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
})

for(i in 1:length(x)){
  results <- varipart(spec.list, env.list, spat.list, type = "parametric")
}

None of these methods work! Please be gentle; I'm new to list syntax and looping. The errors are "Warning message: In mclapply(output.spectrans.dudi, function(x) { : all scheduled cores encountered errors in user code" and "Error in x * w : non-numeric argument to binary operator", respectively.


Answer 1:


You were close, but I'll explain a bit about how lapply (and mclapply) work, because it seems you're mixing up the role of x. First, this should work:

output_varpart <- mclapply(1:3, function(x){
  varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
})
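As an alternative to indexing, base R's Map() walks several lists in parallel and passes the i-th element of each one to the function. Map() is not used in this answer, so treat it as a swapped-in variant. A minimal sketch, assuming the lists have equal lengths; fake_varipart() is a hypothetical stand-in so the code runs without ade4 (with ade4 loaded you would call varipart(Y, X, W, type = "parametric") instead):

```r
# Cut-down stand-ins for the question's lists (same idea, fewer data frames)
spec.list <- list(data.frame(y1 = c(1, 2, 3)), data.frame(y1 = c(3, 2, 1)))
env.list  <- list(data.frame(y1 = c(20, 87, 39)), data.frame(y1 = c(30, 21, 12)))
spat.list <- list(data.frame(y1 = c(0.15, 0.1, 0.9)), data.frame(y1 = c(0.13, 0.31, 0.9)))

# Hypothetical stand-in for varipart(): reports the first value of each table,
# which makes it easy to see which data frames were paired together
fake_varipart <- function(Y, X, W) c(Y = Y$y1[1], X = X$y1[1], W = W$y1[1])

# Index-based version, as in the answer
output_idx <- lapply(seq_along(spec.list), function(i) {
  fake_varipart(spec.list[[i]], env.list[[i]], spat.list[[i]])
})

# Map() pairs the i-th elements of all three lists automatically
output_map <- Map(fake_varipart, spec.list, env.list, spat.list)

identical(output_idx, output_map)   # TRUE
```

Like lapply(), Map() returns a list with one result per set of data frames.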

But why?
The function lapply means: apply a function (the second argument) to every element of a list (the first argument). So lapply(list('Hello', 'World', '!'), print) will do

print('Hello')
print('World')
print('!')

and it will return a list of length 3 with the results (print returns the value it printed, invisibly).
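To check that, you can keep the result of the print example and look at it:

```r
# Each print() call shows its value and returns it invisibly;
# lapply() collects those return values in a list
res <- lapply(list("Hello", "World", "!"), print)

length(res)   # 3
res[[2]]      # "World"
```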

But quite often there is no single function that does exactly what you want. You can always define one yourself, like this:

my_vari_fun <- function(index) {
  varipart(spec.list[[index]], env.list[[index]], spat.list[[index]], type = "parametric")
}

You can then call it as my_vari_fun(1), and it doesn't matter at all whether the argument is called x, index, or something else. I'm sure you get the idea. So the next step would be

output_varpart <- lapply(list(1,2,3), my_vari_fun)

The disadvantage of this is that it takes multiple lines of code, and we probably won't use my_vari_fun again. That's why we can provide an anonymous function instead: we give a function to lapply without assigning it to a name. We simply replace my_vari_fun with its "value" (which happens to be a function).
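A small sketch of the named-versus-anonymous point, using a hypothetical inputs list and helper in place of the varipart() call. Both versions give identical results, whatever the argument happens to be called:

```r
# Hypothetical data standing in for the three lists of data frames
inputs <- list(c(1, 2, 3), c(3, 2, 1), c(2, 1, 2))

# Named helper: defined once, then handed to lapply() by name
my_fun <- function(index) sum(inputs[[index]]) * index
out_named <- lapply(1:3, my_fun)

# Anonymous version: the same body written inline; the argument name is arbitrary
out_anon <- lapply(1:3, function(whatever) sum(inputs[[whatever]]) * whatever)

identical(out_named, out_anon)   # TRUE
```

In R 4.1.0 and later, `\(i) ...` is also accepted as shorthand for `function(i) ...`.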

Inside this anonymous function, x is just the name of its argument; outside the function, x doesn't mean anything. We could just as well have given it any other name.

We just need to tell lapply which values to feed in: list(1,2,3). Or, more simply, a vector, which lapply handles the same way: 1:3
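A quick way to see this: lapply() accepts an atomic vector as well as a list and hands each element to the function in turn, so both inputs give the same result here (label() is a hypothetical helper):

```r
# Hypothetical helper standing in for the per-index work
label <- function(i) paste("item", i)

from_list   <- lapply(list(1, 2, 3), label)
from_vector <- lapply(1:3, label)   # each element of 1:3 is passed in turn

identical(from_list, from_vector)   # TRUE: both are list("item 1", "item 2", "item 3")
```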

By the way, I've hard-coded 3 here, but in the general case you can use 1:length(spec.list); you just have to make sure all lists have the same length.
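A small sketch of the general case with a length check up front. It uses seq_along(), which the answer doesn't mention; it gives the same indices as 1:length() except for an empty list, where 1:length() produces c(1, 0):

```r
# Cut-down stand-ins for two of the question's lists
spec.list <- list(data.frame(y1 = 1:3), data.frame(y1 = 4:6))
env.list  <- list(data.frame(y1 = 7:9), data.frame(y1 = 3:1))

# Stop early if the lists cannot be paired element by element
stopifnot(length(spec.list) == length(env.list))

# seq_along() gives 1, 2, ..., length(spec.list)
idx <- seq_along(spec.list)

# The two spellings only differ for an empty list
empty <- list()
bad_idx  <- 1:length(empty)    # c(1, 0): lapply() would try empty[[1]] and fail
good_idx <- seq_along(empty)   # integer(0): lapply() returns an empty list
```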

Finally, I've only talked about lapply, but it all works the same for mclapply. The difference is only under the hood: mclapply spreads its work over multiple cores.
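A minimal sketch of "same result, different machinery", with a hypothetical square() in place of the real work. One assumption to flag: mclapply() relies on forking, which Windows does not support, so the sketch falls back to mc.cores = 1 there (which amounts to a plain lapply()):

```r
library(parallel)

# Hypothetical stand-in for the real per-index work
square <- function(i) i^2

# Forking is not supported on Windows, where mc.cores must stay at 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L

seq_result <- lapply(1:6, square)
par_result <- mclapply(1:6, square, mc.cores = cores)

all.equal(seq_result, par_result)   # TRUE: same values, same order
```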

Edit: debugging

When debugging, there is a bigger difference between lapply and mclapply. I will first talk about lapply.

If an error occurs in code that is executed inside the lapply, the entire lapply fails and nothing gets assigned. This sometimes makes it hard to spot exactly where the error takes place, but it can be done. A simple workaround is to feed lapply just parts of your input, to see where it breaks.
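One way to automate "feed lapply parts of your input" is to wrap each call in tryCatch(), so a failure is recorded instead of aborting the whole lapply. tryCatch() is a swapped-in technique here, not something the answer uses, and fragile() is a hypothetical function that fails for one input:

```r
# Hypothetical function that fails for input 2, standing in for varipart()
fragile <- function(i) {
  if (i == 2) stop("input ", i, " is not usable")
  i * 10
}

# lapply(1:3, fragile) would stop at i = 2 and assign nothing.
# Catching the error per call keeps the other results.
results <- lapply(1:3, function(i) {
  tryCatch(fragile(i), error = function(e) e)
})

# Positions whose result is an error condition rather than a value
failed <- which(vapply(results, inherits, logical(1), what = "error"))
failed                                # 2
conditionMessage(results[[failed]])   # "input 2 is not usable"
```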
But R also comes with debugging tools that freeze execution as soon as an error is encountered. I find recover the most useful one.

You can enable it with options(error=recover). Every time an error is encountered, it then gives you a numbered list of the chain of calls that led to it: the function that threw the error, the function that called it, the function that called that one, and so on.
Then you can choose a number to explore the environment in which that function was running. When I try to reproduce your error, I get this:

Error in x * w : non-numeric argument to binary operator

Enter a frame number, or 0 to exit   

 1: source("~/.active-rstudio-document")
 2: withVisible(eval(ei, envir))
 3: eval(ei, envir)
 4: eval(ei, envir)
 5: .active-rstudio-document#20: lapply(1:3, function(x) {
    varipart(spec.list[[x]], env.list[[x]], spat.list[
 6: FUN(X[[i]], ...)
 7: .active-rstudio-document#21: varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
 8: as.matrix(scalewt(Y, scale = scale))
 9: scalewt(Y, scale = scale)
10: apply(df, 2, weighted.mean, w = wt)
11: FUN(newX[, i], ...)
12: weighted.mean.default(newX[, i], ...)

Many of these are internal functions, and you can see what varipart does: it passes its inputs on to lower-level functions, which pass them on further, and so on.

For our purposes, we want number 6: this is where lapply calls your function with the i-th input value.
As soon as we enter 6, we get a new prompt that reads Browse[1]> (in some cases with another number), and we are in the environment as if we had just entered our

function(x){
  varipart(spec.list[[x]], env.list[[x]], spat.list[[x]], type = "parametric")
}

This means that typing x will give you the value for which the function fails, and spec.list[[x]] etc. will show you the inputs on which varipart failed. The final step is deciding what this means: either varipart is broken, or one of your inputs is.
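The answer doesn't show how to switch recover off again once you are done. A minimal sketch using the usual save-and-restore idiom for options(); the frame number and objects in the comments refer to the session described above:

```r
before <- getOption("error")

# Turn on recover() for any error, keeping the previous setting in `old`
old <- options(error = utils::recover)

# ... re-run the failing lapply() here in an interactive session,
#     pick frame 6 and inspect x, spec.list[[x]], etc. ...

# Turn it off again by restoring the saved setting
# (options(error = NULL) would also reset R's default behaviour)
options(old)

identical(getOption("error"), before)   # TRUE
```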

In this case, I noticed I can get the same error by making one of the columns in the data.frame something other than numeric. You'll have to check whether that is your problem as well, but debugging becomes a whole lot easier once you've figured out where the problem is.
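A minimal reproduction of that diagnosis, assuming (as frames 10 to 12 of the traceback suggest) that the failing step is apply(df, 2, weighted.mean, w = wt). col_wmeans(), all_numeric() and the toy data frames are hypothetical. The last lines check every data frame in a list for non-numeric columns:

```r
# Mimics the innermost traceback frames: weighted.mean() applied down each column
col_wmeans <- function(df) apply(df, 2, weighted.mean, w = rep(1, nrow(df)))

df_ok  <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
df_bad <- data.frame(y1 = c(1, 2, 3), y2 = c("4", "5", "6"))  # y2 is not numeric

ok <- col_wmeans(df_ok)   # column means 2 and 5

# apply() turns df_bad into a character matrix, so x * w fails in weighted.mean()
err <- tryCatch(col_wmeans(df_bad), error = function(e) e)
conditionMessage(err)     # "non-numeric argument to binary operator"

# Hypothetical helper: TRUE if every column of a data frame is numeric
all_numeric <- function(df) all(vapply(df, is.numeric, logical(1)))

# Check every data frame in a list; FALSE marks the one to inspect with str()
spec.list <- list(df_ok, df_bad)
vapply(spec.list, all_numeric, logical(1))
```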

With mclapply

mclapply runs on multiple cores, which means that if there is an error in one core, the other cores still finish their jobs.

For calculations where a forked process encountered an error, that error will be the return value, in the form of a try-error object. But note that this will also be the case for the other iterations handled by the same core. So for mclapply(1:10, fun) on 2 cores, if fun(1) throws an error, all odd inputs will show that error (this is with the default mc.preschedule = TRUE, where the inputs are divided among the cores before the jobs start).
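A sketch of that behaviour, assuming a Unix-alike system (mclapply() cannot fork on Windows, so the demo is skipped there) and a hypothetical fun() that fails only for input 1:

```r
library(parallel)

# Hypothetical function that fails only for input 1
fun <- function(i) {
  if (i == 1) stop("fun(1) failed")
  i
}

if (.Platform$OS.type != "windows") {
  # With the default mc.preschedule = TRUE, the inputs are split over the two
  # cores up front (1, 3, 5, 7, 9 and 2, 4, 6, 8, 10); suppressWarnings()
  # hides the warning mclapply() issues about the failed core
  out <- suppressWarnings(mclapply(1:10, fun, mc.cores = 2))

  # Same idea as sapply(output_varpart, class), reduced to TRUE/FALSE per slot
  failed <- which(vapply(out, inherits, logical(1), what = "try-error"))
  print(failed)   # the odd positions: every input handled by the failing core
}
```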

So we can look at the return value, to narrow our search down:

sapply(output_varpart, class)

The error(s) are in the iterations where the output class is try-error, but we can't tell exactly which one.

How to solve it in practice depends on the size of the calculations.
If they are really extensive, it may be worth keeping the values that did succeed and narrowing it down again by re-running only the failed parts. Or, if there is just one try-error, we don't need to look any further.
But usually, I find it most useful to change the mclapply to a regular lapply and use the approach above.
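A sketch of "keep the successes, re-run only the failed parts", assuming a Unix-alike system (forking is unavailable on Windows, so the demo is skipped there) and a hypothetical fun() that fails only for input 7. The re-run uses plain lapply() with tryCatch(), so the genuinely broken input can be told apart from its neighbours on the same core:

```r
library(parallel)

# Hypothetical function that fails only for input 7
fun <- function(i) {
  if (i == 7) stop("input ", i, " is broken")
  i^2
}

if (.Platform$OS.type != "windows") {
  # Input 7 shares a core with 1, 3, 5 and 9, so all five come back as try-error
  out <- suppressWarnings(mclapply(1:10, fun, mc.cores = 2))
  failed <- which(vapply(out, inherits, logical(1), what = "try-error"))

  # Re-run only the failed inputs, one at a time, catching each error separately
  out[failed] <- lapply(failed, function(i) tryCatch(fun(i), error = function(e) e))

  # Only the input that is really broken is still an error
  still_bad <- which(vapply(out, inherits, logical(1), what = "error"))
  print(still_bad)
}
```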



Source: https://stackoverflow.com/questions/53754263/looping-multiple-listed-data-frames-into-a-single-function
