R: run function over same dataframe multiple times

Posted by 一世执手 on 2019-12-11 14:18:47

Question


I’m looking to apply a function over an initial dataframe multiple times. As a simple example, take this data:

library(dplyr)
thisdata <-  data.frame(vara = seq(from = 1, to = 20, by = 1)
                        ,varb = seq(from = 1, to = 20, by = 1))

And here is a simple function I would like to run over it:

simplefunc <- function(data) {
  datasetfinal2 <- data %>% mutate(varb = varb + 1)
  return(datasetfinal2)
}
thisdata2 <- simplefunc(thisdata)

thisdata3 <- simplefunc(thisdata2)

So, how would I run this function, say, 10 times without having to call it by hand each time (i.e., as I did for thisdata3)? I’m mostly interested in the final dataframe after the replication, but it would be good to have a list of all the dataframes produced so I can run some diagnostics. Appreciate the help!


Answer 1:


Dealing with multiple identically-structured data.frames individually is a difficult way to manage things, especially if the number of iterations is more than a few. A popular "best practice" is to deal with a "list of data.frames", something like:

n <- 10 # number of data.frames to keep, including the original (simplefunc is applied n - 1 times)
out <- vector("list", n)
out[[1]] <- thisdata
for (i in 2:n) out[[i]] <- simplefunc(out[[i-1]])

You can look at any interim value with

str(out[[10]])
# 'data.frame': 20 obs. of  2 variables:
#  $ vara: num  1 2 3 4 5 6 7 8 9 10 ...
#  $ varb: num  10 11 12 13 14 15 16 17 18 19 ...

and, as you might expect, the final result is in out[[n]].
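
As a minimal sketch, the same loop can be sized so that the function is applied exactly 10 times, as the question asks, with one extra slot for the untouched input. It assumes a base-R stand-in for simplefunc (so it runs without dplyr); n_apply, final and step_means are illustrative names that do not appear in the original.

```r
# Base-R stand-in for the question's simplefunc (assumption: same effect, no dplyr)
simplefunc <- function(data) {
  data$varb <- data$varb + 1
  data
}

thisdata <- data.frame(vara = 1:20, varb = 1:20)

n_apply <- 10                        # hypothetical name: number of applications
out <- vector("list", n_apply + 1)   # slot 1 holds the original data.frame
out[[1]] <- thisdata
for (i in seq_len(n_apply)) {        # seq_len() is also safe when n_apply is 0
  out[[i + 1]] <- simplefunc(out[[i]])
}

final <- out[[length(out)]]          # result after n_apply applications

# One diagnostic per stored data.frame: the mean of varb at each step
step_means <- vapply(out, function(d) mean(d$varb), numeric(1))
```

Keeping every step in one list means diagnostics like step_means are a single vapply call rather than one call per object.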

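This can be simplified slightly using Reduce. Reduce calls its function with two arguments (the accumulated value and the next element of the vector), so simplefunc needs a throw-away second argument: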

simplefunc <- function(data, ...) {
  datasetfinal2 <- data %>% mutate(varb = varb+1)
  return(datasetfinal2)
}
out <- Reduce(simplefunc, 1:10, init = thisdata, accumulate = TRUE)

This effectively does:

tmp <- simplefunc(thisdata, 1)
tmp <- simplefunc(tmp, 2)
tmp <- simplefunc(tmp, 3)
# ...

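(In fact, if you look at the source for Reduce, it's effectively doing my first suggestion above.) Because init is stored as the first element, out has 11 elements here, and the result after all 10 calls is out[[11]].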

Note that if simplefunc has other arguments that cannot be dropped, you can place them after the ..., perhaps:

simplefunc <- function(data, ..., otherarg, anotherarg) {
  datasetfinal2 <- data %>% mutate(varb = varb+1)
  return(datasetfinal2)
}

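With that signature, you must change all other calls to simplefunc to pass those parameters by name instead of by position (positional matching being the common default), because arguments that come after ... can only be matched by name.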

Edit: if you cannot (or do not want to) edit simplefunc, you can always use an anonymous function to ignore the iterator/counter:

Reduce(function(x, ign) simplefunc(x), 1:10, init = thisdata, accumulate = TRUE)
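
The anonymous-function wrapper is also a convenient place to pass extra arguments by name. The sketch below assumes a hypothetical function addstep with a step argument (neither is in the original) and uses base R only. It also sets simplify = FALSE, since Reduce would otherwise unlist the accumulated results if every one of them had length one (for example, one-column data.frames).

```r
# Hypothetical variant of simplefunc with an extra argument `step`
addstep <- function(data, step = 1) {
  data$varb <- data$varb + step
  data
}

thisdata <- data.frame(vara = 1:20, varb = 1:20)

out <- Reduce(
  function(x, ign) addstep(x, step = 2),  # `ign` receives 1, 2, ..., 10 and is ignored
  1:10,
  thisdata,                               # init: becomes out[[1]]
  accumulate = TRUE,
  simplify = FALSE                        # always return a list
)

length(out)                  # 11: the original plus one result per call
final <- out[[length(out)]]  # data.frame after all 10 calls
```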



Answer 2:


We can use a for loop with assign and get:

thisdata1 <- thisdata
for (i in 2:3) {
  assign(paste0('thisdata', i), value = simplefunc(get(paste0('thisdata', i - 1))))
}

NOTE1: It is better not to create individual objects in the global environment where the operations can be done easily within a list.

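
For comparison, here is a rough sketch of the list-based alternative the note recommends: the same thisdata1 to thisdata3 names, but stored as names of list elements rather than as separate objects in the global environment. It assumes a base-R stand-in for simplefunc; steps is an illustrative name.

```r
# Base-R stand-in for the question's simplefunc (assumption: same effect, no dplyr)
simplefunc <- function(data) {
  data$varb <- data$varb + 1
  data
}

thisdata <- data.frame(vara = 1:20, varb = 1:20)

# Two applications, keeping the original as the first element
steps <- Reduce(function(x, ign) simplefunc(x), 2:3, thisdata, accumulate = TRUE)

# Name the elements thisdata1, thisdata2, thisdata3
names(steps) <- paste0("thisdata", seq_along(steps))

head(steps$thisdata3)   # same content the assign() loop stores in thisdata3
```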



Source: https://stackoverflow.com/questions/45134735/r-run-function-over-same-dataframe-multiple-times
