Question
I’m looking to apply a function over an initial dataframe multiple times. As a simple example, take this data:
library(dplyr)
thisdata <- data.frame(vara = seq(from = 1, to = 20, by = 1),
                       varb = seq(from = 1, to = 20, by = 1))
And here is a simple function I would like to run over it:
simplefunc <- function(data) {
  datasetfinal2 <- data %>% mutate(varb = varb + 1)
  return(datasetfinal2)
}
thisdata2 <- simplefunc(thisdata)
thisdata3 <- simplefunc(thisdata2)
So, how would I run this function, say, 10 times without having to call it by hand each time (i.e. without writing out thisdata3, thisdata4, and so on)? I’m mostly interested in the final dataframe after the repeated application, but it would be good to have a list of all the dataframes produced so I can run some diagnostics. Appreciate the help!
Answer 1:
Dealing with multiple identically-structured data.frames individually is a difficult way to manage things, especially if the number of iterations is more than a few. A popular "best practice" is to deal with a "list of data.frames", something like:
n <- 10 # length of the list: the original data plus n - 1 applications of simplefunc
out <- vector("list", n)
out[[1]] <- thisdata
for (i in 2:n) out[[i]] <- simplefunc(out[[i-1]])
You can look at any interim value with
str(out[[10]])
# 'data.frame': 20 obs. of 2 variables:
# $ vara: num 1 2 3 4 5 6 7 8 9 10 ...
# $ varb: num 10 11 12 13 14 15 16 17 18 19 ...
and, as you might expect, the final result is in out[[n]]. Because out[[1]] holds the original data, out[[n]] reflects n - 1 applications of simplefunc.
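Since the question mentions running diagnostics, here is a minimal sketch of how a list of data.frames can be summarised in one pass. It rebuilds the example data and, as an assumption made only to keep the snippet free of dependencies, uses a base-R stand-in for simplefunc instead of the dplyr version. The diagnostics chosen (mean of varb, unchanged dimensions) are illustrative, not something the question specifies.

```r
# Example data from the question
thisdata <- data.frame(vara = seq(from = 1, to = 20, by = 1),
                       varb = seq(from = 1, to = 20, by = 1))

# Base-R stand-in for simplefunc (assumption: same effect as the mutate() version)
simplefunc <- function(data) {
  data$varb <- data$varb + 1
  data
}

# Build the list of data.frames: the original first, then each successive result
n <- 10
out <- vector("list", n)
out[[1]] <- thisdata
for (i in 2:n) out[[i]] <- simplefunc(out[[i - 1]])

# One diagnostic value per data.frame in the list
mean_varb <- vapply(out, function(d) mean(d$varb), numeric(1))

# Check that every step kept the same dimensions as the input
same_dim <- vapply(out, function(d) identical(dim(d), dim(thisdata)), logical(1))
```

vapply is used rather than sapply so that the type and length of each result are checked explicitly.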
This can be simplified slightly using Reduce, and adding a throw-away second argument to simplefunc:
simplefunc <- function(data, ...) {
  datasetfinal2 <- data %>% mutate(varb = varb + 1)
  return(datasetfinal2)
}
out <- Reduce(simplefunc, 1:10, init = thisdata, accumulate = TRUE)
This effectively does:
tmp <- simplefunc(thisdata, 1)
tmp <- simplefunc(tmp, 2)
tmp <- simplefunc(tmp, 3)
# ...
(In fact, if you look at the source for Reduce, it's effectively doing my first suggestion above.)
Note that if simplefunc has other arguments that cannot be dropped, they can be declared after the ..., perhaps:
simplefunc <- function(data, ..., otherarg, anotherarg) {
  datasetfinal2 <- data %>% mutate(varb = varb + 1)
  return(datasetfinal2)
}
With this signature, all other calls to simplefunc must pass otherarg and anotherarg by name rather than by position (positional matching being the common default), because arguments declared after ... are never matched positionally.
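To make the by-name requirement concrete, here is a small sketch. The argument step is hypothetical (it is not in the original function), and the body uses a base-R stand-in for the mutate() call so the snippet runs without dplyr.

```r
thisdata <- data.frame(vara = seq(from = 1, to = 20, by = 1),
                       varb = seq(from = 1, to = 20, by = 1))

# 'step' is a hypothetical extra argument, declared after '...'
simplefunc <- function(data, ..., step = 1) {
  data$varb <- data$varb + step
  data
}

# Positional: the 5 is absorbed by '...', so step keeps its default of 1
by_position <- simplefunc(thisdata, 5)

# By name: step is set to 5
by_name <- simplefunc(thisdata, step = 5)

# Reduce passes its counter positionally, so it also lands in '...'
out <- Reduce(simplefunc, 1:3, init = thisdata, accumulate = TRUE)
```

This is also why the Reduce call keeps working: the counter it supplies is silently dropped into ... and never collides with the named arguments.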
Edit: if you cannot (or do not want to) edit simplefunc, you can always use an anonymous function to ignore the iterator/counter:
Reduce(function(x, ign) simplefunc(x), 1:10, init = thisdata, accumulate = TRUE)
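As a sketch of how the result of that call can be used: with init supplied and accumulate = TRUE, Reduce returns the initial value followed by one element per iteration, so the final data.frame is the last element. The snippet below assumes a base-R stand-in for simplefunc so it runs without dplyr; the names all_steps, final and final_only are illustrative.

```r
thisdata <- data.frame(vara = seq(from = 1, to = 20, by = 1),
                       varb = seq(from = 1, to = 20, by = 1))

# Base-R stand-in for simplefunc (assumption: same effect as the mutate() version)
simplefunc <- function(data) {
  data$varb <- data$varb + 1
  data
}

# Keep every intermediate data.frame
all_steps <- Reduce(function(x, ign) simplefunc(x), 1:10,
                    init = thisdata, accumulate = TRUE)
length(all_steps)  # 11: the original plus 10 results

# The final data.frame is the last element of the list
final <- all_steps[[length(all_steps)]]

# If only the final result is needed, drop accumulate = TRUE
final_only <- Reduce(function(x, ign) simplefunc(x), 1:10, init = thisdata)
```

Here the function really is applied 10 times, unlike the for loop over 2:n, where the first slot is taken by the original data.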
Answer 2:
We can use a for loop:
thisdata1 <- thisdata
for (i in 2:3) {
  assign(paste0('thisdata', i), value = simplefunc(get(paste0('thisdata', i - 1))))
}
NOTE1: It is better not to create individual objects in the global environment when the operations can be done easily within a list.
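If the individual objects have already been created this way, one option is to gather them back into a list with mget. This is a minimal sketch that again assumes a base-R stand-in for simplefunc; the name collected is illustrative.

```r
thisdata <- data.frame(vara = seq(from = 1, to = 20, by = 1),
                       varb = seq(from = 1, to = 20, by = 1))

# Base-R stand-in for simplefunc (assumption: same effect as the mutate() version)
simplefunc <- function(data) {
  data$varb <- data$varb + 1
  data
}

# Create thisdata2 and thisdata3 as separate objects, as in the loop above
thisdata1 <- thisdata
for (i in 2:3) {
  assign(paste0("thisdata", i), simplefunc(get(paste0("thisdata", i - 1))))
}

# Collect the separate objects into one named list
collected <- mget(paste0("thisdata", 1:3))
names(collected)
```

From here the list can be handled with lapply or vapply like any other list of data.frames.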
NOTE2: Forgot to add the disclaimer earlier
Source: https://stackoverflow.com/questions/45134735/r-run-function-over-same-dataframe-multiple-times