Calculate Means and Covariances for large list of dataframes, replacing loops with lapply

Question

I previously posted a question about how to create all possible combinations of a set of dataframes (the "power set" of possible dataframes): Creating Dataframes of all Possible Combinations without Repetition of Columns with cbind

I was able to create the list of possible dataframes by first creating all possible combinations of the names of the dataframes, and storing them in Ccols, a section of which looks like this:

Using Reduce and lapply, I then retrieved each dataframe by its name, stored the dataframes of each combination in a list, merged them, and collected the results in a list of lists to calculate the means and covariances:

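ll_cov <- list()
ll_ER  <- list()
# Outer loop: one element of Ccols per iteration, starting at the second.
for (ii in 2:length(Ccols)) {
  l_cov <- list()
  l_ER  <- list()
  # Each column of Ccols[[ii]] holds the dataframe names of one combination.
  for (index in seq_len(ncol(Ccols[[ii]]))) {
    ls <- list()
    for (i in seq_along(Ccols[[ii]][, index])) {
      # Look up the dataframe by its name.
      KK <- get(Ccols[[ii]][i, index])
      ls[[i]] <- KK
    }
    # Merge the dataframes of this combination on their row names.
    DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names = rn, rn = NULL)
    l_cov[[index]] <- cov(DAT)
    l_ER[[index]]  <- colMeans(DAT)
  }
  ll_cov[[ii]] <- l_cov
  ll_ER[[ii]]  <- l_ER
}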
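To make the structure the loop relies on concrete, here is a minimal sketch of a toy setup. The dataframes A, B and C, their column names, and the use of combn to build Ccols are all assumptions for illustration; the original post does not show how Ccols was created. The sketch only matches how the loop indexes it (one matrix per combination size, one column per combination).

```r
# Hypothetical toy data: three small dataframes with shared row names.
set.seed(1)
rn <- paste0("d", 1:5)
A <- data.frame(a1 = rnorm(5), a2 = rnorm(5), row.names = rn)
B <- data.frame(b1 = rnorm(5), row.names = rn)
C <- data.frame(c1 = rnorm(5), c2 = rnorm(5), row.names = rn)

df_names <- c("A", "B", "C")

# Assumed construction: Ccols[[k]] is a character matrix with k rows and
# one column per combination of k dataframe names.
Ccols <- lapply(seq_along(df_names), function(k) combn(df_names, k))

# A 2 x 3 matrix whose columns are (A, B), (A, C) and (B, C).
Ccols[[2]]
```

With this setup, Ccols[[ii]][i, index] is the name of the i-th dataframe in combination number index of size ii, which is what the get() call in the loop expects.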

However, the loop is becoming too time-consuming because of the large number of dataframes being processed and the repeated cov and colMeans calculations. I searched and came across this example (Looping over a list of data frames and calculate the correlation coefficient), which suggests putting the data frames in a list and then applying cov as a function, but it is still running far too slowly. I tried removing one of the loops by replacing the outermost loop with lapply:

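Power_f <- function(X) {
  l_D <- list()
  # Each column of X holds the dataframe names of one combination.
  # Iterate over every column; starting at 2 would skip the first
  # combination and fail when X has a single column.
  for (index in seq_len(ncol(X))) {
    ls <- list()
    for (i in seq_along(X[, index])) {
      # Look up the dataframe by its name.
      KK <- get(X[i, index])
      ls[[i]] <- KK
    }
    # Merge the dataframes of this combination on their row names.
    DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names = rn, rn = NULL)
    l_D[[index]] <- DAT
  }
  return(l_D)
}

lapply(2:length(Ccols), function(i) Power_f(Ccols[[i]]))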

But it still takes too long to run (I am not getting any results). Is there a way to replace all the for loops with lapply and make this computationally efficient?
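One possible direction, sketched under strong assumptions: swapping for loops for lapply is unlikely to help much on its own, because the cost probably sits in the repeated merge and cov calls. If all dataframes share the same row names, have unique column names, and contain no missing values, the covariance matrix of any combination is a submatrix of the covariance matrix of all columns together, so cov and colMeans need to be computed only once. The toy data, the combn-based Ccols, and the helper names (cols_of, stats_for, res) below are hypothetical and not from the original post.

```r
# Hypothetical toy data and Ccols (assumed structure, see lead-in).
set.seed(1)
rn <- paste0("d", 1:6)
A <- data.frame(a1 = rnorm(6), a2 = rnorm(6), row.names = rn)
B <- data.frame(b1 = rnorm(6), row.names = rn)
C <- data.frame(c1 = rnorm(6), c2 = rnorm(6), row.names = rn)
df_names <- c("A", "B", "C")
Ccols <- lapply(seq_along(df_names), function(k) combn(df_names, k))

# Fetch every dataframe once instead of calling get() inside the loops.
dfs <- mget(df_names)
# Remember which columns belong to which dataframe.
cols_of <- lapply(dfs, names)

# Merge all dataframes a single time on their row names.
with_rn <- lapply(dfs, function(x) data.frame(x, rn = row.names(x)))
ALL <- Reduce(merge, with_rn)
rownames(ALL) <- as.character(ALL$rn)
ALL$rn <- NULL

# Compute the full covariance matrix and column means once.
full_cov  <- cov(ALL)
full_mean <- colMeans(ALL)

# For one combination of names, subset the precomputed results.
stats_for <- function(nm) {
  cols <- unlist(cols_of[nm], use.names = FALSE)
  list(cov = full_cov[cols, cols, drop = FALSE], ER = full_mean[cols])
}

# Skip the first element of Ccols, as the original loop does.
res <- lapply(Ccols[-1], function(m) {
  lapply(seq_len(ncol(m)), function(j) stats_for(m[, j]))
})

# Sanity check for the combination (A, C): same result as a direct cov().
all.equal(res[[1]][[2]]$cov, cov(ALL[, c("a1", "a2", "c1", "c2")]))
```

Note that res[[1]] holds the combinations of size 2, because the first element of Ccols is dropped. If the dataframes do not share the same row names, the merged rows differ between combinations and this shortcut no longer reproduces the per-combination results. The number of combinations also still grows exponentially with the number of dataframes, whichever approach is used.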

Source: https://stackoverflow.com/questions/46690015/calculate-means-and-covariances-for-large-list-of-dataframes-replacing-loops-wi

Tags

loops

dataframe

lapply

covariance