Calculate Means and Covariances for large list of dataframes, replacing loops with lapply

Posted by 别说谁变了你拦得住时间么 on 2019-12-11 04:37:30

Question


I previously posted a question about how to create all possible combinations of a set of dataframes, i.e. the "power set" of possible dataframes, in this link: Creating Dataframes of all Possible Combinations without Repetition of Columns with cbind

I was able to create the list of possible dataframes by first creating all possible combinations of the names of the dataframes and storing them in Ccols, a list whose elements are matrices of dataframe names with one combination per column.
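
The structure of Ccols can be illustrated with a minimal sketch. It assumes Ccols was built with combn over the dataframe names, which is an assumption, since the original construction is not shown here. The names df_a, df_b, df_c and df_names are hypothetical.

```r
# Hypothetical toy data frames standing in for the real ones
df_a <- data.frame(a = c(1, 2, 3), row.names = c("r1", "r2", "r3"))
df_b <- data.frame(b = c(4, 5, 6), row.names = c("r1", "r2", "r3"))
df_c <- data.frame(c = c(7, 8, 9), row.names = c("r1", "r2", "r3"))

df_names <- c("df_a", "df_b", "df_c")

# Assumed construction: one matrix per combination size k,
# where each column of the matrix is one combination of names
Ccols <- lapply(seq_along(df_names), function(k) combn(df_names, k))

dim(Ccols[[2]])   # rows = names per combination, columns = combinations
Ccols[[2]][, 1]   # the names in the first combination of size 2
```

With this layout, Ccols[[ii]][i, index] is the i-th name in the index-th combination of size ii, which is how the loops in the question index it.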

Using Reduce and lapply, I then fetched each dataframe by its name, stored them in lists, and collected the means and covariances of each merged combination in lists of lists:

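ll_cov <- list()
ll_ER <- list()
# Loop over the combination sizes, starting at 2
for (ii in 2:length(Ccols)) {
  l_cov <- list()
  l_ER <- list()
  # Each column of Ccols[[ii]] holds the names of one combination
  for (index in 1:ncol(Ccols[[ii]])) {
    ls <- list()
    # Look up each dataframe of the combination by name
    for (i in 1:length(Ccols[[ii]][, index])) {
      KK <- get(Ccols[[ii]][i, index])
      ls[[i]] <- KK
    }
    # Merge the dataframes on their row names, then restore the row names
    DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names = rn, rn = NULL)
    l_cov[[index]] <- cov(DAT)
    l_ER[[index]] <- colMeans(DAT)
  }
  ll_cov[[ii]] <- l_cov
  ll_ER[[ii]] <- l_ER
}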
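
The merge step inside the loop can be looked at in isolation. The following is a small sketch on hypothetical toy data (df_a and df_b are made up for illustration) showing what the Reduce(merge, ...) line does: each dataframe gets a helper column rn holding its row names, merge joins on that shared column, and transform turns rn back into row names.

```r
# Hypothetical toy inputs with matching row names
df_a <- data.frame(a = c(1, 2, 3), row.names = c("r1", "r2", "r3"))
df_b <- data.frame(b = c(4, 5, 6), row.names = c("r1", "r2", "r3"))

frames <- list(df_a, df_b)

# Copy the row names into a column "rn" so that merge() can join on it
with_rn <- lapply(frames, function(x) data.frame(x, rn = row.names(x)))

# Merge successively on the shared column "rn",
# then restore the row names and drop the helper column
DAT <- transform(Reduce(merge, with_rn), row.names = rn, rn = NULL)

colMeans(DAT)  # one mean per column
cov(DAT)       # covariance matrix of the merged columns
```

Because merge defaults to an inner join, rows whose names do not appear in every dataframe are dropped from DAT.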

However, the loop is becoming too time-consuming because of the large number of dataframes being processed and the cov and colMeans calculations. I searched and came across this example ( Looping over a list of data frames and calculate the correlation coefficient ), which suggests putting the dataframes in a list and then applying cov as a function, but it is still running far too slowly. I tried removing one of the loops by using lapply in place of the outermost loop:

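# X is one element of Ccols: a matrix of dataframe names
Power_f <- function(X) {
  l_D <- list()
  # Note: this starts at column 2, whereas the loop version above starts at column 1
  for (index in 2:ncol(X)) {
    ls <- list()
    # Look up each dataframe of the combination by name
    for (i in 1:length(X[, index])) {
      KK <- get(X[i, index])
      ls[[i]] <- KK
    }
    # Merge the dataframes on their row names, then restore the row names
    DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names = rn, rn = NULL)
    l_D[[index]] <- DAT
  }
  return(l_D)
}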

lapply(seq(from=2,to=(length(Ccols))), function(i) Power_f(Ccols[[i]]))

But it is still taking too long to run (I am not getting any results). Is there a way to replace all the for loops with lapply and make this computationally efficient?
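
One possible direction, offered only as a sketch and not as a verified answer: swapping for loops for lapply by itself rarely changes the running time much, since the repeated merge and cov calls dominate. If every dataframe has exactly the same row names and there are no missing values (both are assumptions about the data), the covariance matrix of any combination is just a sub-matrix of the covariance matrix of all columns together, so cov and colMeans only need to be computed once. The toy dataframes and the helper names frames, cols_of and subset_stats below are hypothetical.

```r
# Hypothetical toy data frames; all share the same row names (key assumption)
set.seed(1)
rows <- paste0("r", 1:5)
df_a <- data.frame(a1 = rnorm(5), a2 = rnorm(5), row.names = rows)
df_b <- data.frame(b1 = rnorm(5), row.names = rows)
df_c <- data.frame(c1 = rnorm(5), row.names = rows)

df_names <- c("df_a", "df_b", "df_c")
Ccols <- lapply(seq_along(df_names), function(k) combn(df_names, k))

# Fetch every data frame once instead of calling get() inside the loops
frames <- mget(df_names)

# Column names contributed by each data frame
cols_of <- lapply(frames, names)

# Combine everything once; compute the full mean vector and covariance once
full <- do.call(cbind, unname(frames))
full_ER <- colMeans(full)
full_cov <- cov(full)

# For one combination (a character vector of data frame names),
# pick the matching entries out of the precomputed results
subset_stats <- function(combo) {
  cols <- unlist(cols_of[combo], use.names = FALSE)
  list(ER = full_ER[cols], cov = full_cov[cols, cols, drop = FALSE])
}

# Outer lapply over combination sizes (skipping size 1, as in the question),
# inner lapply over the columns of each name matrix
ll_stats <- lapply(Ccols[-1], function(m) {
  lapply(seq_len(ncol(m)), function(j) subset_stats(m[, j]))
})
```

Here ll_stats[[1]] corresponds to Ccols[[2]], because the size-1 combinations are dropped. If the dataframes do not share identical row names, the inner join in merge changes which rows survive for each combination, and this shortcut no longer reproduces the merge-based results.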

Source: https://stackoverflow.com/questions/46690015/calculate-means-and-covariances-for-large-list-of-dataframes-replacing-loops-wi
