Create data.frame conditional on another df without for loop

Posted by 你说的曾经没有我的故事 on 2019-12-25 01:55:43

Question


I'm trying to create a data.frame whose values depend on the values in a reference data.frame. I only know how to do this with a for loop, but I have been advised to avoid for loops in R, and my actual data have ~500,000 rows x ~200 columns.

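# a: 5 x 2 reference table of random 0/1 flags, columns "a" and "b"
a <- as.data.frame(matrix(rbinom(10, 1, 0.5), 5, 2, dimnames = list(c(1:5), c("a", "b"))))
# b: values (v1) with a group label (v2) that matches the column names of a
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13), v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))
# c: result, same shape as a
c <- as.data.frame(matrix(0, 5, 2))

for (i in 1:5) {
  for (j in 1:2) {
    if (a[i, j] == 1) {
      # flag is 1: mean of v1 within the group named after column j
      c[i, j] <- mean(b$v1[b$v2 == colnames(a)[j]])
    } else {
      # flag is 0: overall mean of v1
      c[i, j] <- mean(b$v1)
    }
  }
}
c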

I create data.frame "c" based on the value in each cell, and the corresponding column name, of data.frame "a". Is there another way to do this? Indexing? Using data.table? Maybe apply functions? Any and all help is greatly appreciated!


Answer 1:


(a == 0) * mean(b$v1) + t(t(a) * c(tapply(b$v1, b$v2, mean)))

Run it in pieces to understand what's happening. Also, note that this assumes the column names of a are in the same sorted order as the group means returned by tapply, and that a contains only 0's and 1's, as per the OP.
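To make "run in pieces" concrete, here is a minimal sketch that splits the one-liner into named steps. The fixed 0/1 values in a and the intermediate names (overall_mean, group_means, zero_part, one_part) are assumptions chosen for illustration; the OP's a is random.

```r
# Small fixed example so the intermediate values are easy to follow
# (illustrative values, not the OP's random draw).
a <- data.frame(a = c(1, 0, 1, 0, 1), b = c(0, 0, 1, 1, 0))
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13),
                v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))

overall_mean <- mean(b$v1)                    # fill value where a == 0
group_means  <- c(tapply(b$v1, b$v2, mean))   # named vector: one mean per group in v2

# Logical matrix times a number: overall mean where a is 0, otherwise 0.
zero_part <- (a == 0) * overall_mean

# t(a) has one row per column of a, so multiplying by group_means recycles
# one group mean per original column; the outer t() restores the shape.
one_part <- t(t(a) * group_means)             # group mean where a is 1, otherwise 0

result <- zero_part + one_part
result
```

Each cell of a is either 0 or 1, so exactly one of the two parts is non-zero for any cell, and the sum picks the right fill value.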

An alternative to a bunch of t's as above is using mapply (this assumes a is a data.frame or data.table and not a matrix, while the above doesn't care):

(a == 0) * mean(b$v1) + mapply(`*`, a, tapply(b$v1, b$v2, mean))



Answer 2:


#subsetting a matrix is faster than subsetting a data.frame
res <- as.matrix(a)

#calculate fill-in values outside the loop
in1 <- mean(b$v1)
in2 <- sapply(colnames(a),function(i) mean(b$v1[b$v2==i]))

#loop over columns and use a vectorized approach 
for (i in seq_len(ncol(res))) {
  res[,i] <- ifelse(res[,i]==0, in1, in2[i])
}
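As a sanity check, the sketch below compares both vectorized answers against the original nested loop on the question's example data. The seed and the names loop_res and vec_res are assumptions added here for reproducibility and illustration; they are not part of the original posts.

```r
set.seed(42)  # assumption: fixed seed so the random 0/1 table is reproducible
a <- as.data.frame(matrix(rbinom(10, 1, 0.5), 5, 2,
                          dimnames = list(1:5, c("a", "b"))))
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13),
                v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))

# Reference result: the nested loop from the question, stored in a plain matrix.
loop_res <- matrix(0, nrow(a), ncol(a))
for (i in seq_len(nrow(a))) {
  for (j in seq_len(ncol(a))) {
    loop_res[i, j] <- if (a[i, j] == 1) {
      mean(b$v1[b$v2 == colnames(a)[j]])
    } else {
      mean(b$v1)
    }
  }
}

# Answer 1: fully vectorized arithmetic.
vec_res <- (a == 0) * mean(b$v1) + t(t(a) * c(tapply(b$v1, b$v2, mean)))

# Answer 2: loop over columns only, vectorized within each column.
res <- as.matrix(a)
in1 <- mean(b$v1)
in2 <- sapply(colnames(a), function(i) mean(b$v1[b$v2 == i]))
for (i in seq_len(ncol(res))) {
  res[, i] <- ifelse(res[, i] == 0, in1, in2[i])
}

# Drop dimnames before comparing; stopifnot() errors if any result differs.
stopifnot(isTRUE(all.equal(unname(vec_res), loop_res)))
stopifnot(isTRUE(all.equal(unname(res), loop_res)))
```

On the real ~500,000 x ~200 data the same check can be run on a small slice of rows before switching approaches.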


Source: https://stackoverflow.com/questions/17708811/create-data-frame-conditional-on-another-df-without-for-loop
