Question
I'm trying to create a data.frame that takes different values depending on the value of a reference data.frame. I only know how to do this with a "for loop", but have been advised to avoid for loops in R... and my actual data have ~500,000 rows x ~200 columns.
a <- as.data.frame(matrix(rbinom(10,1,0.5),5,2,dimnames=list(c(1:5),c("a","b"))))
b <- data.frame(v1=c(2,10,12,5,11,3,4,14,2,13),v2=c("a","b","b","a","b","a","a","b","a","b"))
c <- as.data.frame(matrix(0,5,2))
for (i in 1:5) {
  for (j in 1:2) {
    if (a[i, j] == 1) {
      c[i, j] <- mean(b$v1[b$v2 == colnames(a)[j]])
    } else {
      c[i, j] <- mean(b$v1)
    }
  }
}
c
I create data.frame "c" based on the value in each cell, and the corresponding column name, of data.frame "a". Is there another way to do this? Indexing? Using data.table? Maybe apply functions? Any and all help is greatly appreciated!
Answer 1:
(a == 0) * mean(b$v1) + t(t(a) * c(tapply(b$v1, b$v2, mean)))
Run it in pieces to see what is happening. Note that this assumes the columns of a are in the same order as the sorted groups of b$v2 (the order in which tapply returns the means), and that a contains only 0's and 1's, as in the question.
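Broken into pieces, the one-liner might look like the sketch below. The names overall_mean, group_means, zeros_part and ones_part are made up for illustration, and the seed is an assumption added only so the run is reproducible.

```r
set.seed(42)  # assumption: fixed seed, not part of the original question
a <- as.data.frame(matrix(rbinom(10, 1, 0.5), 5, 2,
                          dimnames = list(1:5, c("a", "b"))))
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13),
                v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))

overall_mean <- mean(b$v1)                # fill-in value where a == 0
group_means  <- tapply(b$v1, b$v2, mean)  # one mean per group of v2, in sorted group order

# Piece 1: the overall mean where a is 0, and 0 everywhere else
zeros_part <- (a == 0) * overall_mean

# Piece 2: t(a) has one row per column of a, so the vector of group means
# is recycled down the rows and each row is scaled by its own group mean;
# the outer t() restores the original orientation
ones_part <- t(t(a) * c(group_means))

result <- zeros_part + ones_part
result
```

Each cell is nonzero in exactly one of the two pieces, which is why a plain sum is enough to combine them.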
An alternative to the repeated t() calls above is mapply (this requires a to be a data.frame or data.table rather than a matrix, whereas the expression above works with either):
(a == 0) * mean(b$v1) + mapply(`*`, a, tapply(b$v1, b$v2, mean))
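If the columns of a are not guaranteed to be in sorted order, one possible safeguard is to reorder the group means by column name before multiplying. This is a sketch on a small made-up data set (the values of a and b here are hypothetical, chosen so the columns are deliberately out of order); aligned is an illustrative name.

```r
# Hypothetical input: columns of a are in the order "b", "a"
a <- data.frame(b = c(1, 0, 1), a = c(0, 0, 1))
b <- data.frame(v1 = c(2, 10, 12, 5), v2 = c("a", "b", "b", "a"))

group_means <- tapply(b$v1, b$v2, mean)  # named by group: "a", "b"
aligned <- group_means[colnames(a)]      # reordered to match a's columns: "b", "a"

# Same expression as above, but with the means matched by name
result <- (a == 0) * mean(b$v1) + mapply(`*`, a, aligned)
result
```

Indexing by colnames(a) yields NA for any column without a matching group in b$v2, so a mismatch would show up in the result instead of being silently misaligned.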
Answer 2:
# subsetting a matrix is faster
res <- as.matrix(a)
# calculate fill-in values outside the loop
in1 <- mean(b$v1)
in2 <- sapply(colnames(a), function(i) mean(b$v1[b$v2 == i]))
# loop over columns and use a vectorized approach
for (i in seq_len(ncol(res))) {
  res[, i] <- ifelse(res[, i] == 0, in1, in2[i])
}
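As a sanity check, the column-wise version can be compared against the nested loop from the question on the same input. The sketch below assumes a fixed seed for reproducibility; expected and same are illustrative names, not part of either original snippet.

```r
set.seed(1)  # assumption: seed chosen only so the check is reproducible
a <- as.data.frame(matrix(rbinom(10, 1, 0.5), 5, 2,
                          dimnames = list(1:5, c("a", "b"))))
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13),
                v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))

# Reference result: the cell-by-cell loop from the question
expected <- matrix(0, nrow(a), ncol(a), dimnames = dimnames(as.matrix(a)))
for (i in seq_len(nrow(a))) {
  for (j in seq_len(ncol(a))) {
    expected[i, j] <- if (a[i, j] == 1) {
      mean(b$v1[b$v2 == colnames(a)[j]])
    } else {
      mean(b$v1)
    }
  }
}

# Column-wise version: fill-in values computed once, one ifelse per column
res <- as.matrix(a)
in1 <- mean(b$v1)
in2 <- sapply(colnames(a), function(i) mean(b$v1[b$v2 == i]))
for (i in seq_len(ncol(res))) {
  res[, i] <- ifelse(res[, i] == 0, in1, in2[i])
}

same <- isTRUE(all.equal(res, expected))
same
```

The remaining loop runs once per column rather than once per cell, so for a table of roughly 500,000 rows by 200 columns it iterates about 200 times.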
Source: https://stackoverflow.com/questions/17708811/create-data-frame-conditional-on-another-df-without-for-loop