Alternative for sample

Deadly 提交于 2021-02-10 05:29:05

问题


I have the following sample code that uses sapply which takes long to process (since executed many times):

samples = sapply(rowIndices, function(idx){
  sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})

The issue is that I have to draw from the weights which are in the matrix, dependent on the indices in rowIndices.

Does somebody have a better idea in mind to draw from the rows of the matrix?

Reproducable example:

rowIndices = floor(runif(1000, 1, 100))
vectorToDrawFrom = runif(5000, 0.0, 2.0)
weights = matrix(runif(100 * 5000, 1, 10), nrow = 100, ncol = 5000)

timer = 0
for (i in 1:2500){
  ptm = proc.time()
  samples = sapply(rowIndices, function(idx){
    sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
  })
  timer = timer + (proc.time() - ptm)[3]
}

print(timer) # too long!!

回答1:


So here is a way I would speed up your code. One thing to note: the sampled value will not "match" with rowIndices though it would be trivial to get things in the right order. 2) You only store the last iteration, though maybe that is just because this a Minimal Reproducible example...

Basically you should only need to call sample once per value of rowIndices since rowIndices ranges from 1-99, that's 99 calls instead of 1000, which provides a huge speed up.

We can just sort the row indices before we start

rowIndices <- sort(rowIndices) ##sort the row indices and then loop
for (i in 1:15){
    samples = unlist(sapply(unique(rowIndices), 
        function(idx){
            sample(vectorToDrawFrom, sum(rowIndices %in% idx), 
                TRUE, weights[idx, ])
    }))       
}

Unit: milliseconds

expr
                      min       lq     mean   median       uq      max neval cld
 newForLoop      263.5668 266.6329 292.8301 268.8920 275.3378  515.899   100  a 
 OriginalForLoop 698.2982 705.6911 792.2846 712.9985 887.9447 1263.779   100   b

Edit

The way to maintain the original vector ordering is to save the index or the orignal rowIndices vector. Then sort the row indices and proceed.

set.seed(8675309)
weights = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0), 
                 nrow = 5, ncol = 3, byrow = T)

rowIndices = c(2,1,2,4)
vectorToDrawFrom = runif(3, 0.0, 2.0)

set.seed(8675309)
##This is the origal code
sample2 = sapply(rowIndices, function(idx){       
  sample(vectorToDrawFrom, 1, TRUE, weights[idx, ])
})

rowIndx <- order(rowIndices)   #get ordering index
rowIndices <- sort(rowIndices) 

set.seed(8675309)
samples = unlist(sapply(unique(rowIndices), function(idx){
  sample(vectorToDrawFrom, sum(rowIndices %in% idx), TRUE, weights[idx, ])
}))

samples = samples[order(rowIndx)]
all(samples == sample2)
#[1] TRUE


来源:https://stackoverflow.com/questions/46771186/alternative-for-sample

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!