How to speed up or vectorize a for loop?

刺人心 2020-12-10 07:05

I would like to speed up my for loop, either by vectorizing it, by using data.table, or by some other approach. I have to run the code on 1,000,000 rows, and my code is really slow.

2 answers
  • 2020-12-10 07:07

    You can use Rcpp when vectorization is difficult.

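    library(Rcpp)

    # Compile a C++ function that assigns each element of Volume to a bin.
    # A new bin starts as soon as adding an element would push the running
    # total above n.
    cppFunction('
      IntegerVector bin(NumericVector Volume, int n) {
        IntegerVector binIdexVector(Volume.size());
        int binIdex = 1;          // current bin label (1-based)
        double totalVolume = 0;   // running total within the current bin

        for (int i = 0; i < Volume.size(); i++) {
          totalVolume += Volume[i];
          if (totalVolume > n) {
            // Over capacity: open a new bin and restart the total at this element
            binIdex++;
            totalVolume = Volume[i];
          }
          binIdexVector[i] = binIdex;
        }
        return binIdexVector;
      }')

    # binIdexVector is the result of the original loop in the question
    all.equal(bin(Volume, 100), binIdexVector)
    #[1] TRUE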
    

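    It is faster than findInterval(cumsum(Volume), seq(0, sum(Volume), by=100)), which in any case only approximates the result: it cuts the overall cumulative sum at fixed multiples of 100, whereas the loop restarts the running total each time a new bin begins.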
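
    To see why the findInterval result is inexact, here is a minimal sketch comparing the two approaches on a tiny input. bin_reset is a hypothetical pure-R stand-in for the Rcpp function above, and the four-element input is made up for illustration.

    ```r
    # Hypothetical helper: pure-R version of the resetting loop (illustration only)
    bin_reset <- function(volume, n) {
      labels <- integer(length(volume))
      bin <- 1L
      total <- 0
      for (i in seq_along(volume)) {
        total <- total + volume[i]
        if (total > n) {
          # Over capacity: open a new bin and restart the total at this element
          bin <- bin + 1L
          total <- volume[i]
        }
        labels[i] <- bin
      }
      labels
    }

    volume <- c(60, 60, 60, 60)  # made-up input: no two elements fit in one bin

    exact  <- bin_reset(volume, 100)
    approx <- findInterval(cumsum(volume), seq(0, sum(volume), by = 100))

    print(exact)   # 1 2 3 4
    print(approx)  # 1 2 2 3
    ```

    The resetting loop puts every element in its own bin, while findInterval groups the second and third elements because their cumulative sums (120 and 180) both fall between 100 and 200.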

  • 2020-12-10 07:13
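    Volume <- sample(1:5, 500, replace = TRUE)
    # Prepend FALSE so that binLabels has the same length as Volume
    binLabels <- cumsum(c(FALSE, diff(cumsum(Volume) %% 100) < 0)) + 1
    

    This produces the vector binLabels, which gives the bin each data point belongs to. A new bin starts whenever the cumulative sum reaches or passes a multiple of 100, so each bin holds as many data points as it takes for their sum to come to roughly 100.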
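
    As a quick check of this labelling, the sketch below tabulates the sum of each bin. The seed and the names running and binSums are assumptions added for illustration; FALSE is prepended because diff() returns one element fewer than its input.

    ```r
    set.seed(1)  # arbitrary seed, only to make the run reproducible
    Volume <- sample(1:5, 500, replace = TRUE)

    running <- cumsum(Volume)

    # TRUE wherever the running total wraps past a multiple of 100
    binLabels <- cumsum(c(FALSE, diff(running %% 100) < 0)) + 1

    # Sum of Volume within each bin
    binSums <- tapply(Volume, binLabels, sum)

    stopifnot(length(binLabels) == length(Volume))
    print(head(binSums))
    ```

    With values between 1 and 5, every complete bin sums to within a few units of 100; only the last bin may be partial.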
