Sum of subvectors of a vector in R

后端 未结 6 741
悲哀的现实
悲哀的现实 2021-01-04 19:01

Given a vector x of length k, I would like to obtain a k by k matrix X where X[i,j] is the sum of x[i] + ... + x[j]. The

6条回答
  •  余生分开走
    2021-01-04 19:27

    In addition to the excellent answers already provided, here is a super fast base R solution:

    subVecSum <- function(v, s) {
        t <- c(0L, cumsum(v))
        n1 <- s+1L
        m <- matrix(0L,s,s)
        for (i in 4L:n1) {
            m[i-2L,1L:(i-3L)] <- t[i-1L]-t[1L:(i-3L)]
            m[i-2L,i-2L] <- v[i-2L]
            m[i-2L,(i-1L):s] <- t[i:n1]-t[i-2L]
        }
        m[1L,] <- t[-1L]; m[s,] <- t[n1]-t[1L:s]
        m
    }
    

    In fact, according to the benchmarks below, it is the fastest base R solution (@Roland's Rcpp solution is still the fastest). It also gets faster, relatively speaking, as the size of the vector increases (I only compared f4 (provided by @docendo) as it is the fastest base R solution thus far and @Roland's Rcpp implementation. You will note that I'm using the modified f4 function as defined by @Roland).

    ## We first compile the functions.. no need to compile the Rcpp
    ## function as it is already done by calling cppFunction
    c.f4 <- compiler::cmpfun(f4)
    c.subVS1 <- compiler::cmpfun(subVecSum)
    
    n <- 100
    x <- 1:n
    microbenchmark(c.f4(x,n), c.subVS1(x,n), allSums2(x), times = 1000, unit = "relative")
    Unit: relative
              expr       min        lq     mean    median        uq       max neval cld
        c.f4(x, n) 11.355013 11.262663 9.231756 11.545315 12.074004 1.0819186  1000   c
    c.subVS1(x, n)  7.795879  7.592643 5.414135  7.624209  8.080471 0.8490876  1000  b 
       allSums2(x)  1.000000  1.000000 1.000000  1.000000  1.000000 1.0000000  1000 a  
    
    n <- 500
    x <- 1:n
    microbenchmark(c.f4(x,n), c.subVS1(x,n), allSums2(x), times = 500, unit = "relative")
    Unit: relative
              expr      min       lq     mean   median       uq       max neval cld
        c.f4(x, n) 6.231426 6.585118 6.442567 6.438163 6.882862 10.124428   500   c
    c.subVS1(x, n) 3.548766 3.271089 3.137887 2.881520 3.604536  8.854241   500  b 
       allSums2(x) 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000   500 a  
    
    n <- 1000
    x <- 1:n
    microbenchmark(c.f4(x,n), c.subVS1(x,n), allSums2(x), times = 100, unit = "relative")
    Unit: relative
              expr      min        lq      mean    median        uq      max neval cld
        c.f4(x, n) 7.779537 16.352334 11.489506 15.529351 14.447210 3.639483   100   c
    c.subVS1(x, n) 2.637996  2.951763  2.937385  2.726569  2.692099 1.211545   100  b 
       allSums2(x) 1.000000  1.000000  1.000000  1.000000  1.000000 1.000000   100 a  
    
    identical(c.f4(x,n), c.subVS1(x,n), as.integer(allSums2(x)))  ## gives the same results
    [1] TRUE
    

    This algorithm takes advantage of only calculating cumsum(v) one time and utilizing indexing from there. For really large vectors, the efficiency is comparable to the Rcpp solution provided by @Roland. Observe:

    n <- 5000
    x <- 1:n
    microbenchmark(c.subVS1(x,n), allSums2(x), times = 10, unit = "relative")
    Unit: relative
              expr      min       lq     mean   median       uq      max neval cld
    c.subVS1(x, n) 1.900718 1.865304 1.854165 1.865396 1.769996 1.837354    10   b
       allSums2(x) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    10  a 
    
    
    n <- 10000
    x <- 1:n
    microbenchmark(c.subVS1(x,n), allSums2(x), times = 10, unit = "relative")
    Unit: relative
              expr      min      lq     mean   median       uq     max neval cld
    c.subVS1(x, n) 1.503538 1.53851 1.493883 1.526843 1.496783 1.29196    10   b
       allSums2(x) 1.000000 1.00000 1.000000 1.000000 1.000000 1.00000    10  a 
    

    Not bad, for base R, however Rcpp stills rules the day!!!

提交回复
热议问题