Sum of subvectors of a vector in R

后端未结

关注

 6  741

悲哀的现实 2021-01-04 19:01

Given a vector x of length k, I would like to obtain a k by k matrix X where X[i,j] is the sum of x[i] + ... + x[j]. The

6条回答

余生分开走 (楼主)

2021-01-04 19:27

In addition to the excellent answers already provided, here is a super fast base R solution:

subVecSum <- function(v, s) {
    t <- c(0L, cumsum(v))
    n1 <- s+1L
    m <- matrix(0L,s,s)
    for (i in 4L:n1) {
        m[i-2L,1L:(i-3L)] <- t[i-1L]-t[1L:(i-3L)]
        m[i-2L,i-2L] <- v[i-2L]
        m[i-2L,(i-1L):s] <- t[i:n1]-t[i-2L]
    }
    m[1L,] <- t[-1L]; m[s,] <- t[n1]-t[1L:s]
    m
}

In fact, according to the benchmarks below, it is the fastest base R solution (@Roland's Rcpp solution is still the fastest). It also gets faster, relatively speaking, as the size of the vector increases (I only compared f4 (provided by @docendo) as it is the fastest base R solution thus far and @Roland's Rcpp implementation. You will note that I'm using the modified f4 function as defined by @Roland).

## We first compile the functions.. no need to compile the Rcpp
## function as it is already done by calling cppFunction
c.f4 <- compiler::cmpfun(f4)
c.subVS1 <- compiler::cmpfun(subVecSum)

n <- 100
x <- 1:n
microbenchmark(c.f4(x,n), c.subVS1(x,n), allSums2(x), times = 1000, unit = "relative")
Unit: relative
          expr       min        lq     mean    median        uq       max neval cld
    c.f4(x, n) 11.355013 11.262663 9.231756 11.545315 12.074004 1.0819186  1000   c
c.subVS1(x, n)  7.795879  7.592643 5.414135  7.624209  8.080471 0.8490876  1000  b 
   allSums2(x)  1.000000  1.000000 1.000000  1.000000  1.000000 1.0000000  1000 a  

n <- 500
x <- 1:n
microbenchmark(c.f4(x,n), c.subVS1(x,n), allSums2(x), times = 500, unit = "relative")
Unit: relative
          expr      min       lq     mean   median       uq       max neval cld
    c.f4(x, n) 6.231426 6.585118 6.442567 6.438163 6.882862 10.124428   500   c
c.subVS1(x, n) 3.548766 3.271089 3.137887 2.881520 3.604536  8.854241   500  b 
   allSums2(x) 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000   500 a  

n <- 1000
x <- 1:n
microbenchmark(c.f4(x,n), c.subVS1(x,n), allSums2(x), times = 100, unit = "relative")
Unit: relative
          expr      min        lq      mean    median        uq      max neval cld
    c.f4(x, n) 7.779537 16.352334 11.489506 15.529351 14.447210 3.639483   100   c
c.subVS1(x, n) 2.637996  2.951763  2.937385  2.726569  2.692099 1.211545   100  b 
   allSums2(x) 1.000000  1.000000  1.000000  1.000000  1.000000 1.000000   100 a  

identical(c.f4(x,n), c.subVS1(x,n), as.integer(allSums2(x)))  ## gives the same results
[1] TRUE

This algorithm takes advantage of only calculating cumsum(v) one time and utilizing indexing from there. For really large vectors, the efficiency is comparable to the Rcpp solution provided by @Roland. Observe:

n <- 5000
x <- 1:n
microbenchmark(c.subVS1(x,n), allSums2(x), times = 10, unit = "relative")
Unit: relative
          expr      min       lq     mean   median       uq      max neval cld
c.subVS1(x, n) 1.900718 1.865304 1.854165 1.865396 1.769996 1.837354    10   b
   allSums2(x) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    10  a 


n <- 10000
x <- 1:n
microbenchmark(c.subVS1(x,n), allSums2(x), times = 10, unit = "relative")
Unit: relative
          expr      min      lq     mean   median       uq     max neval cld
c.subVS1(x, n) 1.503538 1.53851 1.493883 1.526843 1.496783 1.29196    10   b
   allSums2(x) 1.000000 1.00000 1.000000 1.000000 1.000000 1.00000    10  a

Not bad, for base R, however Rcpp stills rules the day!!!

0 讨论(0)

查看其它6个回答