Efficient apply or mapply for multiple matrix arguments by row

猫巷女王i 2020-12-29 07:47

I have two matrices that I want to apply a function to, by rows:

matrixA
           GSM83009  GSM83037  GSM83002  GSM83029  GSM83041
100001_at  5.873321  5.4         


        
2 Answers
  •  误落风尘
    2020-12-29 08:29

    Splitting the matrices isn't the biggest contributor to evaluation time.

    set.seed(21)
    matrixA <- matrix(rnorm(5 * 9000), nrow = 9000)
    matrixB <- matrix(rnorm(4 * 9000), nrow = 9000)
    
    system.time( scores <- mapply(t.test.stat,
        split(matrixA, row(matrixA)), split(matrixB, row(matrixB))) )
    #    user  system elapsed 
    #    1.57    0.00    1.58 
    smA <- split(matrixA, row(matrixA))
    smB <- split(matrixB, row(matrixB))
    system.time( scores <- mapply(t.test.stat, smA, smB) )
    #    user  system elapsed 
    #    1.14    0.00    1.14 
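
    The timings above call t.test.stat, whose definition is not shown in this excerpt. A minimal sketch, assuming it is the unequal-variance (Welch) two-sample statistic that the vectorized code later in this answer computes; the function body is an assumption, not necessarily the asker's exact code:

    ```r
    # Assumed definition (not from the excerpt): Welch two-sample t statistic.
    t.test.stat <- function(x, y) {
      (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
    }

    # Usage on one pair of vectors, compared with stats::t.test,
    # which uses the Welch form by default (var.equal = FALSE).
    set.seed(21)
    x <- rnorm(5)
    y <- rnorm(4)
    t.test.stat(x, y)
    unname(t.test(x, y)$statistic)
    ```

    Both calls should print the same number if the assumed definition matches the Welch statistic.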
    

    Look at the output from Rprof to see that most of the time is, not surprisingly, spent evaluating t.test.stat (mean, var, etc.). Basically, there's quite a bit of overhead from function calls.

    Rprof()
    scores <- mapply(t.test.stat, smA, smB)
    Rprof(NULL)
    summaryRprof()
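
    A sketch of reading that profile programmatically rather than by eye. It assumes a Welch-style stand-in for t.test.stat (hypothetical, since the original definition is not shown), writes the samples to a temporary file, and repeats the call a few times so that enough samples are collected:

    ```r
    set.seed(21)
    matrixA <- matrix(rnorm(5 * 9000), nrow = 9000)
    matrixB <- matrix(rnorm(4 * 9000), nrow = 9000)
    smA <- split(matrixA, row(matrixA))
    smB <- split(matrixB, row(matrixB))

    # Hypothetical stand-in for the asker's function (assumption).
    t.test.stat <- function(x, y) {
      (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
    }

    prof_file <- tempfile()
    Rprof(prof_file, interval = 0.005)  # sample the call stack every 5 ms
    for (i in 1:10) scores <- mapply(t.test.stat, smA, smB)
    Rprof(NULL)                         # stop profiling

    prof <- summaryRprof(prof_file)
    head(prof$by.self)   # time spent inside each function itself
    head(prof$by.total)  # time including everything each function calls
    ```

    The by.self table is the one to look at when asking which functions actually consume the time.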
    

    You may be able to find faster generalized solutions, but none will approach the speed of the vectorized solution below.

    Since your function is simple, you can take advantage of the vectorized rowMeans function to do this almost instantaneously (though it's a bit messy):

    system.time({
    ncA <- NCOL(matrixA)
    ncB <- NCOL(matrixB)
    ans <- (rowMeans(matrixA)-rowMeans(matrixB)) /
      sqrt( rowMeans((matrixA-rowMeans(matrixA))^2)*(ncA/(ncA-1))/ncA +
            rowMeans((matrixB-rowMeans(matrixB))^2)*(ncB/(ncB-1))/ncB )
    })
    #    user  system elapsed 
    #      0       0       0 
    head(ans)
    # [1]  0.8272511 -1.0965269  0.9862844 -0.6026452 -0.2477661  1.1896181
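
    As a sanity check, the vectorized result can be compared with stats::t.test, whose default statistic (var.equal = FALSE) has the same form. A sketch using the same simulated matrices; it relies on R recycling the length-nrow vector rowMeans(m) down the columns, so m - rowMeans(m) centers each row:

    ```r
    set.seed(21)
    matrixA <- matrix(rnorm(5 * 9000), nrow = 9000)
    matrixB <- matrix(rnorm(4 * 9000), nrow = 9000)

    ncA <- NCOL(matrixA)
    ncB <- NCOL(matrixB)
    ans <- (rowMeans(matrixA) - rowMeans(matrixB)) /
      sqrt(rowMeans((matrixA - rowMeans(matrixA))^2) * (ncA / (ncA - 1)) / ncA +
           rowMeans((matrixB - rowMeans(matrixB))^2) * (ncB / (ncB - 1)) / ncB)

    # Reference values for the first 20 rows, one t.test call per row.
    ref <- sapply(1:20, function(i) {
      unname(t.test(matrixA[i, ], matrixB[i, ])$statistic)
    })
    all.equal(ans[1:20], ref)
    ```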
    

    UPDATE
    Here's a "cleaner" version using a rowVars function:

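    rowVars <- function(x, na.rm=FALSE) {
      # n: number of values per row that enter the variance (NAs excluded if na.rm)
      n <- if (na.rm) rowSums(!is.na(x)) else NCOL(x)
      rowSums((x-rowMeans(x, na.rm=na.rm))^2, na.rm=na.rm) / (n-1)
    }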
    ans <- (rowMeans(matrixA)-rowMeans(matrixB)) /
      sqrt( rowVars(matrixA)/NCOL(matrixA) + rowVars(matrixB)/NCOL(matrixB) )
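
    The row-variance expression itself can be checked against the slower apply(x, 1, var) on a small matrix. A sketch with made-up data (the matrix m is illustrative only and contains no missing values):

    ```r
    set.seed(1)
    m <- matrix(rnorm(6 * 4), nrow = 6)  # 6 rows, 4 observations per row

    # Population variance per row, rescaled by n/(n-1) to the sample variance.
    n <- NCOL(m)
    v_fast <- rowMeans((m - rowMeans(m))^2) * (n / (n - 1))

    # Reference: call var() once per row.
    v_slow <- apply(m, 1, var)
    all.equal(v_fast, v_slow)
    ```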
    
