mapply for better performance

此生再无相见时 posted on 2019-12-24 05:57:28

Question


I want to apply a function to a matrix input a. The function should replace the first element with c[a[1]] and each following element with b[a[i],a[i+1]], for i = 1 up to i = ncol(a) - 1.

example input:

a <- matrix(c(1,4,3,1),nrow=1)
b <- matrix(1:25,ncol=5,nrow=5)
c <- matrix(4:8,ncol=5,nrow=1)

expected output:

>a
4 16 14 3

#c[a[1]] gave us the first element: 4
#b[a[1],a[2]] gave us the second element: 16 
#b[a[2],a[3]] gave us the third element: 14
#b[a[3],a[4]] gave us the fourth element: 3

I've been trying to use mapply() without any success so far. The idea is to avoid loops, since they can lead to a major performance decrease in R.


Answer 1:


Step 1: using single index for addressing matrix

In R, matrix elements are stored in a vector in column-major order, so A[i, j] is the same as A[(j-1)*nrow(A) + i]. Consider a random 3-by-3 matrix as an example:

set.seed(1); A <- round(matrix(runif(9), 3, 3), 2)

> A
     [,1] [,2] [,3]
[1,] 0.27 0.91 0.94
[2,] 0.37 0.20 0.66
[3,] 0.57 0.90 0.63

Now, this matrix has 3 rows (nrow(A) = 3). Compare:

A[2,3]  # 0.66
A[(3-1) * 3 + 2]  # 0.66
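
As a quick sanity check of the column-major rule, the sketch below rebuilds the single index of every cell of A from its row and column number and compares the result with the storage order. row() and col() are base R functions; the variable name k is only for illustration.

```r
set.seed(1); A <- round(matrix(runif(9), 3, 3), 2)

# row(A) and col(A) hold, for every cell, its row and column number
k <- (col(A) - 1L) * nrow(A) + row(A)

# k runs through 1..9 column by column
k
all(as.vector(k) == 1:9)

# single-index access returns the entries in storage order
identical(A[as.vector(k)], as.vector(A))
```

Both comparisons should return TRUE for any matrix, not only for this 3-by-3 example.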

Step 2: vectorizing

You can address multiple elements of a matrix at a time. However, you can only do this in single-index mode (this is not quite precise; see alexis_laz's remark later). For example, suppose you want to extract A[1,2] and A[3,1]. If you do:

A[c(1,3), c(2,1)]
#      [,1] [,2]
# [1,] 0.91 0.27
# [2,] 0.90 0.57

You actually get a block. Now, if you use single indexing, you get what you need:

A[3 * (c(2,1) - 1) + c(1,3)]
# [1] 0.91 0.57
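
The pairwise lookup can be wrapped in a small helper. The name lin_index is hypothetical, not a base R function; it is only a sketch of the Step 1 formula applied to whole vectors of row and column numbers.

```r
set.seed(1); A <- round(matrix(runif(9), 3, 3), 2)

# hypothetical helper: convert (row, column) pairs to single indices
lin_index <- function(i, j, nr) (j - 1) * nr + i

rows <- c(1, 3)
cols <- c(2, 1)

# picks A[1,2] and A[3,1] in one vectorized call
picked <- A[lin_index(rows, cols, nrow(A))]
picked

# the same values fetched one pair at a time, for comparison
c(A[1, 2], A[3, 1])
```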

Step 3: getting single index for your problem

Suppose n <- length(a) and you want to address the elements of b at these (row, column) positions:

a[1]    a[2]
a[2]    a[3]
 .       .
 .       .
a[n-1]  a[n]

You can use the single index nrow(b) * (a[2:n] - 1) + a[1:(n-1)].
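
To make the formula concrete, here it is worked through with the a and b from the question. The names row_id and col_id are introduced only for this illustration.

```r
a <- c(1, 4, 3, 1)
b <- matrix(1:25, ncol = 5, nrow = 5)
n <- length(a)

row_id <- a[1:(n - 1)]   # 1 4 3
col_id <- a[2:n]         # 4 3 1

# 5 * c(3, 2, 0) + c(1, 4, 3)
b_ind <- nrow(b) * (col_id - 1) + row_id
b_ind        # 16 14 3
b[b_ind]     # 16 14 3, because b[k] == k for this particular b
```

Because b is filled with 1:25 in column-major order, each single index happens to equal the value stored there, which makes the arithmetic easy to verify by hand.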

Step 4: complete solution

Since a and c each have only a single row, you should store them as vectors rather than matrices.

a <- c(1,4,3,1)
c <- 4:8

If you are given matrices and have no choice (as is currently the case in your question), you can convert them into vectors by:

a <- as.numeric(a)
c <- as.numeric(c)

Now, as discussed, we have the index for addressing the b matrix:

n <- length(a)
b_ind <- nrow(b) * (a[2:n] - 1) + a[1:(n-1)]

The first element of the final result is element a[1] of c, so we need to concatenate c[a[1]] and b[b_ind]:

a <- c(c[a[1]], b[b_ind])
# > a
# [1]  4 16 14  3

This approach is fully vectorized, and it avoids the per-element function calls that the *apply family makes.
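
For reuse, the steps can be collected into one function and checked against a plain loop. Both function names are hypothetical. One assumption differs from the text above: the sketch uses negative indices (a[-n], a[-1]) instead of 1:(n-1) and 2:n, because 2:n counts downward when n is 1.

```r
# hypothetical wrapper around the vectorized steps
lookup_chain <- function(a, b, c) {
  a <- as.numeric(a)
  c <- as.numeric(c)
  n <- length(a)
  # a[-n] are the row numbers, a[-1] the column numbers;
  # both are empty when n == 1, so no stray indices appear
  b_ind <- nrow(b) * (a[-1] - 1) + a[-n]
  c(c[a[1]], b[b_ind])
}

# plain loop with the same semantics, used only as a reference
lookup_loop <- function(a, b, c) {
  out <- numeric(length(a))
  out[1] <- c[a[1]]
  for (i in seq_len(length(a) - 1)) out[i + 1] <- b[a[i], a[i + 1]]
  out
}

a <- matrix(c(1, 4, 3, 1), nrow = 1)
b <- matrix(1:25, ncol = 5, nrow = 5)
c <- matrix(4:8, ncol = 5, nrow = 1)

lookup_chain(a, b, c)   # 4 16 14 3
lookup_loop(a, b, c)    # 4 16 14 3
```

Naming a variable c works here because R skips non-function objects when it looks up c(...) as a call, but a different name would be easier to read.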


alexis_laz's remark

alexis_laz reminds me that we can use "matrix-index" as well, i.e., we can also address matrix b via:

b[cbind(a[1:(n-1)],a[2:n])]  ## or b[cbind(a[-n], a[-1])]

However, I think single indexing is slightly faster: with a matrix index, R has to read the index matrix row by row, against its column-major storage, in order to address b, so it pays a higher memory latency than with a plain index vector.
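
The speed difference is easy to measure rather than guess. The sketch below uses made-up data (a 500-by-500 matrix and a random chain of 100,000 indices, both assumptions for illustration), confirms that the two indexing styles return the same elements, and then times each one. Timings depend on the machine and R version, so no numbers are quoted.

```r
set.seed(42)
b <- matrix(runif(250000), 500, 500)
a <- sample(500, 1e5, replace = TRUE)   # random chain of valid indices
n <- length(a)

by_matrix <- b[cbind(a[-n], a[-1])]               # matrix-index
by_single <- b[nrow(b) * (a[-1] - 1) + a[-n]]     # single index

# both routes pick exactly the same elements
identical(by_matrix, by_single)

# repeat each lookup a few times so the timing is measurable
system.time(for (k in 1:20) b[cbind(a[-n], a[-1])])
system.time(for (k in 1:20) b[nrow(b) * (a[-1] - 1) + a[-n]])
```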



Source: https://stackoverflow.com/questions/37956509/mapply-for-better-performance
