In a program I am working on, I need to multiply two matrices repeatedly. Because of the size of one of the matrices, this operation takes some time, and I wanted to see which approach would be fastest.
You can still optimize these loops by improving the memory access pattern. Your function could look like this (assuming the matrices are 1000x1000):
import numpy as np

CS = 10        # block (chunk) size
NCHUNKS = 100  # number of blocks per dimension; CS * NCHUNKS = 1000

def dot_chunked(A, B):
    C = np.zeros((1000, 1000))
    for i in range(NCHUNKS):
        for j in range(NCHUNKS):
            for k in range(NCHUNKS):
                # multiply block (i, k) of A with block (k, j) of B
                for ii in range(i * CS, (i + 1) * CS):
                    for jj in range(j * CS, (j + 1) * CS):
                        for kk in range(k * CS, (k + 1) * CS):
                            C[ii, jj] += A[ii, kk] * B[kk, jj]
    return C
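To sanity-check the blocked version against NumPy's built-in product, here is a small parameterized variant (the matrix size and block size below are just illustration values I picked, not from the answer above):

```python
import numpy as np

def dot_chunked_n(A, B, cs):
    """Blocked matmul for square n x n matrices, assuming cs divides n."""
    n = A.shape[0]
    nchunks = n // cs
    C = np.zeros((n, n))
    for i in range(nchunks):
        for j in range(nchunks):
            for k in range(nchunks):
                for ii in range(i * cs, (i + 1) * cs):
                    for jj in range(j * cs, (j + 1) * cs):
                        for kk in range(k * cs, (k + 1) * cs):
                            C[ii, jj] += A[ii, kk] * B[kk, jj]
    return C

rng = np.random.default_rng(0)
A = rng.random((40, 40))
B = rng.random((40, 40))
# blocked result should match NumPy's matmul to floating-point tolerance
assert np.allclose(dot_chunked_n(A, B, cs=10), A @ B)
```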
Explanation: the loops i and ii together cover the same range that a single loop over rows did before, and the same holds for j/jj and k/kk. The difference is that CSxCS regions of A and B can now stay in the cache (I believe) and be reused more than once.
You can play around with CS and NCHUNKS. For me, CS=10 and NCHUNKS=100 worked well. Adding numba.jit on top accelerates the code from 7 s to 850 ms (note that I use 1000x1000 matrices here, while the graphs above were run with 3x3x10^5, so it's a somewhat different scenario).
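The numba speedup mentioned above can be reproduced roughly as sketched below; timings will vary by machine, and the fallback decorator is my addition so the code still runs as plain Python when numba is not installed:

```python
import numpy as np

try:
    from numba import njit  # assumed available; otherwise fall back to plain Python
except ImportError:
    def njit(func):
        return func

CS = 10  # block size; assumed to divide the matrix dimension evenly

@njit
def dot_chunked_jit(A, B):
    n = A.shape[0]
    nchunks = n // CS
    C = np.zeros((n, n))
    for i in range(nchunks):
        for j in range(nchunks):
            for k in range(nchunks):
                for ii in range(i * CS, (i + 1) * CS):
                    for jj in range(j * CS, (j + 1) * CS):
                        for kk in range(k * CS, (k + 1) * CS):
                            C[ii, jj] += A[ii, kk] * B[kk, jj]
    return C

# small matrices of ones: every entry of the product equals n
A = np.ones((100, 100))
B = np.ones((100, 100))
C = dot_chunked_jit(A, B)
assert np.allclose(C, A @ B)
```

The first call pays numba's compilation cost; subsequent calls run the compiled loop nest, which is where the reported speedup comes from.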