Comparing Python, Numpy, Numba and C++ for matrix multiplication

北荒 2020-12-15 19:18

In a program I am working on, I need to multiply two matrices repeatedly. Because of the size of one of the matrices, this operation takes some time and I wanted to see whic

4 answers
  •  忘掉有多难
    2020-12-15 19:57

    You can still optimize these loops by improving the memory access pattern. Your function could look like this (assuming the matrices are 1000x1000):

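    import numpy as np

    CS = 10        # edge length of one chunk (tile)
    NCHUNKS = 100  # chunks per dimension; CS * NCHUNKS = 1000

    def dot_chunked(A, B):
        # np.zeros takes the shape as a single tuple
        C = np.zeros((1000, 1000))

        # Outer loops select one CS x CS chunk of A, B and C
        for i in range(NCHUNKS):
            for j in range(NCHUNKS):
                for k in range(NCHUNKS):
                    # Inner loops multiply the selected chunks element by element
                    for ii in range(i*CS, (i+1)*CS):
                        for jj in range(j*CS, (j+1)*CS):
                            for kk in range(k*CS, (k+1)*CS):
                                C[ii, jj] += A[ii, kk] * B[kk, jj]
        return C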
    

    Explanation: the loops i and ii together cover the same index range that i alone did before, and the same holds for j and k. This time, however, regions of A and B of size CSxCS can be kept in the cache (I guess) and can be used more than once.
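To check the claim that the chunked loops visit exactly the same index triples as the plain triple loop, here is a minimal sketch. The name `dot_chunked_general` and the `cs` parameter are hypothetical and not part of the answer above. The sketch assumes every matrix dimension is a multiple of `cs`, and compares the result with NumPy's own product on small inputs.

```python
import numpy as np

def dot_chunked_general(A, B, cs):
    """Blocked matrix product (hypothetical helper).

    Assumes every dimension of A and B is a multiple of cs.
    """
    n, m = A.shape
    m2, p = B.shape
    if m != m2 or n % cs or m % cs or p % cs:
        raise ValueError("shapes must match and be multiples of cs")
    C = np.zeros((n, p))
    # Step through the matrices one cs x cs chunk at a time
    for i in range(0, n, cs):
        for j in range(0, p, cs):
            for k in range(0, m, cs):
                # Same innermost update as the unchunked triple loop
                for ii in range(i, i + cs):
                    for jj in range(j, j + cs):
                        for kk in range(k, k + cs):
                            C[ii, jj] += A[ii, kk] * B[kk, jj]
    return C

rng = np.random.default_rng(0)
A = rng.random((8, 12))
B = rng.random((12, 4))
# Only the summation order differs, so the results agree up to rounding
print(np.allclose(dot_chunked_general(A, B, cs=4), A @ B))
```

The small sizes are chosen only so that the check finishes quickly in plain Python. It confirms correctness, not the cache behaviour.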

    You can play around with CS and NCHUNKS. For me, CS=10 and NCHUNKS=100 worked well. When using numba.jit, it accelerates the code from 7 s to 850 ms (note that I use 1000x1000 matrices, while the graphs above were run with 3x3x10^5, so it is a somewhat different scenario).
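One way to experiment with the chunk size is sketched below. The helpers `dot_chunked_cs` and `time_chunk_sizes` are hypothetical names, the inputs are assumed to be square with a side length divisible by every chunk size, and the fallback to plain Python when Numba is missing is my own addition. Timings depend on the machine, so none are shown.

```python
import time
import numpy as np

try:
    from numba import njit  # optional; needed for any real speedup
except ImportError:
    def njit(func):
        # Assumed fallback: run as plain Python (same result, far slower)
        return func

@njit
def dot_chunked_cs(A, B, cs):
    # Square n x n inputs; n is assumed to be a multiple of cs
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, cs):
        for j in range(0, n, cs):
            for k in range(0, n, cs):
                for ii in range(i, i + cs):
                    for jj in range(j, j + cs):
                        for kk in range(k, k + cs):
                            C[ii, jj] += A[ii, kk] * B[kk, jj]
    return C

def time_chunk_sizes(n, chunk_sizes):
    """Return {cs: seconds} for random n x n inputs."""
    rng = np.random.default_rng(0)
    A = rng.random((n, n))
    B = rng.random((n, n))
    warm = np.ones((2, 2))
    dot_chunked_cs(warm, warm, 2)  # compile once so JIT time is not measured
    timings = {}
    for cs in chunk_sizes:
        start = time.perf_counter()
        C = dot_chunked_cs(A, B, cs)
        timings[cs] = time.perf_counter() - start
        assert np.allclose(C, A @ B)  # sanity check against NumPy
    return timings

# Small n so this also finishes quickly without Numba
print(time_chunk_sizes(40, [5, 10, 20]))
```

With Numba installed, calling `time_chunk_sizes(1000, [5, 10, 20, 50])` should come closer to the 1000x1000 setting described above. Which chunk size wins will likely depend on the cache sizes of the machine.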
