Efficient dot products of large memory-mapped arrays

轻奢々 2020-12-02 22:41

I'm working with some rather large, dense numpy float arrays that currently reside on disk in PyTables CArrays. I need to be able to perform efficient dot products using these arrays.

3 Answers
  •  不知归路
    2020-12-02 23:33

    I don't think numpy optimizes the dot product for memmap arrays. If you look at the code for matrix multiply, which I got here, you'll see that the function MatrixProduct2 (as currently implemented) computes the values of the result matrix in C memory order:

    op = PyArray_DATA(ret); os = PyArray_DESCR(ret)->elsize;
    axis = PyArray_NDIM(ap1)-1;
    it1 = (PyArrayIterObject *)
        PyArray_IterAllButAxis((PyObject *)ap1, &axis);
    it2 = (PyArrayIterObject *)
        PyArray_IterAllButAxis((PyObject *)ap2, &matchDim);
    NPY_BEGIN_THREADS_DESCR(PyArray_DESCR(ap2));
    while (it1->index < it1->size) {
        while (it2->index < it2->size) {
            dot(it1->dataptr, is1, it2->dataptr, is2, op, l, ret);
            op += os;
            PyArray_ITER_NEXT(it2);
        }
        PyArray_ITER_NEXT(it1);
        PyArray_ITER_RESET(it2);
    }
    

    In the above code, op is the return matrix, dot is the 1-D dot product function, and it1 and it2 are iterators over the input matrices.
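    The C loop above can be sketched in Python to make the access pattern explicit (a simplification, assuming 2-D inputs; the real routine also handles other shapes and dispatches to BLAS for contiguous arrays):

    ```python
    import numpy as np

    def matrix_product_c_order(ap1, ap2):
        # The result is filled in C (row-major) order: one 1-D dot
        # product per output element, mirroring it1 (rows of ap1,
        # the outer loop) and it2 (columns of ap2, the inner loop).
        n, k = ap1.shape
        k2, m = ap2.shape
        assert k == k2
        ret = np.empty((n, m), dtype=np.result_type(ap1, ap2))
        for i in range(n):        # it1: advance over rows of ap1
            for j in range(m):    # it2: advance over columns of ap2
                ret[i, j] = np.dot(ap1[i, :], ap2[:, j])
        return ret
    ```

    Note that the inner loop re-reads every column of ap2 for each row of ap1, which is why the iteration order matters so much when the operands live on disk.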

    That being said, it looks like your code might already be doing the right thing. In this case the optimal performance is actually much better than O(n^3/sqrt(M)): you can limit your IO to reading each item of A from disk only once, or O(n). Memmap arrays naturally have to do some caching behind the scenes, and the inner loop operates on it2, so if A is in C order and the memmap cache is big enough, your code might already be working. You can enforce caching of rows of A explicitly by doing something like:

    def my_dot(A, B, C):
        # Pull each row of A into RAM once, then do the dot in memory.
        n = A.shape[0]
        for ii in range(n):
            A_ii = np.array(A[ii, :])  # explicit copy forces the row off disk
            C[ii, :] = A_ii.dot(B)
        return C
    
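    A minimal end-to-end sketch of that pattern (the file path, shapes, and dtype here are illustrative, not from the question): A is a memory-mapped array on disk, B and C fit in RAM, and each row of A is read from disk exactly once.

    ```python
    import os
    import tempfile
    import numpy as np

    # Illustrative sizes; the real arrays in the question are much larger.
    n, k, m = 64, 32, 16

    # Build a memory-mapped A on disk (stands in for the PyTables CArray).
    path = os.path.join(tempfile.mkdtemp(), "A.dat")
    A = np.memmap(path, dtype=np.float64, mode="w+", shape=(n, k))
    A[:] = np.random.rand(n, k)
    A.flush()

    B = np.random.rand(k, m)   # in RAM
    C = np.empty((n, m))       # in RAM

    for ii in range(n):
        A_ii = np.array(A[ii, :])  # explicit copy: one disk read per row of A
        C[ii, :] = A_ii.dot(B)
    ```

    Since each A row is touched once and B stays resident, the disk traffic is a single pass over A regardless of how small the OS page cache is.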
