Why is B = numpy.dot(A, x) so much slower than looping through B[i,:,:] = numpy.dot(A[i,:,:], x)?

深忆病人 2020-12-11 16:33

I'm getting some efficiency test results that I can't explain.

I want to assemble a matrix B whose i-th entries B[i,:,:] = A[i,:,:].dot(x), where each A[i,:,:] is

3 Answers
  •  野趣味 (OP)
    2020-12-11 16:47

    numpy.dot only delegates to a BLAS matrix multiply when the inputs each have dimension at most 2:

    #if defined(HAVE_CBLAS)
        if (PyArray_NDIM(ap1) <= 2 && PyArray_NDIM(ap2) <= 2 &&
                (NPY_DOUBLE == typenum || NPY_CDOUBLE == typenum ||
                 NPY_FLOAT == typenum || NPY_CFLOAT == typenum)) {
            return cblas_matrixproduct(typenum, ap1, ap2, out);
        }
    #endif
    

    When you stick your whole 3-dimensional A array into dot, NumPy takes a slower path, going through an nditer object. It still tries to get some use out of BLAS in the slow path, but the way the slow path is designed, it can only use a vector-vector multiply rather than a matrix-matrix multiply, which doesn't give the BLAS anywhere near as much room to optimize.
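A minimal sketch of the three call patterns discussed above, to make the difference concrete. The array sizes and the names n, p, q, r are assumptions for illustration only, since the question's actual shapes are not shown. The sketch only checks that all three give the same result; how their timings compare depends on the NumPy version and the BLAS build, so no timings are claimed here.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n, p, q, r = 8, 30, 40, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, p, q))   # stack of n matrices, each p x q
x = rng.standard_normal((q, r))      # a single q x r matrix

# 1) Pass the whole 3-D array to dot. A has 3 dimensions, so this is the
#    case that fails the NDIM <= 2 check quoted above.
B_dot = np.dot(A, x)

# 2) Loop over the first axis. Every call sees two 2-D operands, so each
#    one can be handed to the BLAS matrix-matrix routine.
B_loop = np.empty((n, p, r))
for i in range(n):
    B_loop[i, :, :] = np.dot(A[i, :, :], x)

# 3) Collapse the two leading axes so a single 2-D product does all the
#    work, then restore the original leading shape.
B_flat = np.dot(A.reshape(n * p, q), x).reshape(n, p, r)

# All three compute B[i, j, m] = sum_k A[i, j, k] * x[k, m].
print(np.allclose(B_dot, B_loop), np.allclose(B_dot, B_flat))
```

The reshape in option 3 relies on A being C-contiguous, in which case it is a view rather than a copy. To compare speeds on a particular installation, each variant can be wrapped in timeit.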
