Is there an “enhanced” numpy/scipy dot method?

温柔的废话 2020-12-24 12:45

Problem

I would like to compute the following using numpy or scipy:

Y = A**T * Q * A

where A is a m x n ma

3 Answers
  •  忘掉有多难
    2020-12-24 12:55

    (Regarding the last sentence of the OP: I am not aware of such a NumPy/SciPy method. Regarding the question in the title, i.e., improving NumPy dot performance, what's below should be of some help. In other words, my answer is directed at improving the performance of most of the steps that make up your function for Y.)

    First, this should give you a noticeable boost over the vanilla NumPy dot method:

    >>> from scipy.linalg import blas as FB
    >>> vx = FB.dgemm(alpha=1., a=v1, b=v2, trans_b=True)
    

    Note that the two arrays, v1 and v2, are both in Fortran (column-major) order, i.e. their F_CONTIGUOUS flag is True.
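
    As a minimal sketch of the call above: v1 and v2 are not defined in the answer, so the shapes and random values below are hypothetical, and both arrays are explicitly stored in Fortran order. The example only checks that dgemm and numpy.dot agree; any speed difference depends on the NumPy/SciPy build in use.

```python
import numpy as np
from scipy.linalg import blas as FB

rng = np.random.default_rng(0)

# Hypothetical inputs: v1 is 4 x 3 and v2 is 5 x 3, both stored in Fortran order.
v1 = np.asfortranarray(rng.standard_normal((4, 3)))
v2 = np.asfortranarray(rng.standard_normal((5, 3)))

# trans_b=True makes dgemm use v2 transposed, so this is v1 @ v2.T (4 x 5).
vx = FB.dgemm(alpha=1.0, a=v1, b=v2, trans_b=True)

# The same product with the ordinary NumPy call, for comparison.
expected = np.dot(v1, v2.T)
print(np.allclose(vx, expected))
```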

    You can inspect the memory layout (C or Fortran order) of a NumPy array through the array's flags attribute like so:

    >>> import numpy as NP
    >>> c = NP.ones((4, 3))
    >>> c.flags
          C_CONTIGUOUS : True          # refers to C-contiguous order
          F_CONTIGUOUS : False         # fortran-contiguous
          OWNDATA : True
          MASKNA : False
          OWNMASKNA : False
          WRITEABLE : True
          ALIGNED : True
          UPDATEIFCOPY : False
    

    To change the order of one of the arrays so that both have the same layout, just call the NumPy array constructor, pass in the array, and set the order argument to "F" (or "C"):

    >>> c = NP.array(c, order="F")
    
    >>> c.flags
          C_CONTIGUOUS : False
          F_CONTIGUOUS : True
          OWNDATA : True
          MASKNA : False
          OWNMASKNA : False
          WRITEABLE : True
          ALIGNED : True
          UPDATEIFCOPY : False
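
    A related sketch, using numpy.asfortranarray instead of the constructor call shown above (a substitution, not the answer's own code): it converts only when the layout actually has to change, which can be checked with numpy.shares_memory.

```python
import numpy as np

c = np.ones((4, 3))              # C-contiguous by default
f = np.asfortranarray(c)         # layout changes, so the data is copied
f_again = np.asfortranarray(f)   # already Fortran order, so the data is reused

print(c.flags["C_CONTIGUOUS"], c.flags["F_CONTIGUOUS"])
print(f.flags["C_CONTIGUOUS"], f.flags["F_CONTIGUOUS"])
print(np.shares_memory(c, f))        # whether c and f use the same buffer
print(np.shares_memory(f, f_again))  # whether f and f_again use the same buffer
```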
    

    You can further optimize by exploiting array-order alignment to reduce excess memory consumption caused by copying the original arrays.

    But why are the arrays copied before being passed to dot?

    The dot product relies on BLAS operations. These operations require arrays stored in C-contiguous order, and it is this constraint that causes the arrays to be copied.

    On the other hand, taking the transpose does not make a copy; it returns a view of the same data, which unfortunately is in Fortran order.
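
    The snippet that originally illustrated this point is missing, so here is a small sketch with a hypothetical 4 x 3 array A, showing that A.T shares memory with A and reports Fortran-contiguous flags.

```python
import numpy as np

# Hypothetical example array, C-contiguous as created.
A = np.arange(12, dtype=float).reshape(4, 3)
At = A.T

# The transpose is a view onto the same buffer, not a copy.
print(np.shares_memory(A, At))

# The view reports Fortran order rather than C order.
print(At.flags["C_CONTIGUOUS"], At.flags["F_CONTIGUOUS"])
```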

    Therefore, to remove the performance bottleneck, you need to eliminate the preliminary array-copying step; doing that just requires passing both arrays to dot in C-contiguous order.

    So, to calculate dot(A.T, A) without making an extra copy:

    >>> import scipy.linalg.blas as FB
    >>> vx = FB.dgemm(alpha=1.0, a=A.T, b=A.T, trans_b=True)
    

    In sum, the expression just above (along with the preceding import statement) can substitute for dot, supplying the same functionality with better performance.

    You can bind that expression to a function like so (as written, super_dot(v, w) computes dot(v.T, w)):

    >>> super_dot = lambda v, w: FB.dgemm(alpha=1., a=v.T, b=w.T, trans_b=True)
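
    As a rough sketch of how this could apply to the Y = A**T * Q * A expression from the question: the question text is truncated, so the sizes below and the assumption that Q is a dense m x m float64 matrix are illustrative only. The result is checked against plain NumPy.

```python
import numpy as np
import scipy.linalg.blas as FB

# From the answer: computes dot(v.T, w) for C-contiguous float64 v and w.
super_dot = lambda v, w: FB.dgemm(alpha=1.0, a=v.T, b=w.T, trans_b=True)

rng = np.random.default_rng(0)
m, n = 6, 4                        # hypothetical sizes
A = rng.standard_normal((m, n))    # m x n, C-contiguous
Q = rng.standard_normal((m, m))    # assumed to be m x m (not stated in the excerpt)

QA = np.dot(Q, A)                  # m x n intermediate product
Y = super_dot(A, QA)               # A.T @ (Q @ A), shape n x n

print(np.allclose(Y, A.T @ Q @ A))
```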
    
