Numpy Performance - Outer Product of a vector with its transpose


Exploring some alternatives:

In [162]: x=np.arange(100)
In [163]: np.outer(x,x)
Out[163]: 
array([[   0,    0,    0, ...,    0,    0,    0],
       [   0,    1,    2, ...,   97,   98,   99],
       [   0,    2,    4, ...,  194,  196,  198],
       ...,
       [   0,   97,  194, ..., 9409, 9506, 9603],
       [   0,   98,  196, ..., 9506, 9604, 9702],
       [   0,   99,  198, ..., 9603, 9702, 9801]])
In [164]: x1=x[:,None]
In [165]: x1*x1.T
Out[165]: 
array([[   0,    0,    0, ...,    0,    0,    0],
       [   0,    1,    2, ...,   97,   98,   99],
       [   0,    2,    4, ...,  194,  196,  198],
       ...,
       [   0,   97,  194, ..., 9409, 9506, 9603],
       [   0,   98,  196, ..., 9506, 9604, 9702],
       [   0,   99,  198, ..., 9603, 9702, 9801]])
In [166]: np.dot(x1,x1.T)
Out[166]: 
array([[   0,    0,    0, ...,    0,    0,    0],
       [   0,    1,    2, ...,   97,   98,   99],
       [   0,    2,    4, ...,  194,  196,  198],
       ...,
       [   0,   97,  194, ..., 9409, 9506, 9603],
       [   0,   98,  196, ..., 9506, 9604, 9702],
       [   0,   99,  198, ..., 9603, 9702, 9801]])

Comparing their times:

In [167]: timeit np.outer(x,x)
40.8 µs ± 63.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [168]: timeit x1*x1.T
36.3 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [169]: timeit np.dot(x1,x1.T)
60.7 µs ± 6.86 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Is dot using a transpose shortcut? I don't think so; or if it does, it doesn't help in this case. I'm a little surprised that dot is slower.

In [170]: x2=x1.T
In [171]: timeit np.dot(x1,x2)
61.1 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
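That makes sense, since x1.T is only a view, so precomputing it shouldn't change anything. A quick check of that (a sketch, not from the original session; np.shares_memory needs NumPy >= 1.11):

import numpy as np

x = np.arange(100)
x1 = x[:, None]
x2 = x1.T

print(np.shares_memory(x1, x2))          # True: the transpose is a view, no copy
print(x1.ctypes.data == x2.ctypes.data)  # True: both start at the same address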

Another method, einsum:

In [172]: timeit np.einsum('i,j',x,x)
28.3 µs ± 19.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

einsum with x1 and x2 gives the same timings.

It's interesting that matmul does as well as einsum in this case (maybe einsum is delegating to matmul?).

In [178]: timeit x1@x2
27.3 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [179]: timeit x1@x1.T
27.2 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
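One way to peek at what einsum plans to do is np.einsum_path (a sketch; whether einsum actually hands the work off to matmul/BLAS depends on the NumPy version and the optimize setting):

import numpy as np

x = np.arange(100)

# einsum_path reports the contraction order and estimated cost
path, description = np.einsum_path('i,j', x, x, optimize='greedy')
print(description)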

Numpy efficient matrix self-multiplication (gram matrix) demonstrates how dot can save time by being clever (for a 1000x1000 array).
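A rough way to reproduce that comparison (a sketch, assuming the symmetric fast path triggers when dot sees the same buffer on both sides; B holds the same values as A.T but in its own buffer, so the shortcut can't apply):

import numpy as np
from timeit import timeit

A = np.random.rand(1000, 1000)
B = A.T.copy()  # same values as A.T, but a separate data buffer

# A.dot(A.T) can be recognized as symmetric; A.dot(B) cannot
print(timeit(lambda: A.dot(A.T), number=10))
print(timeit(lambda: A.dot(B), number=10))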

As discussed in the links, dot can detect when one argument is the transpose of the other (probably by checking the data buffer pointer, shape, and strides), and can use a BLAS function optimized for symmetric calculations. But I don't see evidence of outer doing that, and it's unlikely that broadcasted multiplication would take such a step.
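A rough illustration of the kind of check that would be involved, comparing data pointer, shape, and strides (this is my speculation about the mechanism, not NumPy's actual source):

import numpy as np

def looks_like_transpose(a, b):
    """Heuristic: does b appear to be a.T over the same buffer?"""
    return (a.ctypes.data == b.ctypes.data
            and a.shape == b.shape[::-1]
            and a.strides == b.strides[::-1])

x1 = np.arange(100)[:, None]
print(looks_like_transpose(x1, x1.T))         # True
print(looks_like_transpose(x1, x1.T.copy()))  # False: the copy has its own buffer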
