Numpy quirk: Apply function to all pairs of two 1D arrays, to get one 2D array

你的背包 2020-12-13 02:24

Let's say I have 2 one-dimensional (1D) numpy arrays, a and b, with lengths n1 and n2 respectively. I also have a functi

6 answers
  • 2020-12-13 02:41

    You could use list comprehensions to create an array of arrays:

    import numpy as np
    
    # Arrays
    a = np.array([1, 2, 3]) # n1 = 3
    b = np.array([4, 5]) # n2 = 2
    
    # Your function (just an example)
    def f(i, j):
        return i + j
    
    # Outer loop over a gives the rows, inner loop over b gives the columns
    result = np.array([[f(i, j) for j in b] for i in a])
    print(result)
    

    Output:

    [[5 6]
     [6 7]
     [7 8]]
    
  • 2020-12-13 02:43

    You can use numpy broadcasting to do calculation on the two arrays, turning a into a vertical 2D array using newaxis:

    In [11]: a = np.array([1, 2, 3]) # n1 = 3
        ...: b = np.array([4, 5]) # n2 = 2
        ...: #if function is c(i, j) = a(i) + b(j)*2:
        ...: c = a[:, None] + b*2
    
    In [12]: c
    Out[12]: 
    array([[ 9, 11],
           [10, 12],
           [11, 13]])
    

    To benchmark:

    In [28]: a = np.arange(100)
    
    In [29]: b = np.arange(222)
    
    In [30]: timeit r = np.array([[f(i, j) for j in b] for i in a])
    10 loops, best of 3: 29.9 ms per loop
    
    In [31]: timeit c = a[:, None] + b*2
    10000 loops, best of 3: 71.6 us per loop
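
To make the shapes explicit, here is a minimal sketch of what the broadcast does. The names `col` and `expected` are only for illustration; the function is the same c(i, j) = a(i) + b(j)*2 used above.

```python
import numpy as np

a = np.array([1, 2, 3])   # shape (3,)
b = np.array([4, 5])      # shape (2,)

col = a[:, None]          # shape (3, 1): a turned into a column
c = col + b * 2           # (3, 1) and (2,) broadcast to (3, 2)

# Same values built pair by pair, for comparison only.
expected = np.array([[i + 2 * j for j in b] for i in a])

print(col.shape, c.shape)
print(np.array_equal(c, expected))
```

Rows follow a and columns follow b, so c[i, j] pairs a[i] with b[j].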
    
  • 2020-12-13 02:47

    If F() works with broadcast arguments, definitely use that, as others describe.
    An alternative is np.fromfunction (function_on_an_int_grid would be a better name). The following maps the integer index grid to your a-b grid, then passes the result to F():

    import numpy as np
    
    def func_allpairs( F, a, b ):
        """ -> array len(a) x len(b):
            [[ F( a0 b0 )  F( a0 b1 ) ... ]
             [ F( a1 b0 )  F( a1 b1 ) ... ]
             ...
            ]
        """
        def fab( i, j ):
            return F( a[i], b[j] )  # F scalar or vec, e.g. gradient
    
        return np.fromfunction( fab, (len(a), len(b)), dtype=int )  # -> fab( all pairs )
    
    
    #...............................................................................
    def F( x, y ):
        return x + 10*y
    
    a = np.arange( 100 )
    b = np.arange( 222 )
    A = func_allpairs( F, a, b )
    # %timeit: 1000 loops, best of 3: 241 µs per loop -- imac i5, np 1.9.3
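
One thing worth knowing: np.fromfunction does not call the function once per element. It calls it a single time with whole index arrays, so F still has to accept array arguments. A small sketch that records what the callback receives (the `calls` list and this `fab` are illustrative names, not part of the answer's code):

```python
import numpy as np

calls = []

def fab(i, j):
    # Record the shapes of the index arguments on each call.
    calls.append((np.shape(i), np.shape(j)))
    return i + 10 * j

grid = np.fromfunction(fab, (3, 4), dtype=int)

print(len(calls), calls[0])   # how many calls, and with what shapes
print(grid.shape)
```

In func_allpairs above, a[i] and b[j] are therefore fancy-indexing lookups on full 2D index arrays, which is where the speed comes from.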
    
  • 2020-12-13 02:49

    As another alternative that's a bit more extensible than the outer product, and that runs in roughly 1/5 to 1/9 of the time of nested list comprehensions, use numpy.newaxis (it took a bit more digging to find):

    >>> import numpy
    >>> a = numpy.array([0,1,2])
    >>> b = numpy.array([0,1,2,3])
    

    This time, using the power function:

    >>> pow(a[:,numpy.newaxis], b)
    array([[1, 0, 0, 0],
           [1, 1, 1, 1],
           [1, 2, 4, 8]])
    

    Compared with an alternative:

    >>> numpy.array([[pow(i,j) for j in b] for i in a])
    array([[1, 0, 0, 0],
           [1, 1, 1, 1],
           [1, 2, 4, 8]])
    

    And comparing the timing:

    >>> import timeit
    >>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
    31.943181037902832
    >>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
    5.985810041427612
    
    >>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
    109.74687385559082
    >>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
    11.989138126373291
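
If the function happens to be a NumPy ufunc, its `outer` method should give the same all-pairs table without the explicit newaxis. A minimal sketch using the same a and b as above (the names `table` and `via_newaxis` are just for illustration):

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([0, 1, 2, 3])

# ufunc.outer applies the ufunc to every pair (a[i], b[j]).
table = np.power.outer(a, b)

# The newaxis form from this answer, for comparison.
via_newaxis = np.power(a[:, np.newaxis], b)

print(table)
print(np.array_equal(table, via_newaxis))
```

This only applies to ufuncs such as np.add, np.multiply or np.power, not to arbitrary Python functions.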
    
  • 2020-12-13 03:02

    May I suggest, if your use-case is more limited to products, that you use the outer-product?

    e.g.:

    import numpy
    
    a = numpy.array([0, 1, 2])
    b = numpy.array([0, 1, 2, 3])
    
    numpy.outer(a,b)
    

    returns

    array([[0, 0, 0, 0],
           [0, 1, 2, 3],
           [0, 2, 4, 6]])
    

    You can then apply other transformations:

    numpy.outer(a,b) + 1
    

    returns

    array([[1, 1, 1, 1],
           [1, 2, 3, 4],
           [1, 3, 5, 7]])
    

    This is much faster:

    >>> import timeit
    >>> timeit.timeit('numpy.array([[i*j for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
    31.79583477973938
    
    >>> timeit.timeit('numpy.outer(a,b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
    9.351550102233887
    >>> timeit.timeit('numpy.outer(a,b)+1', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
    12.308301210403442
    
  • 2020-12-13 03:08

    If F is beyond your control, you can wrap it automatically to be "vector-aware" with numpy.vectorize. Below is a working example where I define my own F just for completeness. This approach has the advantage of simplicity, but if you do control F, rewriting it with a bit of care so that it vectorizes properly can bring huge speed benefits.

    import numpy
    
    n1 = 100
    n2 = 200
    
    a = numpy.arange(n1)
    b = numpy.arange(n2)
    
    def F(x, y):
        return x + y
    
    # Everything above this is setup, the answer to your question lies here:
    fv = numpy.vectorize(F)
    r = fv(a[:, numpy.newaxis], b)
    

    On my computer, the following timings are found, showing the price you pay for "automatic" vectorisation:

    %timeit fv(a[:, numpy.newaxis], b)
    100 loops, best of 3: 3.58 ms per loop
    
    %timeit F(a[:, numpy.newaxis], b)
    10000 loops, best of 3: 38.3 µs per loop
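
Where numpy.vectorize earns its keep is a function that cannot take arrays at all. A sketch, assuming a made-up scalar-only function (`F_scalar` is hypothetical, not from the question):

```python
import numpy as np

def F_scalar(x, y):
    # Hypothetical scalar-only function: the `if` raises on array arguments.
    if x > y:
        return x - y
    return 0

a = np.arange(4)
b = np.arange(3)

# otypes fixes the output dtype instead of inferring it from the first call.
fv = np.vectorize(F_scalar, otypes=[int])
r = fv(a[:, np.newaxis], b)

# Hand-vectorized equivalent, possible only if you can rewrite the function.
r_fast = np.where(a[:, None] > b, a[:, None] - b, 0)

print(r)
print(np.array_equal(r, r_fast))
```

The vectorize call still loops in Python under the hood, so the np.where version is the one to prefer when a rewrite is an option.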
    