How to apply function which returns vector to each numpy array element (and get array with higher dimension)

隐瞒了意图╮ 2021-01-25 12:19

Let's write it directly in code

Note: I edited mapper (the original example used x -> (x, 2 * x, 3 * x) just as an example) into a generic blackbox function, which causes the trou

3 answers
  •  日久生厌
    2021-01-25 13:06

    np.vectorize with the new signature option can handle this. It doesn't improve the speed, but makes the dimensional bookkeeping easier.

    In [159]: def blackbox_fn(x): #I can't be changed!
         ...:     assert np.array(x).shape == (), "I'm a fussy little function!"
         ...:     return np.array([x, 2*x, 3*x])
         ...: 
    

    The documentation for signature is a bit cryptic. I've worked with it before, so made a good first guess:

    In [161]: f = np.vectorize(blackbox_fn, signature='()->(n)')
    In [162]: f(np.ones((2,2)))
    Out[162]: 
    array([[[ 1.,  2.,  3.],
            [ 1.,  2.,  3.]],
    
           [[ 1.,  2.,  3.],
            [ 1.,  2.,  3.]]])
    

    With your array:

    In [163]: arr2d = np.array(list(range(4)), dtype=np.uint8).reshape(2, 2)
    In [164]: f(arr2d)
    Out[164]: 
    array([[[0, 0, 0],
            [1, 2, 3]],
    
           [[2, 4, 6],
            [3, 6, 9]]])
    In [165]: _.dtype
    Out[165]: dtype('int32')
    

    The dtype is not preserved, because your blackbox_fn doesn't preserve it. By default, vectorize makes a test calculation with the first element and uses its dtype to determine the result's dtype. The return dtype can be specified with the otypes parameter.
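To illustrate the otypes remark, here is a minimal sketch that requests uint8 output. The name f_u8 is made up for this example, and it assumes the results fit in uint8 (larger values would not survive the cast):

```python
import numpy as np

def blackbox_fn(x):  # same fussy function as above
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2 * x, 3 * x])

arr2d = np.arange(4, dtype=np.uint8).reshape(2, 2)

# otypes takes one dtype per output; this function has a single output.
# f_u8 is a hypothetical name used only in this sketch.
f_u8 = np.vectorize(blackbox_fn, signature='()->(n)', otypes=[np.uint8])

result = f_u8(arr2d)
print(result.dtype, result.shape)
```

The values are the same as in Out[164]; only the dtype of the assembled array changes.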

    It can handle arrays other than 2d:

    In [166]: f(np.arange(3))
    Out[166]: 
    array([[0, 0, 0],
           [1, 2, 3],
           [2, 4, 6]])
    In [167]: f(3)
    Out[167]: array([3, 6, 9])
    

    With a signature, vectorize uses Python-level iteration. Without a signature it uses np.frompyfunc, which performs a bit better. But as long as blackbox_fn has to be called for each element of the input, we can't improve the speed by much (at most 2x).
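For comparison, the Python-level iteration can be written out by hand. This is only a rough sketch of the idea, not what np.vectorize does internally; apply_elementwise is a hypothetical helper, and it assumes a non-empty input and a function that always returns arrays of the same shape:

```python
import numpy as np

def blackbox_fn(x):  # same fussy function as above
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2 * x, 3 * x])

def apply_elementwise(fn, arr):
    """Call fn on every scalar element and append fn's output shape."""
    arr = np.asarray(arr)
    # np.ndindex walks every index tuple of arr, whatever its dimension.
    results = [fn(arr[idx]) for idx in np.ndindex(arr.shape)]
    stacked = np.stack(results)  # shape (arr.size, n); fails on empty input
    return stacked.reshape(arr.shape + stacked.shape[1:])

out2d = apply_elementwise(blackbox_fn, np.arange(4).reshape(2, 2))
out0d = apply_elementwise(blackbox_fn, 3)
print(out2d.shape, out0d.shape)
```

The function is still called once per element, so this should land in the same speed range as the list comprehension timed further down.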


    np.frompyfunc returns an object dtype array:

    In [168]: fpy = np.frompyfunc(blackbox_fn, 1,1)
    In [169]: fpy(1)
    Out[169]: array([1, 2, 3])
    In [170]: fpy(np.arange(3))
    Out[170]: array([array([0, 0, 0]), array([1, 2, 3]), array([2, 4, 6])], dtype=object)
    In [171]: np.stack(_)
    Out[171]: 
    array([[0, 0, 0],
           [1, 2, 3],
           [2, 4, 6]])
    In [172]: fpy(arr2d)
    Out[172]: 
    array([[array([0, 0, 0]), array([1, 2, 3])],
           [array([2, 4, 6]), array([3, 6, 9])]], dtype=object)
    

    stack can't remove the array nesting in this 2d case:

    In [173]: np.stack(_)
    Out[173]: 
    array([[array([0, 0, 0]), array([1, 2, 3])],
           [array([2, 4, 6]), array([3, 6, 9])]], dtype=object)
    

    but we can ravel it and then stack; the (4, 3) result still needs a reshape to (2, 2, 3):

    In [174]: np.stack(__.ravel())
    Out[174]: 
    array([[0, 0, 0],
           [1, 2, 3],
           [2, 4, 6],
           [3, 6, 9]])
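
The remaining reshape step could look like the following sketch. The names nested, flat and result are illustrative; the new shape is the input shape followed by the shape of one function result:

```python
import numpy as np

def blackbox_fn(x):  # same fussy function as above
    assert np.array(x).shape == (), "I'm a fussy little function!"
    return np.array([x, 2 * x, 3 * x])

fpy = np.frompyfunc(blackbox_fn, 1, 1)
arr2d = np.arange(4, dtype=np.uint8).reshape(2, 2)

nested = fpy(arr2d)             # (2, 2) object array holding small arrays
flat = np.stack(nested.ravel())  # (4, 3) numeric array
# Restore the input dimensions and keep the trailing result dimension.
result = flat.reshape(arr2d.shape + flat.shape[1:])
print(result.shape)
```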
    

    Speed tests:

    In [175]: timeit f(np.arange(1000))
    14.7 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [176]: timeit fpy(np.arange(1000))
    4.57 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [177]: timeit np.stack(fpy(np.arange(1000).ravel()))
    6.71 ms ± 207 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [178]: timeit np.array([blackbox_fn(i) for i in np.arange(1000)])
    6.44 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    Having your function return a list instead of an array might make reassembling the result easier, and maybe even faster:

    def foo(x):
        return [x, 2*x, 3*x]
    

    Another option is to play with the frompyfunc parameters (one input, three outputs) and return a tuple:

    def foo(x):
        return x, 2*x, 3*x   # return a tuple
    In [204]: np.stack(np.frompyfunc(foo, 1,3)(arr2d),2)
    Out[204]: 
    array([[[0, 0, 0],
            [1, 2, 3]],
    
           [[2, 4, 6],
            [3, 6, 9]]], dtype=object)
    

    10x speed up - I'm surprised:

    In [212]: foo1 = np.frompyfunc(foo, 1,3)
    In [213]: timeit np.stack(foo1(np.arange(1000)),1)
    428 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
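
Note that the stacked result in Out[204] is still object dtype. If a numeric array is wanted, one possible follow-up is an astype at the end. This is a sketch that assumes all returned values fit the chosen dtype (np.int64 here is an arbitrary choice):

```python
import numpy as np

def foo(x):
    return x, 2 * x, 3 * x  # return a tuple, one item per output

foo1 = np.frompyfunc(foo, 1, 3)
arr2d = np.arange(4, dtype=np.uint8).reshape(2, 2)

# foo1 returns a tuple of three object arrays shaped like the input.
parts = foo1(arr2d)
# Stack on a new last axis, then convert from object to a numeric dtype.
result = np.stack(parts, axis=-1).astype(np.int64)
print(result.dtype, result.shape)
```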
    
