Force numpy to create array of objects

终归单人心 2020-11-30 13:08

I have an array:

x = np.array([[1, 2, 3], [4, 5, 6]])

and I want to create another array of shape=(1, 1) and dtype=np.ob

4 answers
  •  死守一世寂寞
    2020-11-30 13:45

    @PaulPanzer's use of np.frompyfunc is clever, but all that reshaping and the use of __getitem__ make it hard to follow.

    Separating the function creation from application might help:

    func = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)
    newarr = func(range(np.prod(osh))).reshape(osh)
    

    This highlights the separation between the ish dimensions (the shape of each stored block) and the osh ones (the shape of the resulting object array).

    I also suspect a lambda function could substitute for the __getitem__.
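
A minimal sketch of that lambda variant, assuming a 4-d integer array whose leading dimensions are osh and trailing dimensions are ish. The names flat and pick are made up here for illustration:

```python
import numpy as np

osh, ish = (2, 3), (2, 5)
data = np.arange(np.prod(osh + ish)).reshape(osh + ish)

# Collapse the outer dimensions so one integer selects one block.
flat = data.reshape((-1, *ish))              # shape (6, 2, 5)

# The lambda plays the role that flat.__getitem__ played before.
pick = np.frompyfunc(lambda i: flat[i], 1, 1)
newarr = pick(np.arange(np.prod(osh))).reshape(osh)

print(newarr.shape, newarr.dtype)            # (2, 3) object
print(newarr[1, 2].shape)                    # (2, 5)
```

Whether this reads better than the bound __getitem__ is a matter of taste; the work done per element is the same.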

    This works because frompyfunc always returns an object dtype array. np.vectorize also uses frompyfunc, but lets us specify different otypes. Both pass scalars to the function, which is why Paul's approach applies __getitem__ to a flattened range of indices. np.vectorize with a signature does let us pass an array to the function, but it iterates with ndindex instead of using frompyfunc.
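
To make the dtype point concrete, here is a small sketch. The double function is a hypothetical stand-in, not something from the question:

```python
import numpy as np

def double(v):
    # Called once per element with a scalar, never with the whole array.
    return v * 2

obj_func = np.frompyfunc(double, 1, 1)            # result is always object dtype
vec_func = np.vectorize(double, otypes=[float])   # otypes chooses the result dtype

obj_out = obj_func(np.arange(3))
vec_out = vec_func(np.arange(3))

print(obj_out.dtype)   # object
print(vec_out.dtype)   # float64
```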

    Inspired by that, here's a np.empty plus fill method - but with ndindex as the iterator:

    In [385]: osh, ish = (2, 3), (2, 5)
         ...: tsh = (*osh, *ish)
         ...: data = np.arange(np.prod(tsh)).reshape(tsh)
         ...: ish = np.shape(data)[len(osh):]
         ...: 
    In [386]: tsh
    Out[386]: (2, 3, 2, 5)
    In [387]: ish
    Out[387]: (2, 5)
    In [388]: osh
    Out[388]: (2, 3)
    In [389]: res = np.empty(osh, object)
    In [390]: for idx in np.ndindex(osh):
         ...:     res[idx] = data[idx]
         ...:     
    In [391]: res
    Out[391]: 
    array([[array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]]),
           ....
           [55, 56, 57, 58, 59]])]], dtype=object)
    

    For the second example:

    In [399]: arr = np.array(data)
    In [400]: arr.shape
    Out[400]: (2, 2, 2, 3)
    In [401]: res = np.empty(osh, object)
    In [402]: for idx in np.ndindex(osh):
         ...:     res[idx] = arr[idx]
    

    In the third case, np.array(data) already creates the desired (2,2) object dtype array. The create-and-fill loop still works there, even though it just reproduces the same array.
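
The create-and-fill loop can be wrapped in a small helper. This is only a sketch: the name to_object_blocks is hypothetical, and it assumes the leading dimensions of the input match osh:

```python
import numpy as np

def to_object_blocks(data, osh):
    """Return an object array of shape osh whose elements are sub-arrays of data."""
    arr = np.asarray(data)                 # nested lists become an array first
    res = np.empty(osh, dtype=object)      # allocate the object "container"
    for idx in np.ndindex(osh):            # visit every outer index tuple
        res[idx] = arr[idx]                # store one block per element
    return res

osh, ish = (2, 3), (2, 5)
data = np.arange(np.prod(osh + ish)).reshape(osh + ish)
blocks = to_object_blocks(data, osh)

print(blocks.shape, blocks.dtype)          # (2, 3) object
print(blocks[0, 0].shape)                  # (2, 5)
```

Each element is whatever arr[idx] returns, so with a numeric input the blocks are views of arr rather than copies.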

    Speed isn't very different, though this example is small:

    In [415]: timeit data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
    49.8 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [416]: %%timeit
         ...: arr = np.array(data)
         ...: res = np.empty(osh, object)
         ...: for idx in np.ndindex(osh): res[idx] = arr[idx]
         ...: 
    54.7 µs ± 68.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    Note that when data is a (nested) list, np.reshape(data, (-1, *ish)) is, effectively, np.array(data).reshape(-1, *ish). The list first has to be turned into an array.
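
A quick check of that equivalence, using a made-up nested list (not the data from the question):

```python
import numpy as np

ish = (2, 3)
# Hypothetical nested list with overall shape (2, 2, 2, 3).
data = np.arange(24).reshape(2, 2, 2, 3).tolist()

a = np.reshape(data, (-1, *ish))        # the function form accepts a plain list
b = np.array(data).reshape(-1, *ish)    # explicit conversion, then the method

print(type(a).__name__, a.shape)        # ndarray (4, 2, 3)
print(np.array_equal(a, b))             # True
```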

    Besides speed, it would be interesting to see whether one approach is more general than the other. Are there cases that one handles but the other can't?
