I have an array:
x = np.array([[1, 2, 3], [4, 5, 6]])
and I want to create another array of shape=(1, 1) and dtype=np.ob
@PaulPanzer's use of np.frompyfunc is clever, but all that reshaping and the use of __getitem__ make it hard to understand.
Separating the function creation from application might help:
func = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)
newarr = func(range(np.prod(osh))).reshape(osh)
This highlights the separation between the ish dimensions and the osh ones.
I also suspect a lambda function could substitute for the __getitem__.
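Here is a minimal sketch of that lambda substitution, using the same osh/ish example shapes as the session further down. The name flat is my own, introduced only for illustration; I have not timed this variant.

```python
import numpy as np

osh, ish = (2, 3), (2, 5)            # outer and inner shapes
tsh = (*osh, *ish)
data = np.arange(np.prod(tsh)).reshape(tsh)

# Collapse the outer dimensions: one inner (2, 5) block per row.
flat = np.reshape(data, (-1, *ish))  # shape (6, 2, 5)

# The lambda plays the same role as flat.__getitem__.
func = np.frompyfunc(lambda i: flat[i], 1, 1)
newarr = func(range(np.prod(osh))).reshape(osh)

print(newarr.shape, newarr.dtype)    # (2, 3) object
print(newarr[1, 2].shape)            # (2, 5)
```

The lambda makes it a bit more obvious that each call just picks one inner block by its flat index.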
This works because frompyfunc returns an object dtype array. np.vectorize also uses frompyfunc, but lets us specify a different return dtype via otypes. Both pass scalars to the function, which is why Paul's approach uses a flattened range and __getitem__. np.vectorize with a signature lets us pass an array to the function, but it then iterates with ndindex instead of using frompyfunc.
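A small sketch of the dtype difference and of the scalar-at-a-time calling. The function double and the list seen_ndim are hypothetical names used only for this illustration.

```python
import numpy as np

seen_ndim = []   # records the dimensionality of each argument received

def double(v):
    # v arrives as a single scalar, never as the whole array
    seen_ndim.append(np.ndim(v))
    return v * 2

a = np.arange(3)

r1 = np.frompyfunc(double, 1, 1)(a)            # always object dtype
r2 = np.vectorize(double, otypes=[float])(a)   # dtype chosen via otypes

print(r1.dtype)        # object
print(r2.dtype)        # float64
print(set(seen_ndim))  # {0}
```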
Inspired by that, here's a np.empty plus fill method - but with ndindex as the iterator:
In [385]: >>> osh, ish = (2, 3), (2, 5)
...: >>> tsh = (*osh, *ish)
...: >>> data = np.arange(np.prod(tsh)).reshape(tsh)
...: >>> ish = np.shape(data)[len(osh):]
...:
In [386]: tsh
Out[386]: (2, 3, 2, 5)
In [387]: ish
Out[387]: (2, 5)
In [388]: osh
Out[388]: (2, 3)
In [389]: res = np.empty(osh, object)
In [390]: for idx in np.ndindex(osh):
...:     res[idx] = data[idx]
...:
In [391]: res
Out[391]:
array([[array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]),
....
[55, 56, 57, 58, 59]])]], dtype=object)
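The same create-and-fill steps as a standalone script, for anyone who wants to run them outside IPython. This is just the session above restated; the np.shares_memory check is my addition, to show that each element is a view of data rather than a copy.

```python
import numpy as np

osh, ish = (2, 3), (2, 5)            # outer and inner shapes
tsh = (*osh, *ish)
data = np.arange(np.prod(tsh)).reshape(tsh)

# Allocate the outer object array, then fill one element per outer index.
res = np.empty(osh, dtype=object)
for idx in np.ndindex(*osh):
    res[idx] = data[idx]             # each element is a (2, 5) array

print(res.shape, res.dtype)          # (2, 3) object
print(res[1, 2].shape)               # (2, 5)
print(np.shares_memory(res[0, 0], data))   # True
```

If independent copies are needed, data[idx].copy() could be assigned instead.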
For the second example:
In [399]: arr = np.array(data)
In [400]: arr.shape
Out[400]: (2, 2, 2, 3)
In [401]: res = np.empty(osh, object)
In [402]: for idx in np.ndindex(osh):
...:     res[idx] = arr[idx]
In the third case, np.array(data) already creates the desired (2,2) object dtype array. Creating res and filling it still works, though it just produces the same thing.
Speed isn't very different (though this example is small):
In [415]: timeit data_blocked = np.frompyfunc(np.reshape(data, (-1, *ish)).__getitem__, 1, 1)(range(np.prod(osh))).reshape(osh)
49.8 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [416]: %%timeit
...: arr = np.array(data)
...: res = np.empty(osh, object)
...: for idx in np.ndindex(osh): res[idx] = arr[idx]
...:
54.7 µs ± 68.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note that when data is a (nested) list, np.reshape(data, (-1, *ish)) is, effectively, np.array(data).reshape(-1, *ish). The list first has to be turned into an array.
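A quick check of that equivalence, assuming a regular (non-ragged) nested list built from the same example shapes:

```python
import numpy as np

ish = (2, 5)
# Build a nested list with total shape (2, 3, 2, 5).
data = np.arange(60).reshape(2, 3, 2, 5).tolist()

a = np.reshape(data, (-1, *ish))        # np.reshape converts the list itself
b = np.array(data).reshape(-1, *ish)    # explicit conversion, then reshape

print(type(a).__name__, a.shape)        # ndarray (6, 2, 5)
print(np.array_equal(a, b))             # True
```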
Besides speed, it would be interesting to see whether one approach is more general than the other. Are there cases that one handles but the other can't?