Question
I have a numpy array:
import numpy as np
arr = np.arange(20).reshape(2,10)
arr[1,:] = 0
arr[1,2] = arr[1,5] = arr[1,7] = 1
print(arr)
[[0 1 2 3 4 5 6 7 8 9]
 [0 0 1 0 0 1 0 1 0 0]]
I want to extract overlapping arrays, each starting at a 1 and ending one column behind the next 1.
Expected output:
[[0 1 2 3]
 [0 0 1 0]]
[[2 3 4 5 6]
 [1 0 0 1 0]]
[[5 6 7 8]
 [1 0 1 0]]
[[7 8 9]
 [1 0 0]]
At the moment, I have an index-based for-loop that feels awkward in a numpy
context and also has to treat the first and last segment as special cases:
arr[1,0] = 1
ind = list(np.where(arr[1,:]))[0]
print(ind)
for i, j in enumerate(ind):
    if not i:
        continue
    curr = np.copy(arr[:, ind[i-1]:j+2])
    print(curr)
#last segment
curr = np.copy(arr[:, j:])
print(curr)
This approach gives me the desired output, but I cannot believe there is not a numpier way to achieve this (although the tumbleweed reaction here may indicate that there isn't). If there is an easier pandas solution, that would also be fine. The output is ideally a list of these arrays or a similar data structure; the output arrays don't have to be returned individually.
Answer 1:
Here is a partial solution, my favorite because it is not complicated:
>>> split_idx = np.flatnonzero(arr[1]) + 2
>>> np.split(arr, split_idx, axis=1)
[array([[0, 1, 2, 3],
       [0, 0, 1, 0]]),
 array([[4, 5, 6],
       [0, 1, 0]]),
 array([[7, 8],
       [1, 0]]),
 array([[9],
       [0]])]
But two things indicate that any numpy-style approach is a poor fit for this problem:
- You're forced to work with a list of arrays of different shapes, which numpy is not designed for. So np.split is quite slow.
- You can't get the result in one go. np.split returns non-overlapping pieces, so the overlapping columns still have to be inserted at the beginning of every interior item.
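One way to get the overlap without np.split is to compute the start and stop column of every segment first and then slice. This is only a minimal sketch, not a definitive solution: the names ones, starts, stops and segments are made up for illustration, and it assumes the second row does not begin with a 1 (the question sets arr[1,0] = 1 only as a helper for its loop). The list comprehension is still a Python-level loop, so the first point above still applies.

```python
import numpy as np

# Rebuild the array from the question (without the arr[1,0] = 1 helper).
arr = np.arange(20).reshape(2, 10)
arr[1, :] = 0
arr[1, [2, 5, 7]] = 1

# Columns where the second row is non-zero.
ones = np.flatnonzero(arr[1])

# The first segment starts at column 0, every later one at a 1.
starts = np.r_[0, ones]
# Each segment ends one column behind the next 1; the last runs to the end.
stops = np.r_[ones + 2, arr.shape[1]]

# Basic slicing returns views; use arr[:, a:b].copy() if copies are needed.
segments = [arr[:, a:b] for a, b in zip(starts, stops)]

for seg in segments:
    print(seg)
```

For the example array this should reproduce the four arrays listed under "Expected output" in the question.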
Source: https://stackoverflow.com/questions/64784980/numpy-array-segmentation