Numpy array segmentation

Question

I have a NumPy array:

import numpy as np

arr = np.arange(20).reshape(2,10)
arr[1,:] = 0
arr[1,2] = arr[1,5] = arr[1,7] = 1
print(arr)
# [[0 1 2 3 4 5 6 7 8 9]
#  [0 0 1 0 0 1 0 1 0 0]]

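I want to extract overlapping segments, each starting at a 1 in the second row and ending one column after the next 1. The first segment starts at column 0, and the last one runs to the end of the array. Expected output: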

[[0 1 2 3]
 [0 0 1 0]]

[[2 3 4 5 6]
 [1 0 0 1 0]]

[[5 6 7 8]
 [1 0 1 0]]

[[7 8 9]
 [1 0 0]]
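Reading the expected output column by column, every segment is a half-open column range [s_k, e_k). Writing m_1 < m_2 < ... < m_n for the columns whose second-row entry is 1 (assuming, as in this example, that column 0 is not one of them) and N for the number of columns, the starts and exclusive stops work out as:

```latex
\begin{aligned}
s &= (0,\; m_1,\; m_2,\; \dots,\; m_n),\\
e &= (m_1 + 2,\; m_2 + 2,\; \dots,\; m_n + 2,\; N).
\end{aligned}
```

With m = (2, 5, 7) and N = 10 this gives s = (0, 2, 5, 7) and e = (4, 7, 9, 10), i.e. columns 0 to 3, 2 to 6, 5 to 8 and 7 to 9, which are exactly the four arrays above.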

At the moment, I have an index-based for-loop that feels awkward in a NumPy context and also has to treat the first and last segments as special cases:

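# Treat column 0 as an implicit segment start without overwriting arr[1, 0]
# (setting arr[1, 0] = 1 would change the first segment's second row to [1 0 1 0])
ind = np.union1d([0], np.flatnonzero(arr[1, :]))
print(ind)

for i, j in enumerate(ind):
    if not i:
        continue
    # from the previous marker to one column past the current marker
    curr = np.copy(arr[:, ind[i-1]:j+2])
    print(curr)

# last segment: from the final marker to the end
curr = np.copy(arr[:, ind[-1]:])
print(curr)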

This approach gives me the desired output but I cannot believe there is not a numpier way to achieve this (although the tumbleweed reaction here may indicate this). If there is an easier pandas solution, that would also be fine. The output is ideally a list of these arrays or a similar data structure; the output arrays don't have to be returned individually.
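A sketch of a more vectorized version, assuming the segmentation rule described above: compute every segment's start and exclusive stop with NumPy index arithmetic in one go, then slice. `segments` is a hypothetical helper name, not a NumPy function; `np.flatnonzero`, `np.union1d` and `np.r_` are standard NumPy.

```python
import numpy as np


def segments(arr, row=1):
    """Hypothetical helper: overlapping column segments of `arr`, split on the 1s in `arr[row]`.

    Each segment starts at a marker (column 0 for the first segment) and ends one
    column after the next marker; the last segment runs to the final column.
    """
    markers = np.flatnonzero(arr[row])
    # union1d sorts and de-duplicates, so column 0 is not counted twice if it is a marker
    starts = np.union1d([0], markers)
    # exclusive stops: one past the next start, and the array end for the last segment
    stops = np.r_[starts[1:] + 2, arr.shape[1]]
    # basic slicing returns views, so no data is copied here
    return [arr[:, a:b] for a, b in zip(starts, stops)]


arr = np.arange(20).reshape(2, 10)
arr[1, :] = 0
arr[1, [2, 5, 7]] = 1

for seg in segments(arr):
    print(seg)
```

The list comprehension remains a Python-level loop because the segments have different lengths and cannot share one regular array. Since the slices are views into `arr`, call `.copy()` on a segment if it has to be independent of the original.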

Answer 1:

Here is a partial solution; it's my favorite and it isn't complicated:

>>> split_idx = np.flatnonzero(arr[1]) + 2
>>> np.split(arr, split_idx, axis=1)
[array([[0, 1, 2, 3],
        [0, 0, 1, 0]]),
 array([[4, 5, 6],
        [0, 1, 0]]),
 array([[7, 8],
        [1, 0]]),
 array([[9],
        [0]])]

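However, two things suggest that any NumPy-style approach is a poor fit for this problem:

1. You're forced to work with a list of arrays of different shapes, which NumPy is not designed for. For this reason np.split is relatively slow: it builds each piece in a Python-level loop.
2. You can't produce the overlapping segments in a single pass. Every piece after the first needs extra columns inserted at its beginning, namely the span from the preceding 1 up to the split point.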
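To illustrate the second point, here is one way to finish the `np.split` version by prepending the missing columns to every piece after the first. This is a sketch, not part of the original answer; the names `markers`, `pieces` and `overlapping` are illustrative.

```python
import numpy as np

arr = np.arange(20).reshape(2, 10)
arr[1, :] = 0
arr[1, [2, 5, 7]] = 1

markers = np.flatnonzero(arr[1])   # columns holding a 1: [2, 5, 7]
split_idx = markers + 2            # split points: [4, 7, 9]
pieces = np.split(arr, split_idx, axis=1)  # non-overlapping pieces

# Extend every piece after the first to the left so that it starts at the
# marker preceding its split point: prepend columns markers[k]:split_idx[k].
overlapping = [pieces[0]] + [
    np.concatenate([arr[:, m:s], piece], axis=1)
    for m, s, piece in zip(markers, split_idx, pieces[1:])
]

for seg in overlapping:
    print(seg)
```

`np.concatenate` copies data for every piece, so this does more work than slicing the segments straight out of `arr`, which supports the answer's point that `np.split` is not a natural fit here.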

Source: https://stackoverflow.com/questions/64784980/numpy-array-segmentation

Tags: python, arrays, pandas, numpy