问题
I have a 2D numpy array of ints 0 and greater, where the values represent region labels. For example,
array([[9, 9, 9, 0, 0, 0, 0, 1, 1, 1],
       [9, 9, 9, 9, 0, 7, 1, 1, 1, 1],
       [9, 9, 9, 9, 0, 2, 2, 1, 1, 1],
       [9, 9, 9, 8, 0, 2, 2, 1, 1, 1],
       [9, 9, 9, 8, 0, 2, 2, 2, 1, 1],
       [4, 4, 4, 4, 0, 2, 2, 2, 1, 1],
       [4, 6, 6, 4, 0, 0, 0, 0, 0, 0],
       [4, 6, 6, 4, 0, 0, 0, 0, 0, 0],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]])
I would like the indices equal to 0 (i.e. zero-regions) to take on the value most-common in their neighborhood. The operation would essentially close the zero-regions. I've tried multiple variations of dilation, erosion, grey-closing, and other morphology operations, but I cannot completely eliminate the zero-regions (without awkwardly blending the other regions). A decent approach could be to define a kernel that convolves only over the zeros, and sets the value with the most common label in the filter area. I'm unsure how to implement this though.
回答1:
One vectorized approach is proposed here. Steps are :
- Get kernel sized 2D sliding windows, leading to 4D array. We can use - skimage's view_as_windowsto get those as view and thus avoid creating any extra memory for this.
- Select the windows which are centered at zeros by indexing into the 4D array. This forces a copy. But assuming number of zeros is a relatively smaller number than the total number of elements in input array, this should be okay. 
- For each of those selected windows, offset each window with a proper offset with the idea of using - np.bincountto perform counting. Thus, use- bincountand get the max count excluding the zeros. The argmax for the max count should be our guy!
Here's the implementation covering those steps -
from skimage.util import view_as_windows as viewW
def fill_zero_regions(a, kernel_size=3):
    hk = kernel_size//2 # half_kernel_size    
    a4D = viewW(a, (kernel_size,kernel_size))
    sliced_a = a[hk:-hk,hk:-hk]
    zeros_mask = sliced_a==0
    zero_neighs = a4D[zeros_mask].reshape(-1,kernel_size**2)
    n = len(zero_neighs) # num_zeros
    scale = zero_neighs.max()+1
    zno = zero_neighs + scale*np.arange(n)[:,None] # zero_neighs_offsetted
    count = np.bincount(zno.ravel(), minlength=n*scale).reshape(n,-1)
    modevals = count[:,1:].argmax(1)+1
    sliced_a[zeros_mask] = modevals
    return a
Sample run -
In [23]: a
Out[23]: 
array([[9, 9, 9, 0, 0, 0, 0, 1, 1, 1],
       [9, 9, 9, 9, 0, 7, 1, 1, 1, 1],
       [9, 9, 9, 9, 0, 2, 2, 1, 1, 1],
       [9, 9, 9, 8, 0, 2, 2, 1, 1, 1],
       [9, 9, 9, 8, 0, 2, 2, 2, 1, 1],
       [4, 4, 4, 4, 0, 2, 2, 2, 1, 1],
       [4, 6, 6, 4, 0, 0, 0, 0, 0, 0],
       [4, 6, 6, 4, 0, 0, 0, 0, 0, 0],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]])
In [24]: fill_zero_regions(a)
Out[24]: 
array([[9, 9, 9, 0, 0, 0, 0, 1, 1, 1],
       [9, 9, 9, 9, 9, 7, 1, 1, 1, 1],
       [9, 9, 9, 9, 2, 2, 2, 1, 1, 1],
       [9, 9, 9, 8, 2, 2, 2, 1, 1, 1],
       [9, 9, 9, 8, 2, 2, 2, 2, 1, 1],
       [4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
       [4, 6, 6, 4, 4, 2, 2, 2, 1, 0],
       [4, 6, 6, 4, 4, 5, 5, 5, 5, 0],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]])
As seen, we are not solving for the boundary cases. If needed to do, use a zero-padded array as the input array, something like this : np.pad(a, (k//2,k//2), 'constant'), with k as the kernel size (=3 for the sample).
回答2:
A possible solution based on the convolve idea
from scipy import stats
ar = #Your np array
blank = np.zeros(ar.shape)
#Size to search in for mode values
window_size = 3
for x,y in np.array(np.where(ar == 0)).T:
    window = ar[max(x-window_size,0):x+window_size,max(0,y-window_size):y+window_size]
    oneD = window.flatten()
    #fill blank array with modal value
    blank[x,y] = stats.mode(oneD[oneD != 0])[0]
#fill in the zeros
print ar + blank
Im not sure if avoiding a loop is possible here
回答3:
Here's a working solution using Numba, which I haven't profiled but should be pretty fast:
import numba
@numba.njit
def nn(arr):
    res = arr.copy()
    zeros = np.where(arr == 0)
    for n in range(len(zeros[0])):
        i = zeros[0][n]
        j = zeros[1][n]
        left = max(i-1, 0)
        right = min(i+2, arr.shape[1])
        top = max(j-1, 0)
        bottom = min(j+2, arr.shape[0])
        area = arr[left:right,top:bottom].ravel()
        counts = np.bincount(area[area != 0])
        res[i,j] = np.argmax(counts)
    return res
It produces:
array([[9, 9, 9, 9, 7, 1, 1, 1, 1, 1],
       [9, 9, 9, 9, 9, 7, 1, 1, 1, 1],
       [9, 9, 9, 9, 2, 2, 2, 1, 1, 1],
       [9, 9, 9, 8, 2, 2, 2, 1, 1, 1],
       [9, 9, 9, 8, 2, 2, 2, 2, 1, 1],
       [4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
       [4, 6, 6, 4, 4, 2, 2, 2, 1, 1],
       [4, 6, 6, 4, 4, 5, 5, 5, 5, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]])
The kernel size is 3x3 here, as defined by subtracting 1 and adding 2 to i and j (adding 2 because Python slices take one-past-the-end, e.g. [0:3] gives you 3 elements).  Boundary conditions are handled by min and max.
Credit for the bincount idea: https://stackoverflow.com/a/6252400/4323
来源:https://stackoverflow.com/questions/46126409/numpy-filter-to-smooth-out-zero-regions