Filling gaps in a numpy array

烂漫一生 提交于 2019-11-27 19:31:15

You can set up a crystal-growth-style algorithm shifting a view alternately along each axis, replacing only data that is flagged with a False but has a True neighbor. This gives a "nearest-neighbor"-like result (but not in Euclidean or Manhattan distance -- I think it might be nearest-neighbor if you are counting pixels, counting all connecting pixels with common corners) This should be fairly efficient with NumPy as it iterates over only axis and convergence iterations, not small slices of the data.

Crude, fast and stable. I think that's what you were after:

import numpy as np
# -- setup --
shape = (10,10,10)
dim = len(shape)
data = np.random.random(shape)
flag = np.zeros(shape, dtype=bool)
t_ct = int(data.size/5)
flag.flat[np.random.randint(0, flag.size, t_ct)] = True
# True flags the data
# -- end setup --

slcs = [slice(None)]*dim

while np.any(~flag): # as long as there are any False's in flag
    for i in range(dim): # do each axis
        # make slices to shift view one element along the axis
        slcs1 = slcs[:]
        slcs2 = slcs[:]
        slcs1[i] = slice(0, -1)
        slcs2[i] = slice(1, None)

        # replace from the right
        repmask = np.logical_and(~flag[slcs1], flag[slcs2])
        data[slcs1][repmask] = data[slcs2][repmask]
        flag[slcs1][repmask] = True

        # replace from the left
        repmask = np.logical_and(~flag[slcs2], flag[slcs1])
        data[slcs2][repmask] = data[slcs1][repmask]
        flag[slcs2][repmask] = True

For good measure, here's a visualization (2D) of the zones seeded by the data originally flagged True.

Using scipy.ndimage, your problem can be solved with nearest neighbor interpolation in 2 lines :

from scipy import ndimage as nd

indices = nd.distance_transform_edt(invalid_cell_mask, return_distances=False, return_indices=True)
data = data[tuple(ind)]

Now, in the form of a function:

import numpy as np
from scipy import ndimage as nd

def fill(data, invalid=None):
    """
    Replace the value of invalid 'data' cells (indicated by 'invalid') 
    by the value of the nearest valid data cell

    Input:
        data:    numpy array of any dimension
        invalid: a binary array of same shape as 'data'. 
                 data value are replaced where invalid is True
                 If None (default), use: invalid  = np.isnan(data)

    Output: 
        Return a filled array. 
    """    
    if invalid is None: invalid = np.isnan(data)

    ind = nd.distance_transform_edt(invalid, 
                                    return_distances=False, 
                                    return_indices=True)
    return data[tuple(ind)]

Exemple of use:

def test_fill(s,d):
     # s is size of one dimension, d is the number of dimension
    data = np.arange(s**d).reshape((s,)*d)
    seed = np.zeros(data.shape,dtype=bool)
    seed.flat[np.random.randint(0,seed.size,int(data.size/20**d))] = True

    return fill(data,-seed), seed

import matplotlib.pyplot as plt
data,seed  = test_fill(500,2)
data[nd.binary_dilation(seed,iterations=2)] = 0   # draw (dilated) seeds in black
plt.imshow(np.mod(data,42))                       # show cluster

result:

Some time ago I wrote this script for my PhD: https://github.com/Technariumas/Inpainting

An example: http://blog.technariumas.lt/post/117630308826/healing-holes-in-python-arrays

Slow, but does the work. Gaussian kernel is the best choice, just check size/sigma values.

You may try to tackle your problem like:

# main ideas described in very high level pseudo code
choose suitable base kernel shape and type (gaussian?)
while true
    loop over your array (moving average manner)
        adapt your base kernel to current sparsity pattern
        set current value based on adapted kernel
    break if converged

This actually can be implemented quite a straightforward manner (especially if performance is not a top concern).

Obviously this is just heuristics and you need to do some experiments with your actual data to find proper adaptation scheme. When seeing kernel adaptation as kernel reweighing, you may like to do it based on how the values have been propagated. For example your weights for original supports are 1 and they decay related on which iteration they emerged.

Also the determination of when this process has actually converged may be tricky one. Depending on the application it may be reasonable eventually to leave some 'gap regions' remain 'unfilled'.

Update: Here is a very simple implementation along the lines *) described above:

from numpy import any, asarray as asa, isnan, NaN, ones, seterr
from numpy.lib.stride_tricks import as_strided as ast
from scipy.stats import nanmean

def _a2t(a):
    """Array to tuple."""
    return tuple(a.tolist())

def _view(D, shape, strides):
    """View of flattened neighbourhood of D."""
    V= ast(D, shape= shape, strides= strides)
    return V.reshape(V.shape[:len(D.shape)]+ (-1,))

def filler(A, n_shape, n_iter= 49):
    """Fill in NaNs from mean calculated from neighbour."""
    # boundary conditions
    D= NaN* ones(_a2t(asa(A.shape)+ asa(n_shape)- 1), dtype= A.dtype)
    slc= tuple([slice(n/ 2, -(n/ 2)) for n in n_shape])
    D[slc]= A

    # neighbourhood
    shape= _a2t(asa(D.shape)- asa(n_shape)+ 1)+ n_shape
    strides= D.strides* 2

    # iterate until no NaNs, but not more than n iterations
    old= seterr(invalid= 'ignore')
    for k in xrange(n_iter):
        M= isnan(D[slc])
        if not any(M): break
        D[slc][M]= nanmean(_view(D, shape, strides), -1)[M]
    seterr(**old)
    A[:]= D[slc]

And a simple demonstration of the filler(.) on action, would be something like:

In []: x= ones((3, 6, 99))
In []: x.sum(-1)
Out[]:
array([[ 99.,  99.,  99.,  99.,  99.,  99.],
       [ 99.,  99.,  99.,  99.,  99.,  99.],
       [ 99.,  99.,  99.,  99.,  99.,  99.]])
In []: x= NaN* x
In []: x[1, 2, 3]= 1
In []: x.sum(-1)
Out[]:
array([[ nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan]])
In []: filler(x, (3, 3, 5))
In []: x.sum(-1)
Out[]:
array([[ 99.,  99.,  99.,  99.,  99.,  99.],
       [ 99.,  99.,  99.,  99.,  99.,  99.],
       [ 99.,  99.,  99.,  99.,  99.,  99.]])

*) So here the nanmean(.) is just used to demonstrate the idea of the adaptation process. Based on this demonstration, it should be quite straightforward to implement a more complex adaptation and decaying weighing scheme. Also note that, no attention is paid to actual execution performance, but it still should be good (with reasonable input shapes).

Maybe what you are looking for is a machine learning algorithm, like a neural network or a support vector machine.

You may check this page, which has some links to SVM packages for python: http://web.media.mit.edu/~stefie10/technical/pythonml.html

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!