numpy.interp is very convenient and relatively fast. In certain contexts I'd like to compare its output against a non-interpolated variant in which the sparse values are propagated into the "denser" output, so the result is piecewise constant between the sparse inputs. The function I want could also be called a "sparse -> dense" converter that copies the latest sparse value forward until it finds a later value (a kind of null interpolation, as if no time/distance had elapsed since the earlier value).
Unfortunately, it's not easy to tweak the source of numpy.interp because it's just a wrapper around a compiled function. I can write this myself using Python loops, but I hope to find a C-speed way to solve the problem.
Update: the solution below (scipy.interpolate.interp1d with kind='zero') is quite slow, taking more than 10 seconds per call (e.g. for an input 500k in length that's 50% populated). It implements kind='zero' as a zero-order spline, and the call to splev is very slow. However, the source code for kind='linear' (i.e. the default interpolation) gives an excellent template for solving the problem in straight numpy (the minimal change is to set slope=0). That code shows how to use numpy.searchsorted, and the runtime is similar to calling numpy.interp. So the problem is solved by tweaking the scipy.interpolate.interp1d implementation of linear interpolation to skip the interpolation step (a nonzero slope is what blends the adjacent values).
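For context, a rough sketch of how the timing comparison could be set up (random data sized to mirror the question; the exact figures will depend on the scipy version and machine, and are not measured from this code):

    import time
    import numpy as np
    from scipy.interpolate import interp1d

    n = 500000
    xp = np.sort(np.random.rand(n // 2))    # ~50% populated sparse grid
    yp = np.random.rand(n // 2)
    x = np.linspace(xp[0], xp[-1], n)       # dense query points within range

    t0 = time.time()
    interp1d(xp, yp, kind='zero')(x)        # zero-order spline path (slow in the version discussed)
    print('kind=zero:', time.time() - t0)

    t0 = time.time()
    np.interp(x, xp, yp)                    # compiled linear path (fast)
    print('np.interp:', time.time() - t0)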
scipy.interpolate.interp1d can do several kinds of interpolation: 'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic'.
See the documentation: http://docs.scipy.org/doc/scipy-0.10.1/reference/generated/scipy.interpolate.interp1d.html#scipy.interpolate.interp1d
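A minimal usage sketch of the kind='zero' variant (the arrays here are made up for illustration):

    import numpy as np
    from scipy.interpolate import interp1d

    xp = np.array([0.0, 1.0, 2.5, 4.0])     # sparse sample positions
    yp = np.array([10.0, 20.0, 30.0, 40.0])

    # Zero-order hold: each value is held constant until the next sparse point.
    f = interp1d(xp, yp, kind='zero')
    x = np.linspace(0.0, 4.0, 9)
    print(f(x))   # e.g. [10. 10. 20. 20. 20. 30. 30. 30. 40.]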
Just for completeness: the solution to the question is the following code, which I was able to write with the help of the hints given in the updated answer:
    import numpy as np

    def interpolate_constant(x, xp, yp):
        # Index of the sparse point at or before each query point;
        # side='right' means an exact match takes the value at that point.
        indices = np.searchsorted(xp, x, side='right')
        # Prepend a fill value so queries before xp[0] map to 0.
        y = np.concatenate(([0], yp))
        return y[indices]
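A quick check with made-up values:

    xp = np.array([1.0, 3.0, 5.0])
    yp = np.array([10.0, 20.0, 30.0])
    x = np.array([0.5, 1.0, 2.0, 3.0, 4.5, 6.0])
    print(interpolate_constant(x, xp, yp))   # [ 0. 10. 10. 20. 20. 30.]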
I totally agree that kind='zero' is extremely slow; for a large data set of a million rows it can literally be 1000 times slower than the 'linear' method. For "left-constant" interpolation (using the latest value), the following code works:
    import numpy as np

    def approx(x, y, xout, yleft=np.nan, yright=np.nan):
        # Index of the latest x at or before each xout value;
        # side='right' minus 1 keeps exact matches on their own value.
        xoutIdx = np.searchsorted(x, xout, side='right') - 1
        # Queries outside [x[0], x[-1]] get the explicit fill values.
        return np.where(xout < x[0], yleft,
                        np.where(xout > x[-1], yright, y[xoutIdx]))
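For example, with illustrative arrays:

    x = np.array([1.0, 2.0, 4.0])
    y = np.array([5.0, 6.0, 7.0])
    print(approx(x, y, np.array([0.5, 1.0, 3.0, 4.0, 9.0])))
    # [nan  5.  6.  7. nan]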
Coming from an R background, this is equivalent to R's approx with f=0. I haven't found a clean way to do the same for "right-constant" interpolation (using the next value, R's f=1), because np.searchsorted with side='right' moves one index past an xout value that exactly matches a value in x...
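One possible sketch for the right-constant case would be to use side='left', so that exact matches keep their own value while strictly interior points pick up the next one (approx_right is a hypothetical name, and this is untested against R's approx with f=1):

    def approx_right(x, y, xout, yleft=np.nan, yright=np.nan):
        # side='left' leaves exact matches at their own index, while points
        # strictly between two x values take the index of the next value.
        idx = np.searchsorted(x, xout, side='left')
        idx = np.clip(idx, 0, len(x) - 1)   # keep indexing safe past x[-1]
        return np.where(xout < x[0], yleft,
                        np.where(xout > x[-1], yright, y[idx]))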
Source: https://stackoverflow.com/questions/12240634/what-is-the-best-drop-in-replacement-for-numpy-interp-if-i-want-the-null-interpo