Fast linear interpolation in Numpy / Scipy “along a path”

闹比i 2021-02-01 18:05

Let's say that I have data from weather stations at 3 (known) altitudes on a mountain. Specifically, each station records a temperature measurement at its location every minut

3 answers
  •  慢半拍i (original poster)
    2021-02-01 18:45

    A linear interpolation between two values y1 and y2 at locations x1 and x2, evaluated at a point xi, is simply:

    yi = y1 + (y2-y1) * (xi-x1) / (x2-x1)
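As a quick sanity check of the formula, here is a minimal sketch with made-up readings (70 degrees at 500 m, 50 degrees at 1500 m). The helper name `lerp` and the numbers are assumptions for illustration only; `np.interp` is used just to cross-check the result.

```python
import numpy as np

def lerp(x1, y1, x2, y2, xi):
    """Linearly interpolate between (x1, y1) and (x2, y2) at xi."""
    return y1 + (y2 - y1) * (xi - x1) / (x2 - x1)

# Hypothetical readings: 70 degrees at 500 m and 50 degrees at 1500 m.
yi = lerp(500.0, 70.0, 1500.0, 50.0, 1000.0)
print(yi)  # halfway between the two stations -> 60.0

# np.interp applies the same piecewise-linear rule to 1-D data.
check = np.interp(1000.0, [500.0, 1500.0], [70.0, 50.0])
```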
    

    With some vectorized Numpy expressions we can select the relevant points from the dataset and apply the above function:

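    # Index of the first station at or above each location
    I = np.searchsorted(altitudes, location)
    
    # Altitudes of the two stations bracketing each location
    x1 = altitudes[I-1]
    x2 = altitudes[I]
    
    # Temperatures at those two stations, one pair per time step
    time = np.arange(len(alltemps))
    y1 = alltemps[time,I-1]
    y2 = alltemps[time,I]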
    
    xI = location
    
    yI = y1 + (y2-y1) * (xI-x1) / (x2-x1)
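To make the indexing concrete, here is a small self-contained run of the same expressions. The temperature table and the hiker's altitudes are invented for illustration, and every location is assumed to lie strictly inside the station range.

```python
import numpy as np

altitudes = np.array([500.0, 1500.0, 4000.0])
# One row per time step, one column per station (made-up values).
alltemps = np.array([[70.0, 50.0, 40.0],
                     [72.0, 52.0, 42.0],
                     [74.0, 54.0, 44.0]])
# Hiker's altitude at each time step, strictly inside the station range.
location = np.array([1000.0, 2000.0, 3000.0])

I = np.searchsorted(altitudes, location)  # index of the station just above
time = np.arange(len(alltemps))

x1, x2 = altitudes[I - 1], altitudes[I]
# Pairing `time` with `I` picks one element per row, not a full sub-grid.
y1, y2 = alltemps[time, I - 1], alltemps[time, I]

yI = y1 + (y2 - y1) * (location - x1) / (x2 - x1)
print(I)   # [1 2 2]
print(yI)  # [60. 50. 48.]
```

The key point is that `alltemps[time, I]` uses two index arrays of equal length, so row `t` contributes exactly one value, the one for the station bracketing the hiker at time `t`.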
    

    The trouble is that some points lie on the boundaries of (or even outside of) the known range, which should be taken into account:

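    I = np.searchsorted(altitudes, location)
    # True where a location coincides exactly with a station altitude
    same = (location == altitudes.take(I, mode='clip'))
    # Below the lowest station or above the highest one
    out_of_range = ~same & ((I == 0) | (I == altitudes.size))
    I[out_of_range] = 1  # Prevent index-errors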
    
    x1 = altitudes[I-1]
    x2 = altitudes[I]
    
    time = np.arange(len(alltemps))
    y1 = alltemps[time,I-1]
    y2 = alltemps[time,I]
    
    xI = location
    
    yI = y1 + (y2-y1) * (xI-x1) / (x2-x1)
    yI[out_of_range] = np.nan
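The edge cases are easier to trust once you run them. Below is a sketch that wraps the boundary-aware expressions in a function (the name `interp_along_path` and the test values are my own, not part of the question) and feeds it one location below the range, one on each boundary, and one above the range.

```python
import numpy as np

def interp_along_path(altitudes, alltemps, location):
    """Interpolate row t of alltemps at location[t]; NaN outside the range."""
    I = np.searchsorted(altitudes, location)
    same = location == altitudes.take(I, mode='clip')
    out_of_range = ~same & ((I == 0) | (I == altitudes.size))
    I[out_of_range] = 1  # any valid index; these results are overwritten below
    time = np.arange(len(alltemps))
    # For a location exactly on the lowest station, I is 0 and I - 1 wraps to
    # the last station; the weight (xI-x1)/(x2-x1) is then 1, so yI equals y2.
    x1, x2 = altitudes[I - 1], altitudes[I]
    y1, y2 = alltemps[time, I - 1], alltemps[time, I]
    yI = y1 + (y2 - y1) * (location - x1) / (x2 - x1)
    yI[out_of_range] = np.nan
    return yI

altitudes = np.array([500.0, 1500.0, 4000.0])
alltemps = np.tile([70.0, 50.0, 40.0], (4, 1))  # same readings at 4 time steps
location = np.array([400.0, 500.0, 4000.0, 4100.0])

result = interp_along_path(altitudes, alltemps, location)
print(result)  # [nan 70. 40. nan]
```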
    

    Luckily, Scipy already provides N-dimensional interpolation, which just as easily handles query times that do not match the measurement times, for example:

    from scipy.interpolate import interpn
    
    time = np.arange(len(alltemps))
    
    M = 150
    hiketime = np.linspace(time[0], time[-1], M)
    location = np.linspace(altitudes[0], altitudes[-1], M)
    xI = np.column_stack((hiketime, location))
    
    yI = interpn((time, altitudes), alltemps, xI)
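If some query points may fall outside the grid, `interpn` can return NaN for them instead of raising. The sketch below uses the documented `bounds_error` and `fill_value` parameters; the small data table and the query points are invented for illustration.

```python
import numpy as np
from scipy.interpolate import interpn

altitudes = np.array([500.0, 1500.0, 4000.0])
alltemps = np.array([[70.0, 50.0, 40.0],
                     [72.0, 52.0, 42.0],
                     [74.0, 54.0, 44.0]])
time = np.arange(len(alltemps), dtype=float)

# (time, altitude) query points; the last one lies above the top station.
xI = np.array([[0.0, 1000.0],
               [0.5, 1000.0],
               [2.0, 4100.0]])

# bounds_error=False turns the out-of-range error into fill_value.
yI = interpn((time, altitudes), alltemps, xI,
             method='linear', bounds_error=False, fill_value=np.nan)
print(yI)  # [60. 61. nan]
```

The second point shows interpolation in time as well as altitude: it sits halfway between the first two measurement rows, so it lands halfway between 60 and 62.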
    

    Here's the benchmark code (without any pandas, actually, but I did include the solution from the other answer):

    import numpy as np
    from scipy.interpolate import interp1d, interpn
    
    def original():
        return np.array([interp1d(altitudes, alltemps[i, :])(loc)
                                    for i, loc in enumerate(location)])
    
    def OP_self_answer():
        return np.diagonal(interp1d(altitudes, alltemps)(location))
    
    def interp_checked():
        I = np.searchsorted(altitudes, location)
        same = (location == altitudes.take(I, mode='clip'))
        out_of_range = ~same & ((I == 0) | (I == altitudes.size))
        I[out_of_range] = 1  # Prevent index-errors
    
        x1 = altitudes[I-1]
        x2 = altitudes[I]
    
        time = np.arange(len(alltemps))
        y1 = alltemps[time,I-1]
        y2 = alltemps[time,I]
    
        xI = location
    
        yI = y1 + (y2-y1) * (xI-x1) / (x2-x1)
        yI[out_of_range] = np.nan
    
        return yI
    
    def scipy_interpn():
        time = np.arange(len(alltemps))
        xI = np.column_stack((time, location))
        yI = interpn((time, altitudes), alltemps, xI)
        return yI
    
    N, sigma = 1000, 5  # N must be an integer for np.random.randn
    
    basetemps = 70 + (np.random.randn(N) * sigma)
    midtemps = 50 + (np.random.randn(N) * sigma)
    toptemps = 40 + (np.random.randn(N) * sigma)
    trend = np.sin(4.0 / N * np.arange(N)) * 30
    trend = trend[:, np.newaxis]
    alltemps = np.array([basetemps, midtemps, toptemps]).T + trend
    altitudes = np.array([500, 1500, 4000], dtype=float)
    location = np.linspace(altitudes[0], altitudes[-1], N)
    
    funcs = [original, OP_self_answer, interp_checked, scipy_interpn]
    for func in funcs:
        print(func.__name__)
        # %timeit is an IPython magic; run this in IPython or Jupyter
        %timeit func()
    
    from itertools import combinations
    outs = [func() for func in funcs]
    print('Output allclose:')
    print([np.allclose(out1, out2) for out1, out2 in combinations(outs, 2)])
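The `%timeit` line only works inside IPython. If you want to run the benchmark as a plain script, something along these lines with the standard-library `timeit` module should do. The helper `best_time` and the stand-in workload `dummy` are hypothetical names of mine; you would pass in the benchmark functions instead.

```python
import timeit
import numpy as np

def best_time(func, number=100, repeat=3):
    """Return the best per-call time in seconds over `repeat` runs."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number

# Stand-in workload; replace with original, interp_checked, and so on.
def dummy():
    return np.interp(np.linspace(0.0, 1.0, 1000), [0.0, 1.0], [0.0, 1.0])

seconds = best_time(dummy)
print('{}: {:.1f} us per call'.format(dummy.__name__, seconds * 1e6))
```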
    

    With the following result on my system:

    original
    10 loops, best of 3: 184 ms per loop
    OP_self_answer
    10 loops, best of 3: 89.3 ms per loop
    interp_checked
    1000 loops, best of 3: 224 µs per loop
    scipy_interpn
    1000 loops, best of 3: 1.36 ms per loop
    Output allclose:
    [True, True, True, True, True, True]
    

    Scipy's interpn is about six times slower than the fastest method here (1.36 ms versus 224 µs), but for its generality and ease of use it's definitely the way to go.
