scipy.interpolate.LinearNDInterpolator hangs indefinitely on large data sets

Submitted by 本秂侑毒 on 2019-12-01 04:30:00

Question


I'm interpolating some data in Python to regrid it on a regular mesh such that I can partially integrate it. The data represents a function of a high dimension parameter space (presently 3, to be extended to at least 5) and returns a multi-valued function of observables (presently 2, to be extended to 3 and then potentially dozens).

I'm performing the interpolation via scipy.interpolate.LinearNDInterpolator for lack of any other apparent options (and because I understand griddata just calls it anyway). On a smallish data set (15,000 lines of columned data) it works okay. On larger sets (60,000+), the command appears to run indefinitely. top indicates that IPython is using 100% CPU, and the terminal is completely unresponsive, including to Ctrl-C. So far I've left it running for a few hours to no avail, and ultimately I'd like to pass several million entries.

I suspect the issue is related to this ticket but that was supposedly patched in SciPy 0.10.0, to which I upgraded yesterday.

My question is basically how do I perform multi-dimensional interpolation on large data sets? Based on what I've tried, there are a few possible places a solution could come from but I haven't had any luck finding them. (My search isn't helped by the fact that several of scipy's subdomains seem to be down...)

  • What's going wrong with LinearNDInterpolator? Or, at least, how can I find out what the issue is and try to circumvent the hanging?
  • Is there a way to reformulate the interpolation so that LinearNDInterpolator will work? Perhaps by chunking up the data prudently to regrid it in parts?
  • Are there other high-dimensional interpolators that are better suited to the problem? (I note that most of SciPy's alternatives are limited to parameter spaces of at most two dimensions.)
  • Are there other ways to get multi-dimensional data onto a regular user-defined grid? That's all I'm trying to do by interpolating...

Answer 1:


The problem is most likely that your data set is simply too large, so that computing its Delaunay triangulation does not finish in a reasonable time. Check how the run time of scipy.spatial.Delaunay scales by timing it on smaller subsets picked at random from your full data set, to estimate whether the full computation would finish before the universe ends.
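The scaling check described above could be sketched roughly as follows. The helper name `time_delaunay`, the subset sizes, and the uniform random stand-in data are assumptions made for illustration only; substitute your own point array, and note that real data with a different layout may scale quite differently.

```python
import time

import numpy as np
from scipy.spatial import Delaunay


def time_delaunay(points, sizes, seed=0):
    """Time scipy.spatial.Delaunay on random subsets of `points`.

    Hypothetical helper: returns a list of (subset size, seconds) pairs.
    """
    rng = np.random.default_rng(seed)
    timings = []
    for n in sizes:
        # Pick n distinct rows at random from the full data set.
        idx = rng.choice(len(points), size=n, replace=False)
        start = time.perf_counter()
        Delaunay(points[idx])
        timings.append((n, time.perf_counter() - start))
    return timings


# Stand-in data (assumption): 4000 uniformly random points in 3-D.
points = np.random.default_rng(1).random((4000, 3))
timings = time_delaunay(points, sizes=[500, 1000, 2000, 4000])
for n, t in timings:
    print(f"{n:6d} points: {t:.3f} s")

# Slope of log(time) against log(size) gives a rough scaling exponent,
# which can be used to extrapolate to the full data set.
ns, ts = np.array(timings).T
slope = np.polyfit(np.log(ns), np.log(ts), 1)[0]
print(f"empirical scaling exponent: {slope:.2f}")
```

Extrapolating from a handful of small subsets is only a rough guide, but it should be enough to tell minutes from days.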

If your original data is on a rectangular grid, such as

v[i,j,k,l] = f(x[i], y[j], z[k], u[l])

then using a triangulation-based interpolation is very inefficient. It's better to use tensor-product interpolation, i.e., interpolate each dimension successively by a 1-D interpolation method:

import numpy as np
from scipy.interpolate import interp1d

def interp3(x, y, z, v, xi, yi, zi, method='cubic'):
    """Interpolation on 3-D. x, y, z, xi, yi, zi should be 1-D
    and v.shape == (len(x), len(y), len(z))"""
    q = (x, y, z)
    qi = (xi, yi, zi)
    for j in range(3):
        v = interp1d(q[j], v, axis=j, kind=method)(qi[j])
    return v

def somefunc(x, y, z):
    return x**2 + y**2 - z**2 + x*y*z

# some input data
x = np.linspace(0, 1, 5)
y = np.linspace(0, 2, 6)
z = np.linspace(0, 3, 7)
v = somefunc(x[:,None,None], y[None,:,None], z[None,None,:])

# interpolate
xi = np.linspace(0, 1, 45)
yi = np.linspace(0, 2, 46)
zi = np.linspace(0, 3, 47)
vi = interp3(x, y, z, v, xi, yi, zi)

import matplotlib.pyplot as plt
plt.subplot(121)
# pcolor expects data of shape (len(yi), len(xi)), hence the transposes
plt.pcolor(xi, yi, vi[:,:,12].T)
plt.title('interpolated')
plt.subplot(122)
plt.pcolor(xi, yi, somefunc(xi[:,None], yi[None,:], zi[12]).T)
plt.title('exact')
plt.show()
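
As an aside that is not part of the original answer: SciPy versions newer than the 0.10.0 mentioned in the question ship scipy.interpolate.RegularGridInterpolator, which handles data on a rectangular grid without any triangulation. The following is a minimal sketch assuming such a version is installed; it reuses the same toy function and grids as above, with linear interpolation instead of cubic.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def somefunc(x, y, z):
    return x**2 + y**2 - z**2 + x*y*z


# Same coarse input grid as in the example above.
x = np.linspace(0, 1, 5)
y = np.linspace(0, 2, 6)
z = np.linspace(0, 3, 7)
v = somefunc(x[:, None, None], y[None, :, None], z[None, None, :])

# Build the interpolator once; it takes a tuple of 1-D grid axes.
interp = RegularGridInterpolator((x, y, z), v, method="linear")

# Evaluate on a finer grid; query points have shape (..., 3).
xi = np.linspace(0, 1, 45)
yi = np.linspace(0, 2, 46)
zi = np.linspace(0, 3, 47)
XI, YI, ZI = np.meshgrid(xi, yi, zi, indexing="ij")
vi = interp(np.stack([XI, YI, ZI], axis=-1))
print(vi.shape)  # (45, 46, 47)
```

The same call pattern extends to more dimensions by adding grid axes to the tuple, which may matter if the parameter space grows to 5 dimensions as the question anticipates.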

If your data set is scattered and too large for triangulation-based methods, then you need to switch to a different method. One option is an interpolation method that works with only a small number of nearest neighbors at a time (these can be found quickly with a k-d tree). Inverse distance weighting is one such method, but it may be one of the worse ones; there are possibly better options (which I don't know without further research).
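
A minimal sketch of the nearest-neighbor approach, using scipy.spatial.cKDTree for the neighbor search and plain inverse distance weighting. The function name `idw_interpolate`, the choices `k=8` and `power=2.0`, and the random stand-in data are all assumptions for illustration, not recommendations from the answer.

```python
import numpy as np
from scipy.spatial import cKDTree


def idw_interpolate(points, values, targets, k=8, power=2.0, eps=1e-12):
    """Inverse distance weighting over the k nearest neighbors.

    Hypothetical helper. points: (n, d); values: (n,) or (n, m);
    targets: (q, d). Requires 2 <= k <= n.
    """
    tree = cKDTree(points)
    dist, idx = tree.query(targets, k=k)  # both have shape (q, k)
    # Clamp tiny distances so a target on top of a data point
    # does not divide by zero; that point then dominates the weights.
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    neighbors = values[idx]  # (q, k) or (q, k, m)
    if neighbors.ndim == 2:
        return (weights * neighbors).sum(axis=1)
    return (weights[:, :, None] * neighbors).sum(axis=1)


# Stand-in data (assumption): 20,000 scattered 3-D points, 2 observables.
rng = np.random.default_rng(0)
points = rng.random((20000, 3))
values = np.column_stack([points.sum(axis=1), np.sin(points[:, 0])])

# Regular target grid, flattened to a list of query points.
g = np.linspace(0.1, 0.9, 9)
targets = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)

vi = idw_interpolate(points, values, targets)
print(vi.shape)  # (729, 2)
```

Because each target only touches its k neighbors, the cost grows roughly with the number of targets rather than with the size of a global triangulation, and the targets can be processed in chunks if memory is tight.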



Source: https://stackoverflow.com/questions/12618971/scipy-interpolate-linearndinterpolator-hangs-indefinitely-on-large-data-sets
