K nearest neighbour in python [closed]

时光毁灭记忆、已成空白 submitted on 2019-12-02 16:24:58
Sandro Munda

I think that you should use scikit ann.

There is a good tutorial on nearest neighbour search here.

According to the documentation:

ann is a SWIG-generated python wrapper for the Approximate Nearest Neighbor (ANN) Library (http://www.cs.umd.edu/~mount/ANN/), developed by David M. Mount and Sunil Arya. ann provides an immutable kdtree implementation (via ANN) which can perform k-nearest neighbor and approximate k

I wrote a script to compare FLANN and scipy.spatial.cKDTree; I couldn't get the ANN wrapper to compile. You can try it yourself to see what works for your application. For my test case the two had comparable run times, with FLANN about 1.25x faster. When I increased testSize, FLANN was about 2x faster than cKDTree. FLANN may be harder to integrate, depending on the project, since it's not part of a standard Python package.

import cProfile

from numpy import random
from pyflann import FLANN
from scipy import spatial

# Config params
dim = 4          # dimensionality of each point
knn = 5          # number of neighbours to look up
dataSize = 1000  # number of indexed points
testSize = 1     # number of query points

# Generate data
random.seed(1)
dataset = random.rand(dataSize, dim)
testset = random.rand(testSize, dim)

def test1(numIter=1000):
    '''Test tree build time.'''
    flann = FLANN()
    for _ in range(numIter):
        kdtree = spatial.cKDTree(dataset, leafsize=10)
        params = flann.build_index(dataset, target_precision=0.0, log_level='info')

def test2(numIter=100):
    '''Test query time against indexes that are built once up front.'''
    kdtree = spatial.cKDTree(dataset, leafsize=10)
    flann = FLANN()
    params = flann.build_index(dataset, target_precision=0.0, log_level='info')
    for _ in range(numIter):
        result1 = kdtree.query(testset, knn)
        result2 = flann.nn_index(testset, knn, checks=params['checks'])

# Profile the query test; the stats are written to out.prof (readable with pstats)
cProfile.run('test2()', 'out.prof')
denis

scipy.spatial.cKDTree is fast and solid. For an example of using it for NN interpolation, see (ahem) inverse-distance-weighted-idw-interpolation-with-python on SO.
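To make the interpolation idea concrete, here is a minimal sketch of inverse-distance weighting on top of cKDTree. It is not the code from the linked SO answer: the function name idw_interpolate, the eps guard, and the test data are all assumptions chosen for illustration, and the sketch assumes k >= 2 so that query returns 2-D arrays.

```python
import numpy as np
from scipy.spatial import cKDTree

def idw_interpolate(known_points, known_values, query_points, k=5, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest known values.

    Hypothetical helper for illustration; assumes k >= 2.
    """
    tree = cKDTree(known_points)
    # distances and indices both have shape (n_queries, k)
    distances, indices = tree.query(query_points, k=k)
    # eps avoids division by zero when a query coincides with a known point
    weights = 1.0 / (distances + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * known_values[indices]).sum(axis=1)

# Made-up data: 1000 points in 3d with a smooth value at each point
rng = np.random.RandomState(1)
known_points = rng.rand(1000, 3)
known_values = known_points.sum(axis=1)
query_points = rng.rand(10, 3)

estimates = idw_interpolate(known_points, known_values, query_points)
```

Because the weights are normalised, each estimate is a weighted average of nearby known values, and a query that lands exactly on a known point reproduces that point's value almost exactly.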

(If you could say, e.g., "I have 1M points in 3d and want the k=5 nearest neighbors of 1k new points", you might get better answers or code examples.
What do you want to do with the neighbors once you've found them?)

If you want a kd-tree approach, scipy has one built in: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html#scipy.spatial.KDTree
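As a rough sketch of what that looks like in practice, the snippet below builds a scipy.spatial.KDTree and asks for the k=5 nearest neighbours of a batch of new points, then checks the result against a brute-force search. The data sizes are made up (and much smaller than the 1M-point case mentioned above) so that the brute-force check fits in memory; cKDTree offers the same query call.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.RandomState(0)
points = rng.rand(10000, 3)     # reference points in 3d (assumed sizes)
new_points = rng.rand(100, 3)   # points whose neighbours we want
k = 5

tree = KDTree(points)
# distances and indices have shape (100, 5), nearest neighbour first
distances, indices = tree.query(new_points, k=k)

# Brute-force check: compute every pairwise distance and sort each row
diff = new_points[:, None, :] - points[None, :, :]
all_dist = np.sqrt((diff ** 2).sum(axis=2))
brute_indices = np.argsort(all_dist, axis=1)[:, :k]
brute_distances = np.sort(all_dist, axis=1)[:, :k]
```

Row i of indices gives the positions in points of the neighbours of new_points[i], so points[indices[i]] retrieves their coordinates.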
