I would like to calculate k-nearest neighbours in Python. What library should I use?
I think you should use scikit ann.
There is a good tutorial about nearest neighbour here.
According to the documentation:
ann is a SWIG-generated python wrapper for the Approximate Nearest Neighbor (ANN) Library (http://www.cs.umd.edu/~mount/ANN/), developed by David M. Mount and Sunil Arya. ann provides an immutable kdtree implementation (via ANN) which can perform k-nearest neighbor and approximate k
I wrote a script to compare FLANN and scipy.spatial.cKDTree; I couldn't get the ANN wrapper to compile. You can try it yourself to see what works for your application. For my test case the run times were comparable, with FLANN about 1.25x faster than cKDTree. When I increased testSize, FLANN was about 2x faster than cKDTree. FLANN may be harder to integrate, depending on the project, since it isn't part of a standard Python package.
import cProfile

from numpy import random
from pyflann import FLANN
from scipy import spatial

# Config params
dim = 4          # dimensionality of each point
knn = 5          # number of neighbours to look up
dataSize = 1000  # number of points in the indexed data set
testSize = 1     # number of query points

# Generate data
random.seed(1)
dataset = random.rand(dataSize, dim)
testset = random.rand(testSize, dim)

def test1(numIter=1000):
    '''Test tree build time.'''
    flann = FLANN()
    for k in range(numIter):
        kdtree = spatial.cKDTree(dataset, leafsize=10)
        params = flann.build_index(dataset, target_precision=0.0, log_level='info')

def test2(numIter=100):
    '''Test query time.'''
    kdtree = spatial.cKDTree(dataset, leafsize=10)
    flann = FLANN()
    params = flann.build_index(dataset, target_precision=0.0, log_level='info')
    for k in range(numIter):
        result1 = kdtree.query(testset, knn)
        result2 = flann.nn_index(testset, knn, checks=params['checks'])

# Profile the query test and write the raw stats to out.prof
cProfile.run('test2()', 'out.prof')
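The script above only writes the raw profile to out.prof. As a rough sketch of how such a file could be inspected with the standard-library pstats module: the function busy_work and the temporary file path below are stand-ins invented for this example (they are not part of the script above), so that the snippet runs without FLANN installed.

```python
import cProfile
import os
import pstats
import tempfile

def busy_work(n=10000):
    # Hypothetical stand-in for test2(); any callable can be profiled this way.
    return sum(i * i for i in range(n))

# Write the profile to a temporary file (the script above uses 'out.prof').
prof_path = os.path.join(tempfile.mkdtemp(), "out.prof")

profiler = cProfile.Profile()
profiler.runcall(busy_work)
profiler.dump_stats(prof_path)

# Load the saved stats and print the ten most expensive entries
# sorted by cumulative time.
stats = pstats.Stats(prof_path)
stats.sort_stats("cumulative").print_stats(10)
```

For the comparison above, you would look for the cKDTree query and the FLANN nn_index entries in the printed table and compare their cumulative times.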
scipy.spatial.cKDTree is fast and solid. For an example of using it for nearest-neighbour interpolation, see (ahem) inverse-distance-weighted-idw-interpolation-with-python on SO.
(If you could say, e.g., "I have 1M points in 3D and want the k=5 nearest neighbors of 1k new points",
you might get better answers or code examples.
What do you want to do with the neighbors once you've found them?)
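For a problem of the shape described in the parenthetical, a minimal cKDTree sketch might look like the following. The sizes are scaled down (10k points, 100 queries) so it runs quickly, the data is random, and the variable names are made up for illustration. The brute-force check at the end is only there to show that the tree returns the same neighbours as a direct distance computation.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10000, 3))   # assumed data set: 10k points in 3D
queries = rng.random((100, 3))    # assumed query set: 100 new points

# Build the tree once, then query all new points in a single call.
tree = cKDTree(points)
distances, indices = tree.query(queries, k=5)

# distances[i, j] is the distance from queries[i] to its j-th nearest
# neighbour, and indices[i, j] is that neighbour's row in `points`.
print(distances.shape, indices.shape)

# Brute-force check for the first query point.
brute = np.argsort(np.linalg.norm(points - queries[0], axis=1))[:5]
print(brute, indices[0])
```

Each row of distances is sorted from nearest to farthest, so indices[:, 0] gives the single nearest neighbour of every query point.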
If you want a kd-tree approach, it is available natively in SciPy: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html#scipy.spatial.KDTree
Source: https://stackoverflow.com/questions/5565935/k-nearest-neighbour-in-python