Understanding `leafsize` in scipy.spatial.KDTree

自闭症网瘾萝莉.ら 提交于 2020-12-13 03:38:16

问题


Problem statement:

I have 150k points in a 3D space with their coordinates stored in a matrix with dimension [150k, 3] in mm.

I want to find all the neighbors of a given point p that are within a radius r. And I want to do that in the most accurate way.

How should I choose my leafsize parameter ?

from scipy.spatial import KDTree
import numpy as np

pts = np.random.rand(150000,3)

T1 = KDTree(pts, leafsize=20)
T2 = KDTree(pts, leafsize=1)

neighbors1= T1.query_ball_point((0.3,0.2,0.1), r=2.0)
neighbors2= T2.query_ball_point((0.3,0.2,0.1), r=2.0)

np.allclose(sorted(neighbors1), sorted(neighbors2))
True

回答1:


The function query_ball_point will return the correct set of points for any version of the search tree. The leafsize parameter does not impact the results of the query, only the performance of the results.

Imagine two trees shown below for the same data (but different leafsize parameters) and a query searching for all points inside the red circle.

In both cases, the code will only return the two points that lie inside the red circle. This is done by checking all points in all boxes of the tree that intersect the circle. This leads to a different amount of work (i.e., different performance) in each case. For the left tree (corresponding to a larger leafsize), the algorithm has to check if 13 points are inside the circle (6 in the upper intersecting box and 7 in the lower intersecting box). In the right tree (which has a smaller leaf size), only three points get processed (one in the upper intersecting box and two in the lower intersecting box).

Following this logic, you may think it just makes sense to always use a small leaf size: this will minimize the number of actual comparisons at the end of the algorithm (do decide if the points actually lie in the query region). But it isn't that simple: the smaller leaf size will generate a deeper tree adding cost to the construction time and to the tree traversal time. Getting the right balance of tree-traversal performance with the leaf-level comparisons really depends on the type of data going into the tree and the specific leaf-level comparisons you are doing. Which is why scipy provides the leafsize parameter as an argument so you can tune things to perform best on a particular algorithm.



来源:https://stackoverflow.com/questions/65003877/understanding-leafsize-in-scipy-spatial-kdtree

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!