How can I specify the radius in query_radius (sklearn BallTree): radians or km?

Submitted by 强颜欢笑 on 2021-02-11 07:46:27

Question


I'm working with latitude and longitude data. I'm using BallTree because the dataset has many rows (32,000). If I build the tree with the haversine distance:

model_BTree = BallTree(np.array(points_sec_rad), metric='haversine')

and I convert the latitude and longitude to radians, how should I call query_radius (with r=max_dist_rad) for the points I want to locate? I've used 0.150 meters as the radius, but I'm not sure whether I should convert it to radians first.

ind_BTree, dist_BTree = model_BTree.query_radius(np.array(points_loc_rad), r=max_dist_rad, return_distance=True, sort_results=True)

Also, how can I limit the number of neighbors inside the radius? Thank you.


Answer 1:


edit: Example with working code and explanation

The best way to understand what happens when applying the haversine distance is to imagine that all great-circle distances are measured on a small ping-pong ball: a unit sphere with radius 1.

If you want to apply query_radius() to a larger sphere, like the Earth, you need to convert kilometres or miles on Earth to distances on the unit ping-pong sphere. Say you want 100 miles: divide by the Earth's radius in miles. The distances returned by query_radius() must then be converted back to miles/km by multiplying by the same radius.
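For the case in the question, the same conversion might look like the sketch below. It assumes the 0.150 in the question is meant as kilometres (150 m) and uses a mean Earth radius of 6371.0088 km; both the helper names and that radius value are illustrative assumptions, not part of the original post.

```python
# Assumed mean Earth radius in kilometres (the Earth is not a perfect sphere).
EARTH_RADIUS_KM = 6371.0088


def km_to_radians(distance_km, sphere_radius_km=EARTH_RADIUS_KM):
    """Convert a surface distance in km to an angle on the unit sphere."""
    return distance_km / sphere_radius_km


def radians_to_km(angle_rad, sphere_radius_km=EARTH_RADIUS_KM):
    """Convert a unit-sphere distance (radians) back to km."""
    return angle_rad * sphere_radius_km


# Assumption: the question's radius is 0.150 km, i.e. 150 m.
max_dist_rad = km_to_radians(0.150)
print(max_dist_rad)                 # a very small angle, roughly 2.35e-05 rad
print(radians_to_km(max_dist_rad))  # back to 0.150 km
```

The value of max_dist_rad is what would be passed as r to query_radius(); the distances that come back are in the same unit-sphere units and go through radians_to_km().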

Say we have the following towns and museum data in Pandas:

import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

towns = pd.DataFrame({
    "name" : ["Merry Hill", "Spring Valley", "Nesconset"],
    "lat" : [36.01, 41.32, 40.84],
    "long" : [-76.7, -89.20, -73.15]
})

museum = pd.DataFrame({
    "name" : [
        "Motte Historical Car Museum, Menifee",
        "Crocker Art Museum, Sacramento",
        "World Chess Hall Of Fame, St.Louis",
        "National Atomic Testing Museum, Las",
        "National Air and Space Museum, Washington",
        "The Metropolitan Museum of Art",
        "Museum of the American Military Family & Learning Center"
    ],
    "lat" : [33.743511, 38.576942, 38.644302, 36.114269, 38.887806, 40.778965, 35.083359],
    "long" : [-117.165161, -121.504997, -90.261154, -115.148315, -77.019844, -73.962311, -106.381531]
})

Then we need to extract the lat/long pairs as NumPy arrays with

places_gps = towns[["lat", "long"]].values
museum_gps = museum[["lat", "long"]].values

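Now we convert the coordinates to radians and create the ball tree on the museum locations. The haversine metric expects each row as [latitude, longitude], in radians: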

places_radians =  np.radians(places_gps)
museum_radians = np.radians(museum_gps)

tree = BallTree(museum_radians, leaf_size=15, metric='haversine')

Again, imagine that the sphere in this tree is only the size of a ping-pong ball (radius 1). To work with a larger or smaller sphere, we divide real distances by its radius on the way in and multiply by it on the way out.

Say I want all museums within 100 miles:

distance_in_miles = 100
earth_radius_in_miles = 3958.8
    
radius = distance_in_miles / earth_radius_in_miles

Now I can apply query_radius(). Remember that the returned distances need to be converted back to miles: they are great-circle distances on the unit sphere, our ping-pong ball.

is_within, distances = tree.query_radius(places_radians, r=radius, count_only=False, return_distance=True) 

so we multiply by the Earth's radius:

distances_in_miles = distances * earth_radius_in_miles

Let's check the output; distances_in_miles is

array([array([], dtype=float64), array([], dtype=float64),
       array([42.68960475])], dtype=object)

This means that 'Nesconset' is less than 100 miles from 'The Metropolitan Museum of Art', and that the distance is about 42.689 miles. Note that a distance is returned only in the last array (Nesconset). From is_within we get the index of the museum in range, 5, which is museum.name[5], 'The Metropolitan Museum of Art'.
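Pairing each town with the museums found for it can be sketched as follows. This is a self-contained illustration that repeats the data above as plain lists and arrays instead of DataFrames; the variable names town_names, museum_names and matches are made up for the example.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.8

town_names = ["Merry Hill", "Spring Valley", "Nesconset"]
towns_deg = np.array([[36.01, -76.7], [41.32, -89.20], [40.84, -73.15]])

museum_names = [
    "Motte Historical Car Museum, Menifee",
    "Crocker Art Museum, Sacramento",
    "World Chess Hall Of Fame, St.Louis",
    "National Atomic Testing Museum, Las",
    "National Air and Space Museum, Washington",
    "The Metropolitan Museum of Art",
    "Museum of the American Military Family & Learning Center",
]
museums_deg = np.array([
    [33.743511, -117.165161], [38.576942, -121.504997],
    [38.644302, -90.261154], [36.114269, -115.148315],
    [38.887806, -77.019844], [40.778965, -73.962311],
    [35.083359, -106.381531],
])

# Rows are [lat, long] in radians, as the haversine metric expects.
tree = BallTree(np.radians(museums_deg), metric="haversine")
is_within, distances = tree.query_radius(
    np.radians(towns_deg), r=100 / EARTH_RADIUS_MILES, return_distance=True
)

# One (town, museum, miles) tuple per museum found inside the radius.
matches = []
for town, idx, dist in zip(town_names, is_within, distances):
    for i, d in zip(idx, dist):
        matches.append((town, museum_names[i], d * EARTH_RADIUS_MILES))

for town, museum_name, miles in matches:
    print(f"{town} -> {museum_name}: {miles:.2f} miles")
```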

Depending on the method of checking, it won't be exactly 42.689 miles, but a quick check with Google Maps confirms it is in that range. The Earth is not a perfect sphere, so there will always be some error.
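Another way to sanity-check the number, independent of the tree, is to evaluate the haversine formula directly for the Nesconset and Metropolitan Museum coordinates. The sketch below uses only the standard library; the function name haversine_miles is made up for the example, and it assumes the same spherical Earth radius of 3958.8 miles used above.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2, radius_miles=3958.8):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # 2 * asin(sqrt(a)) is the central angle in radians (unit-sphere distance).
    return 2 * radius_miles * math.asin(math.sqrt(a))


# Nesconset -> The Metropolitan Museum of Art
d = haversine_miles(40.84, -73.15, 40.778965, -73.962311)
print(round(d, 2))  # should agree with the BallTree result of about 42.69
```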

As my original post showed, mistakes are easy to make: forgetting to apply the conversion factor, swapping the lat/long values, or mixing up km and meters.
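The question also asks how to limit the number of neighbors inside the radius, which the steps above do not cover. One possible approach, sketched here as an assumption and not as part of the original answer, is to ask query_radius() for sorted results and keep only the first few entries per query point. The 1000-mile radius and the cap max_neighbors = 2 are arbitrary illustration values.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.8

towns_deg = np.array([[36.01, -76.7], [41.32, -89.20], [40.84, -73.15]])
museums_deg = np.array([
    [33.743511, -117.165161], [38.576942, -121.504997],
    [38.644302, -90.261154], [36.114269, -115.148315],
    [38.887806, -77.019844], [40.778965, -73.962311],
    [35.083359, -106.381531],
])

tree = BallTree(np.radians(museums_deg), metric="haversine")

radius = 1000 / EARTH_RADIUS_MILES  # illustrative radius: 1000 miles
max_neighbors = 2                   # hypothetical cap on neighbors per point

# sort_results=True orders each result by distance (needs return_distance=True).
ind, dist = tree.query_radius(
    np.radians(towns_deg), r=radius, return_distance=True, sort_results=True
)

# Keep at most max_neighbors of the closest hits for every query point.
ind_capped = [i[:max_neighbors] for i in ind]
dist_capped_miles = [d[:max_neighbors] * EARTH_RADIUS_MILES for d in dist]

for i, d in zip(ind_capped, dist_capped_miles):
    print(i, np.round(d, 1))
```

An alternative with the same effect would be tree.query(X, k=max_neighbors) followed by discarding the hits whose distance exceeds the radius; which one is cheaper depends on how many points fall inside the radius.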



Source: https://stackoverflow.com/questions/63121268/how-can-i-introduce-the-radio-in-query-radius-balltree-sklearn-radians-km
