Clustering using a custom distance metric for lat/long pairs

Posted by 二次信任 on 2019-12-04 03:43:30

I seem to have found a workaround: compute a distance matrix with pairwise_distances (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html), then pass it to DBSCAN(metric='precomputed').fit(distance_matrix).
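As a minimal sketch of that workaround, the snippet below builds the distance matrix with a custom callable and feeds it to DBSCAN. The haversine_km helper, the sample coordinates, and the eps and min_samples values are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, the value used later in this thread


def haversine_km(p, q):
    """Great-circle distance in km between two [lat, lon] points in degrees.

    Hypothetical helper standing in for whatever custom metric you need.
    """
    lat1, lon1 = np.radians(p)
    lat2, lon2 = np.radians(q)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Made-up sample data: two tight groups and one isolated point, as [lat, lon] in degrees.
coords = np.array([
    [40.000, -74.000],
    [40.001, -74.001],
    [40.002, -74.000],
    [51.500, -0.100],
    [51.501, -0.101],
    [0.000, 0.000],
])

# n x n matrix of pairwise distances in km, computed with the custom callable.
distance_matrix = pairwise_distances(coords, metric=haversine_km)

# With metric='precomputed', eps is in the same unit as the matrix (km here).
db = DBSCAN(eps=1.5, min_samples=2, metric='precomputed').fit(distance_matrix)
labels = db.labels_  # -1 marks noise points
```

The full matrix grows quadratically with the number of points, so this approach suits smaller datasets.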

You can do this with scikit-learn: use the haversine metric with the ball-tree algorithm, and pass radian units into the DBSCAN fit method.

This tutorial demonstrates how to cluster spatial lat-long data with scikit-learn's DBSCAN, using the haversine metric so that clusters are based on great-circle distances between lat-long points:

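import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
df = pd.read_csv('gps.csv')
coords = df[['lat', 'lon']].to_numpy()  # as_matrix() was removed in pandas 1.0
db = DBSCAN(eps=eps, min_samples=ms, algorithm='ball_tree', metric='haversine').fit(np.radians(coords))  # eps in radians; ms is min_samples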

Notice that the coordinates are passed into the .fit() method in radians, ordered as [latitude, longitude], and that the epsilon parameter value must be in radians as well.
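A quick way to convince yourself of the units is to compute a haversine distance directly. The sketch below assumes a scikit-learn version that provides sklearn.metrics.pairwise.haversine_distances; the two sample points are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# Two points on the equator, one degree of longitude apart, as [lat, lon] in degrees.
points_deg = np.array([[0.0, 0.0], [0.0, 1.0]])

# Input must be in radians; the result is an angle in radians, not a length.
d_rad = haversine_distances(np.radians(points_deg))[0, 1]

# Multiply by the Earth's mean radius to get kilometres.
d_km = d_rad * 6371.0
```

Here d_rad equals one degree expressed in radians, which is why both the coordinates and eps have to use the same angular unit.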

If you want epsilon to be, say, 1.5 km, divide it by the Earth's mean radius of about 6371 km: the epsilon parameter value in radians is 1.5/6371, roughly 0.000235.
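Putting the pieces together, here is a self-contained sketch that runs without gps.csv. The km_to_radians helper, the sample coordinates, and min_samples=2 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def km_to_radians(km):
    """Convert a distance on the Earth's surface to an angle in radians (hypothetical helper)."""
    return km / EARTH_RADIUS_KM


# Made-up sample data: two tight groups and one isolated point, as [lat, lon] in degrees.
coords = np.array([
    [40.000, -74.000],
    [40.001, -74.001],
    [40.002, -74.000],
    [51.500, -0.100],
    [51.501, -0.101],
    [0.000, 0.000],
])

eps = km_to_radians(1.5)  # 1.5 km neighbourhood radius, expressed in radians

db = DBSCAN(
    eps=eps,
    min_samples=2,
    algorithm='ball_tree',
    metric='haversine',
).fit(np.radians(coords))  # coordinates converted to radians as well

labels = db.labels_  # one cluster label per input row; -1 marks noise
```

Unlike the precomputed-matrix approach, the ball tree avoids materialising every pairwise distance.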
