Clustering longitude and latitude GPS data

你离开我真会死。 Posted on 2019-12-06 09:04:16

You can use pairwise_distances to compute geographic distances from latitude/longitude pairs, then pass the resulting distance matrix to DBSCAN by specifying metric='precomputed'.

To calculate the distance matrix:

from sklearn.metrics.pairwise import pairwise_distances
from sklearn.cluster import DBSCAN
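# vincenty was removed in geopy 2.0; geodesic is its replacement
from geopy.distance import geodesic

def distance_in_meters(x, y):
    # x and y are (latitude, longitude) pairs in degrees
    return geodesic((x[0], x[1]), (y[0], y[1])).m

# sample: array of shape (n_points, 2) holding (latitude, longitude) rows
distance_matrix = pairwise_distances(sample, metric=distance_in_meters)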
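
Calling a Python function for every pair gets slow as the number of points grows. If great-circle (Haversine) distance is accurate enough for you, a vectorized alternative might look like the sketch below. It uses scikit-learn's haversine_distances, which expects (latitude, longitude) in radians. The helper name haversine_matrix_m and the mean Earth radius constant are my own assumptions, and a spherical model can differ from the ellipsoidal geodesic by a fraction of a percent.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# Assumed mean Earth radius in meters (spherical approximation).
EARTH_RADIUS_M = 6_371_008.8

def haversine_matrix_m(latlon_deg):
    """Return the pairwise great-circle distance matrix in meters.

    latlon_deg: array-like of shape (n_points, 2), rows are
    (latitude, longitude) in degrees.
    """
    latlon_rad = np.radians(latlon_deg)
    # haversine_distances returns angular distances on the unit sphere.
    return haversine_distances(latlon_rad) * EARTH_RADIUS_M

# One degree of longitude along the equator is roughly 111 km.
demo_matrix = haversine_matrix_m([[0.0, 0.0], [0.0, 1.0]])
print(round(demo_matrix[0, 1] / 1000.0, 1), "km")
```

The result can be passed to DBSCAN with metric='precomputed' in the same way as the geopy-based matrix.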

To run DBSCAN using the matrix:

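# eps is in the same unit as the matrix (meters here); tune eps and min_samples for your data
dbscan = DBSCAN(metric='precomputed', eps=3, min_samples=10)
dbscan.fit(distance_matrix)
labels = dbscan.labels_  # cluster index per point; -1 marks noise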
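
To see how eps and min_samples behave with a precomputed matrix, here is a tiny toy run. The positions are made-up values in meters along a line (not real GPS data), and the names positions_m, toy_matrix and toy_labels are only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical positions along a straight road, in meters (not real GPS data).
positions_m = np.array([0.0, 1.0, 2.0, 100.0, 101.0, 102.0, 500.0])

# Pairwise distances in meters, standing in for the geographic distance matrix.
toy_matrix = np.abs(positions_m[:, None] - positions_m[None, :])

# eps=3 means "within 3 meters"; min_samples counts the point itself.
toy_model = DBSCAN(metric="precomputed", eps=3, min_samples=3).fit(toy_matrix)
toy_labels = toy_model.labels_

print(toy_labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

The two tight groups become clusters 0 and 1, and the isolated point at 500 m is labeled -1 (noise). With real GPS data an eps of 3 meters is very strict, so expect to experiment with it.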

Hope this helps.

Gengyu

DBSCAN is a reasonable choice, but you may get better results with a hierarchical clustering algorithm such as OPTICS or HDBSCAN*.

I did a blog post some time ago on clustering 23 million Tweet locations:

http://www.vitavonni.de/blog/201410/2014102301-clustering-23-mio-tweet-locations.html

Here is also a blog on clustering GPS points. She uses a very similar approach and goes into much more detail:

https://doublebyteblog.wordpress.com/

In essence, OPTICS works well for such data, and you really need to use an index such as the R*-tree or Cover tree in ELKI. Both work with Haversine distance and are really fast.
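
If you want to stay in Python rather than ELKI, a rough equivalent is scikit-learn's OPTICS with the Haversine metric and a ball tree. This is a swapped-in sketch, not the ELKI setup described above: scikit-learn has no R*-tree or cover tree, and the ball tree here only speeds up the neighbor queries. The function name optics_haversine, the 50 m threshold, and the synthetic points around two city centers are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Assumed mean Earth radius in meters (spherical approximation).
EARTH_RADIUS_M = 6_371_008.8

def optics_haversine(latlon_deg, eps_m=50.0, min_samples=5):
    """Cluster (latitude, longitude) rows given in degrees with OPTICS.

    No full distance matrix is built up front; the haversine metric works
    on radians, so eps is converted from meters to an angle.
    """
    latlon_rad = np.radians(latlon_deg)
    model = OPTICS(
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
        cluster_method="dbscan",       # extract a DBSCAN-like flat clustering
        eps=eps_m / EARTH_RADIUS_M,    # meters -> radians on the sphere
    )
    return model.fit_predict(latlon_rad)

# Synthetic demo data: 20 points within a few meters of each of two centers.
rng = np.random.default_rng(0)
centers = np.array([[52.5200, 13.4050], [48.1370, 11.5750]])
points = np.vstack([c + rng.uniform(-1e-4, 1e-4, size=(20, 2)) for c in centers])

labels = optics_haversine(points)
print(sorted(set(labels.tolist())))
```

OPTICS can also extract clusters with cluster_method='xi' (the scikit-learn default), which is closer to the hierarchical view but needs more care when choosing parameters.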
