Clustering longitude and latitude GPS data

Submitted by 房东的猫 on 2020-01-02 09:59:28

Question


I have more than 400,000 car GPS locations, such as:

[ 25.41452217,  37.94879532],
[ 25.33231735,  37.93455887],
[ 25.44327736,  37.96868896],
... 

I need to perform spatial clustering in which points within 3 meters of each other end up in the same cluster.
I tried DBSCAN, but it does not seem to work with geographic (longitude, latitude) coordinates.

Also, I do not know the number of clusters.


Answer 1:


You can use pairwise_distances to compute geographic distances (in meters) from latitude/longitude pairs, and then pass the resulting distance matrix to DBSCAN by specifying metric='precomputed'.

To calculate the distance matrix:

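from sklearn.metrics.pairwise import pairwise_distances
from sklearn.cluster import DBSCAN
from geopy.distance import geodesic  # vincenty() was removed in geopy 2.0; geodesic() replaces it

def distance_in_meters(x, y):
    # geopy expects (latitude, longitude); swap the indices if your rows are [longitude, latitude]
    return geodesic((x[0], x[1]), (y[0], y[1])).m

# sample: your (n, 2) array of points; this builds a dense n x n matrix
distance_matrix = pairwise_distances(sample, metric=distance_in_meters)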
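The callable above is invoked once per pair of points from Python, which gets slow quickly. A minimal sketch of a vectorized alternative, assuming sample is an (n, 2) NumPy array of [latitude, longitude] in degrees: it swaps the ellipsoidal geodesic distance for scikit-learn's spherical haversine_distances (which takes radians and returns great-circle angles in radians) and scales by an assumed mean Earth radius, EARTH_RADIUS_M. The spherical approximation is off by roughly 0.5% at most, which is centimeters at a 3 m threshold. The three points are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters

# Hypothetical sample: rows are [latitude, longitude] in degrees
sample = np.array([
    [37.94880, 25.41450],
    [37.94881, 25.41450],  # 1e-5 degrees north of the first point
    [37.95880, 25.41450],  # 0.01 degrees north of the first point
])

# haversine_distances expects radians and returns central angles in radians;
# multiplying by the radius gives meters, so eps=3 still means 3 meters
distance_matrix = haversine_distances(np.radians(sample)) * EARTH_RADIUS_M
print(distance_matrix.round(2))
```

The result is still a dense n x n matrix, so this only speeds up the computation; it does not reduce memory.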

To run DBSCAN using the matrix:

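dbscan = DBSCAN(metric='precomputed', eps=3, min_samples=10)  # eps is in meters, like the matrix
dbscan.fit(distance_matrix)
labels = dbscan.labels_  # one cluster id per point; -1 marks noise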
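A precomputed matrix does not scale to the full data set: for 400,000 points it holds 400,000² = 1.6 × 10^11 float64 values, about 1.28 TB. A minimal sketch that avoids the matrix entirely, assuming rows are [latitude, longitude] in degrees: scikit-learn's DBSCAN accepts metric='haversine' together with algorithm='ball_tree', so neighbors are found through a tree index instead. The haversine metric works on the unit sphere in radians, so eps is converted from meters using an assumed mean Earth radius. The coordinates are made-up test points, and min_samples=3 is an illustrative value, not a recommendation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters
EPS_M = 3.0                 # neighborhood radius from the question, in meters

# Hypothetical points, rows are [latitude, longitude] in degrees.
# If your data is [longitude, latitude], swap the columns first: X[:, ::-1]
X = np.array([
    [37.94880, 25.41450], [37.94881, 25.41450], [37.94880, 25.41451],  # group A, within 1.5 m
    [37.95880, 25.41450], [37.95881, 25.41450], [37.95880, 25.41451],  # group B, ~1.1 km north
    [38.00000, 25.41450],                                              # isolated point
])

db = DBSCAN(
    eps=EPS_M / EARTH_RADIUS_M,  # meters -> radians on the unit sphere
    min_samples=3,
    metric="haversine",
    algorithm="ball_tree",
)
labels = db.fit_predict(np.radians(X))  # haversine expects radians

# The number of clusters is an output of the fit, not an input; noise is labeled -1
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(labels, n_clusters)
```

With a tree index, memory grows with the number of neighbor pairs within eps rather than with n², which also answers the "unknown number of clusters" concern: DBSCAN never asks for it.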

Hope this helps.

Gengyu




Answer 2:


DBSCAN is a reasonable choice, but you may get better results with a hierarchical clustering algorithm such as OPTICS or HDBSCAN*.
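A minimal sketch of the OPTICS route using scikit-learn's OPTICS class (a swapped-in implementation, not the ELKI one discussed in this answer), under the same assumptions as before: rows are [latitude, longitude] in degrees, haversine distance, and an assumed mean Earth radius. One practical benefit of OPTICS is that the reachability ordering is computed once, and cluster_optics_dbscan can then extract DBSCAN-style clusterings at other radii without refitting. The points are hypothetical, and the 3 km max_eps is an arbitrary bound on the neighbor search.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters

def meters_to_radians(m):
    # haversine distances are angles on the unit sphere
    return m / EARTH_RADIUS_M

# Hypothetical points, rows are [latitude, longitude] in degrees
X = np.array([
    [37.94880, 25.41450], [37.94881, 25.41450], [37.94880, 25.41451],  # group A
    [37.95880, 25.41450], [37.95881, 25.41450], [37.95880, 25.41451],  # group B, ~1.1 km from A
    [38.00000, 25.41450],                                              # isolated point
])

optics = OPTICS(
    min_samples=3,
    max_eps=meters_to_radians(3000),  # ignore neighbors farther than 3 km
    metric="haversine",
    algorithm="ball_tree",
    cluster_method="dbscan",
    eps=meters_to_radians(3),         # labels_ is a 3 m cut of the ordering
).fit(np.radians(X))
labels_3m = optics.labels_

# Re-cut the same reachability ordering at 2 km, without refitting
labels_2km = cluster_optics_dbscan(
    reachability=optics.reachability_,
    core_distances=optics.core_distances_,
    ordering=optics.ordering_,
    eps=meters_to_radians(2000),
)
print(labels_3m, labels_2km)
```

Cutting at 3 m keeps the two groups apart; cutting the same ordering at 2 km merges them, while the isolated point stays noise in both.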

Some time ago I wrote a blog post on clustering 23 million tweet locations:

http://www.vitavonni.de/blog/201410/2014102301-clustering-23-mio-tweet-locations.html

Here is another blog on clustering GPS points. She uses a very similar approach and goes into much more detail:

https://doublebyteblog.wordpress.com/

In essence, OPTICS works well for such data, and you really need to use an index, such as the R*-tree or the cover tree in ELKI. Both work with Haversine distance and are very fast.
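For reference, a minimal pure-Python sketch of the haversine (great-circle) distance, assuming a spherical Earth with a mean radius of 6,371,008.8 m; the function name haversine_m and the test coordinates are hypothetical. It also illustrates why raw degrees are a poor stand-in for meters: 3 m is only about 2.7 × 10^-5 degrees of latitude, and the length of a degree of longitude shrinks with cos(latitude), so a single Euclidean eps in degrees cannot mean 3 m in every direction.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# The same 1e-5 degree step, northward vs. eastward, at latitude ~37.95
north = haversine_m(37.94880, 25.41450, 37.94881, 25.41450)
east = haversine_m(37.94880, 25.41450, 37.94880, 25.41451)
print(f"north: {north:.3f} m, east: {east:.3f} m")

# 3 m expressed in degrees of latitude (one degree is R * pi / 180 meters)
three_m_in_deg = 3 / (EARTH_RADIUS_M * math.pi / 180)
print(three_m_in_deg)
```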



Source: https://stackoverflow.com/questions/36816084/clustering-longitude-and-latitude-gps-data
