DBSCAN for clustering of geographic location data

一向 2020-12-12 15:56

I have a dataframe with latitude and longitude pairs.

Here is what my dataframe looks like:

    order_lat  order_long
0   19.111841   72.910729
1   19.1113         

5 answers
  •  我在风中等你
    2020-12-12 16:33

    There are three different things you can do to use DBSCAN with GPS data. First, you can use the eps parameter to specify the maximum distance between data points that you will consider when creating a cluster. As noted in other answers, you need to take into account the scale of the distance metric you are using and pick a value that makes sense. Second, you can use min_samples as a way to filter out data points recorded while moving. Last, the metric parameter lets you use whatever distance function you want.
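
    On the scale point: scikit-learn's built-in "haversine" metric expects [latitude, longitude] in radians and returns distances on the unit sphere, so an eps expressed in meters has to be divided by the Earth's radius first. The sketch below shows that conversion; it is an alternative to the custom-metric approach this answer uses, not part of it. The mean Earth radius value, the helper name cluster_latlon, and the sample rows are assumptions for illustration; only the order_lat/order_long column names come from the question. eps=50 m and min_samples=2 are chosen just to suit the tiny example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Assumed mean Earth radius in meters (not specified in the original answer).
EARTH_RADIUS_M = 6_371_008.8


def cluster_latlon(df, eps_meters, min_samples):
    """Hypothetical helper: DBSCAN on lat/long pairs with the built-in haversine metric.

    The haversine metric takes [lat, lon] in radians and measures distance on the
    unit sphere, so eps (given here in meters) is converted to radians as well.
    """
    coords = np.radians(df[["order_lat", "order_long"]].to_numpy())
    eps_rad = eps_meters / EARTH_RADIUS_M
    db = DBSCAN(eps=eps_rad, min_samples=min_samples,
                metric="haversine", algorithm="ball_tree")
    return db.fit_predict(coords)


# Hypothetical sample: three points within ~10 m of each other, one ~14 km away.
df = pd.DataFrame({
    "order_lat":  [19.111841, 19.111900, 19.111800, 19.200000],
    "order_long": [72.910729, 72.910800, 72.910700, 72.800000],
})
labels = cluster_latlon(df, eps_meters=50, min_samples=2)
print(labels)  # [ 0  0  0 -1]
```

    The isolated point gets the noise label -1 because its 50 m neighborhood contains only itself.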

    As an example, in a particular research project I'm working on, I want to extract significant locations from a subject's GPS data collected from their smartphone. I'm not interested in how the subject navigates through the city, and I'm more comfortable dealing with distances in meters, so I can do the following:

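    from geopy import distance
    from sklearn.cluster import DBSCAN

    def mydist(p1, p2):
        # p1, p2 are (latitude, longitude) rows; the distance is returned in meters
        return distance.great_circle((p1[0], p1[1]), (p2[0], p2[1])).meters

    db = DBSCAN(eps=50, min_samples=50, n_jobs=-1, metric=mydist)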
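
    If you would rather not depend on geopy, the same idea (a callable metric that returns meters, so eps can be given directly in meters) can be sketched with a hand-written haversine formula. This swaps in a plain spherical formula for geopy's great_circle; the helper name haversine_m, the Earth radius value, and the sample rows are assumptions. It also attaches the labels to a dataframe shaped like the question's and averages each cluster's coordinates as a rough "significant location".

```python
import math

import pandas as pd
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters


def haversine_m(p1, p2):
    """Hypothetical helper: great-circle distance in meters between (lat, lon) rows in degrees."""
    lat1, lon1 = math.radians(p1[0]), math.radians(p1[1])
    lat2, lon2 = math.radians(p2[0]), math.radians(p2[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical rows using the question's column names.
df = pd.DataFrame({
    "order_lat":  [19.111841, 19.111900, 19.111800, 19.200000],
    "order_long": [72.910729, 72.910800, 72.910700, 72.800000],
})

# eps is in meters because the metric returns meters; min_samples=2 only suits this tiny sample.
db = DBSCAN(eps=50, min_samples=2, metric=haversine_m)
df["cluster"] = db.fit_predict(df[["order_lat", "order_long"]].to_numpy())

# Mean coordinates per cluster; noise points (label -1) are excluded.
centers = df[df["cluster"] != -1].groupby("cluster")[["order_lat", "order_long"]].mean()
print(df)
print(centers)
```

    A callable metric like this (or mydist) is evaluated in Python for every pair of points, so on large datasets it is noticeably slower than the built-in "haversine" metric.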
    

    Here eps is, per the DBSCAN documentation, "The maximum distance between two samples for one to be considered as in the neighborhood of the other," while min_samples is "The number of samples (or total weight) in a neighborhood for a point to be considered as a core point." Basically, eps controls how close the data points in a cluster should be; in the example above I selected 50 meters. min_samples is just a way to control for density. In the example above the data was captured at about one sample per second, and because I'm interested not in when people are moving around but in stationary locations, I want to make sure I get close to a minute of GPS data (50 samples) from the same location.
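
    To illustrate why min_samples filters out movement, here is a sketch on a made-up 1 Hz track. To isolate that effect it uses local planar coordinates in meters with DBSCAN's default Euclidean metric rather than latitude/longitude; the track itself (two stops of 60 samples each, joined by two minutes of driving at 15 m/s) and the helper name synthetic_track are hypothetical. With eps=50 and min_samples=50, a point on the moving segment has at most 7 samples within 50 m (itself plus three on each side, 15 m apart), so it never becomes a core point and is labeled noise (-1), while each stop forms its own cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def synthetic_track():
    """Hypothetical 1 Hz track in local planar coordinates (meters)."""
    # 60 samples scattered over a ~10 m x 6 m patch: the person is stationary.
    grid = np.array([(i % 10, i // 10) for i in range(60)], dtype=float)
    stop_a = grid
    # 120 samples moving at 15 m/s along the x axis, starting 100 m from stop A.
    moving = np.array([(100.0 + 15.0 * k, 0.0) for k in range(120)])
    # A second stop 2 km away.
    stop_b = grid + np.array([2000.0, 0.0])
    return np.vstack([stop_a, moving, stop_b])


points = synthetic_track()
labels = DBSCAN(eps=50, min_samples=50).fit_predict(points)
print(np.unique(labels[:60]), np.unique(labels[60:180]), np.unique(labels[180:]))
# [0] [-1] [1]
```

    With a smaller min_samples (for example 5), the moving samples would become core points and the trip itself would show up as a cluster.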

    If this still does not make sense, take a look at this DBSCAN animation.
