Question
I am trying to use clustering in R. I am a beginner and haven't worked much with R.
I have the geo location points as latitude and longitude values. What I am looking to do is to find out the hotspots using this data.
I am looking to create clusters of 4 or more points that are within 600 feet of each other.
I want to get the centroids of such clusters and plot them.
The data looks like this:
LATITUDE LONGITUD
32.70132 -85.52518
34.74251 -86.88351
32.55205 -87.34777
32.64144 -85.35430
34.92803 -87.81506
32.38016 -86.29790
32.42127 -87.08690
...
structure(list(LATITUDE = c(32.70132, 34.74251, 32.55205, 32.64144,
34.92803, 32.38016, 32.42127, 32.9095, 33.58092, 32.51617, 33.5726,
33.83251, 34.65639, 34.27694, 33.73851, 33.95132, 31.35445, 34.05263,
33.37959, 30.50248, 32.31561, 32.66919, 31.75039, 33.56986, 33.27091,
33.93598, 32.30964, 31.09773, 32.26711, 33.54263, 34.72014, 34.78548,
30.65705, 31.25939, 31.27647, 30.54322, 31.22416, 33.38549, 33.18338,
31.16811, 32.38368, 32.36253, 31.14464), LONGITUD = c(-85.52518,
-86.88351, -87.34777, -85.3543, -87.81506, -86.2979, -87.0869,
-85.75888, -86.27647, -86.21179, -86.65275, -87.2696, -85.72738,
-87.71489, -86.48934, -86.29693, -88.22943, -87.55328, -85.31454,
-87.79342, -86.88108, -86.26669, -88.04425, -86.44631, -87.74383,
-87.72403, -86.28067, -85.4449, -87.62541, -86.56251, -86.48971,
-85.59656, -88.24491, -86.60828, -86.18112, -88.22778, -85.63784,
-86.03297, -87.55456, -85.37719, -86.38047, -86.21579, -86.86606
)), .Names = c("LATITUDE", "LONGITUD"), class = "data.frame", row.names = c(NA,
-43L))
The full data frame has 30,800 entries (geo locations); the above is only a sample.
I cannot use k-means because it needs the number of clusters to be specified in advance, which is not known here. Clusters should consist of 4 or more points that are within a distance of about 600 ft of each other.
As an initial step, I tried to plot all the latitude and longitude points to get an idea of what the data looks like, so that I can later check whether the plot of the clusters resembles it.
plot(dbfvar[,1], dbfvar[,2], type="l") #dbfvar is the dataframe having above data.
The plot was not satisfactory. It was not as expected.

The main part is to create the clusters and obtain the centroids of them, and visualize the centroids of the clusters formed.
P.S.: I am not confined to using R; I can use Python as well. I am looking for a good solution to the above problem before I go ahead and apply it to 7 such files (each with 30,800 geo locations).
Answer 1:
Hierarchical clustering is one approach.
First you construct a dendrogram:
dend <- hclust(dist(theData), method="complete")
I am using "complete" linkage here, so that groups are merged according to the maximum-distance rule. This is useful later, because it ensures that all points in one group are at most a certain distance apart.
I chose a distance of 2 because I am not sure how to convert your latitudes and longitudes to feet; you should convert first and then use 600 instead of 2. Here is the resulting dendrogram, cut at a height of 2.
plot(dend, hang=-1)
points(c(-100,100), c(2,2), col="red", type="l", lty=2)

Now each subtree intersected by the red line will become one cluster.
groups <- cutree(dend, h=2) # cutree takes the hclust tree, not the data; change "h" to 600 after converting to feet.
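The conversion to feet mentioned above can be sketched in base R with the haversine (great-circle) formula. This is only a sketch under stated assumptions: a spherical Earth with a mean radius of 6371008.8 m, the hypothetical helper name haversine_feet, and the first seven rows of the question's sample as input.

```r
# First seven rows of the sample data from the question.
pts <- data.frame(
  LATITUDE = c(32.70132, 34.74251, 32.55205, 32.64144, 34.92803, 32.38016, 32.42127),
  LONGITUD = c(-85.52518, -86.88351, -87.34777, -85.35430, -87.81506, -86.29790, -87.08690)
)

# Hypothetical helper: pairwise great-circle distances in feet.
# Assumption: spherical Earth, mean radius 6371008.8 m.
haversine_feet <- function(lat, lon, radius_m = 6371008.8) {
  lat <- lat * pi / 180                      # degrees to radians
  lon <- lon * pi / 180
  dlat <- outer(lat, lat, "-")               # all pairwise latitude differences
  dlon <- outer(lon, lon, "-")               # all pairwise longitude differences
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  d_m <- 2 * radius_m * asin(pmin(sqrt(a), 1))  # clamp against rounding error
  d_m / 0.3048                               # metres to feet
}

d_feet      <- as.dist(haversine_feet(pts$LATITUDE, pts$LONGITUD))
dend_feet   <- hclust(d_feet, method = "complete")
groups_feet <- cutree(dend_feet, h = 600)    # clusters at most 600 ft across
```

The seven sample points are many miles apart, so each ends up in its own cluster here. Note also that a full pairwise distance matrix for 30,800 points needs several gigabytes of memory, so this direct approach may need to be applied to subsets of the data.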
We can plot them as a scatter plot to see how they look:
plot(theData, col=groups)

Promising: nearby points form clusters, which is what we wanted.
Let's add centers and circles around those centers with the radius of 1 (so that the max distance within the circle is 2):
G1 <- tapply(theData[,1], groups, mean) # means of groups
G2 <- tapply(theData[,2], groups, mean) # ...
library(plotrix) # for drawing circles
plot(theData, col=groups)
points(G1, G2, col=seq_along(G1), cex=2, pch=19) # one colour per group centre
for(i in seq_along(G1)) { # draw circles
  draw.circle(G1[i], G2[i], 1, border=i, lty=3, lwd=3)
}
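The question asks only for clusters of 4 or more points, which the group means above do not yet take into account. A minimal sketch of that filtering step follows, using hypothetical planar coordinates in feet (two tight groups and two isolated points) rather than the real data; the names sizes, keep, in_hotspot and centroids are introduced here for illustration.

```r
set.seed(1)
# Hypothetical planar coordinates in feet, not the asker's data.
theData <- data.frame(
  x = c(rnorm(5, 0, 20), rnorm(6, 5000, 20), 20000, 40000),
  y = c(rnorm(5, 0, 20), rnorm(6, 5000, 20), 20000, 40000)
)
groups <- cutree(hclust(dist(theData), method = "complete"), h = 600)

sizes      <- table(groups)                 # number of points per cluster
keep       <- names(sizes)[sizes >= 4]      # clusters with 4 or more points
in_hotspot <- groups %in% keep              # TRUE for points in such clusters

# One row per retained cluster: its label and the mean x and y.
centroids <- aggregate(theData[in_hotspot, ],
                       by = list(cluster = groups[in_hotspot]),
                       FUN = mean)
```

The centroids can then be drawn on top of the scatter plot with, for example, points(centroids$x, centroids$y, pch=19, cex=2).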

It looks like drawing circles around the mean is not the best way to capture all of the points within a cluster. Nevertheless, it can be verified visually that the maximum distance between the points in one group is 2 (just try shifting the circles a bit to encapsulate all of the points).
Source: https://stackoverflow.com/questions/26540831/how-to-cluster-points-and-plot
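The maximum-distance property can also be checked numerically instead of visually. The sketch below uses hypothetical random points (not the asker's data) and confirms that, with complete linkage cut at height 2, no two points in the same group are more than 2 apart.

```r
set.seed(42)
# Hypothetical points on a 10 x 10 square.
theData <- data.frame(x = runif(50, 0, 10), y = runif(50, 0, 10))
d       <- dist(theData)
groups  <- cutree(hclust(d, method = "complete"), h = 2)

dmat <- as.matrix(d)                        # full symmetric distance matrix
# Largest pairwise distance inside each group (0 for single-point groups).
max_within <- sapply(split(seq_len(nrow(theData)), groups),
                     function(idx) max(dmat[idx, idx]))
all(max_within <= 2)                        # TRUE for complete linkage
```

This holds because, under complete linkage, the height at which a cluster is formed equals the largest distance between any two of its members.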