I'm looking for the fastest algorithm for grouping points on a map into equally sized groups, by distance. The k-means clustering algorithm looks straightforward and promis
There is a cleaner post-processing step, given the cluster centroids. Let N be the number of items, K the number of clusters and S = ceil(N/K) the maximum cluster size.

1. Create a list of tuples (item_id, cluster_id, distance), one for each item-centroid pair.
2. Sort the tuples by distance.
3. For each tuple (item_id, cluster_id, distance) in the sorted list of tuples:
   - if cluster_id already holds S items, or item_id is already assigned, do nothing;
   - otherwise, add item_id to cluster cluster_id.

This runs in O(NK lg(N)), should give comparable results to @larsmans' solution and is easier to implement. In pseudo-Python:
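import math

# items and centroids are sequences of coordinate tuples, e.g. (x, y).
# Euclidean distance (math.dist, Python 3.8+) is assumed here.
N, K = len(items), len(centroids)
S = math.ceil(N / K)  # maximum cluster size

# Build one (item_id, cluster_id, distance) tuple per item-centroid pair.
dists = []
for i, v in enumerate(items):
    for k, c in enumerate(centroids):
        dists.append((i, k, math.dist(v, c)))

# Closest item-centroid pairs come first.
dists.sort(key=lambda t: t[2])

clusts = [None] * N  # cluster assigned to each item
counts = [0] * K     # current size of each cluster
for item_id, cluster_id, d in dists:
    if counts[cluster_id] >= S:
        continue  # cluster is already full
    if clusts[item_id] is None:
        clusts[item_id] = cluster_id
        counts[cluster_id] += 1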
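
To see the size cap in action, here is a minimal self-contained sketch of the same greedy assignment wrapped in a function. The name `assign_balanced`, the sample points and the use of Euclidean `math.dist` (Python 3.8+) are my assumptions for illustration, not part of the original answer; for real map coordinates you would probably swap in a geographic distance.

```python
import math


def assign_balanced(items, centroids):
    """Greedily assign each item to a centroid, capping cluster size at ceil(N/K).

    Hypothetical helper: items and centroids are sequences of coordinate tuples.
    """
    n, k = len(items), len(centroids)
    cap = math.ceil(n / k)  # S in the text above

    # One (item_id, cluster_id, distance) tuple per item-centroid pair,
    # sorted so that the closest pairs are considered first.
    pairs = sorted(
        ((i, c, math.dist(p, q))
         for i, p in enumerate(items)
         for c, q in enumerate(centroids)),
        key=lambda t: t[2],
    )

    assignment = [None] * n
    counts = [0] * k
    for item_id, cluster_id, _ in pairs:
        if counts[cluster_id] >= cap or assignment[item_id] is not None:
            continue  # cluster full, or item already placed
        assignment[item_id] = cluster_id
        counts[cluster_id] += 1
    return assignment


# Four points sit near the first centroid, but the cap is ceil(6/2) = 3,
# so the farthest of them, (3, 0), is pushed into the second cluster.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
centers = [(1, 0), (10, 0)]
print(assign_balanced(points, centers))  # [0, 0, 0, 1, 1, 1]
```

Because the total capacity K * S is at least N, every item ends up assigned: an item can only be skipped for a cluster that is full, and not all clusters can be full while an item is still unplaced. Note that S is only an upper bound, so some clusters may end up smaller than others.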