K-means algorithm variation with equal cluster size

挽巷 2020-11-27 14:26

I'm looking for the fastest algorithm for grouping points on a map into equally sized groups, by distance. The k-means clustering algorithm looks straightforward and promis

16 answers
  •  爱一瞬间的悲伤
    2020-11-27 15:00

    There is a cleaner post-processing step, given the cluster centroids. Let N be the number of items, K the number of clusters, and S = ceil(N/K) the maximum cluster size.

    • Create a list of tuples (item_id, cluster_id, distance)
    • Sort tuples with respect to distance
    • For each element (item_id, cluster_id, distance) in the sorted list of tuples:
      • if cluster_id already holds S elements, or item_id has already been assigned, do nothing
      • otherwise add item_id to cluster cluster_id.

    This runs in O(NK lg(N)), should give results comparable to @larsmans's solution, and is easier to implement. In pseudo-Python:

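    import math

    # Assumed inputs: `items` and `centroids` are sequences of coordinate tuples.
    # Euclidean distance (math.dist) is an assumption; substitute your own metric.
    N, K = len(items), len(centroids)
    S = math.ceil(N / K)   # maximum cluster size

    dists = []             # (item_id, cluster_id, distance) tuples
    clusts = [None] * N    # cluster assigned to each item
    counts = [0] * K       # current size of each cluster

    for i, v in enumerate(items):
        for k, c in enumerate(centroids):
            dists.append((i, k, math.dist(c, v)))

    # Closest (item, cluster) pairs first
    dists.sort(key=lambda t: t[2])

    for item_id, cluster_id, d in dists:
        if counts[cluster_id] >= S:
            continue       # cluster is full
        if clusts[item_id] is None:
            clusts[item_id] = cluster_id
            counts[cluster_id] += 1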
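
To see the effect of the size cap, here is a small self-contained sketch of the same greedy pass on made-up data. The function name `assign_balanced`, the sample points, and the use of Euclidean distance via `math.dist` (Python 3.8+) are all assumptions for illustration, not part of the original answer. Because K * S >= N, every item ends up assigned by the time the sorted list is exhausted.

```python
import math

def assign_balanced(items, centroids):
    """Greedily assign each item to a centroid, capping clusters at ceil(N/K)."""
    n, k = len(items), len(centroids)
    cap = math.ceil(n / k)  # S in the text above
    # All (distance, item_id, cluster_id) triples, closest pairs first.
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(items)
        for j, c in enumerate(centroids)
    )
    labels = [None] * n
    counts = [0] * k
    for _, i, j in pairs:
        # Skip pairs whose item is already placed or whose cluster is full.
        if labels[i] is None and counts[j] < cap:
            labels[i] = j
            counts[j] += 1
    return labels

# Hypothetical data: four points near the first centroid, two near the second.
items = [(0, 0), (1, 0), (2, 0), (4, 0), (10, 0), (11, 0)]
centroids = [(1, 0), (10.5, 0)]
labels = assign_balanced(items, centroids)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Plain nearest-centroid assignment would put the point (4, 0) in cluster 0 and give sizes 4 and 2. With the cap S = 3, cluster 0 is already full when that point is reached, so it falls through to cluster 1.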
    
