K-means algorithm variation with equal cluster size


挽巷 2020-11-27 14:26

I'm looking for the fastest algorithm for grouping points on a map into equally sized groups, by distance. The k-means clustering algorithm looks straightforward and promis

16 answers

爱一瞬间的悲伤 (OP)

2020-11-27 15:00
There is a cleaner post-processing step, given the cluster centroids. Let N be the number of items, K the number of clusters, and S = ceil(N/K) the maximum cluster size.
- Create a list of tuples (item_id, cluster_id, distance), one for every item/centroid pair.
- Sort the tuples by distance, ascending.
- For each element (item_id, cluster_id, distance) in the sorted list of tuples:
  - if cluster_id already holds S elements, or item_id has already been assigned, do nothing
  - otherwise add item_id to cluster cluster_id.
This runs in O(NK lg(N)) (the sort over the NK tuples dominates, and lg(NK) <= 2 lg(N) because K <= N), should give results comparable to @larsmans' solution, and is easier to implement. In pseudo-Python:
```
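import math

def equal_size_assign(items, centroids):
    """Greedily assign each item to a centroid, at most S items per cluster."""
    N, K = len(items), len(centroids)
    S = math.ceil(N / K)  # maximum cluster size

    # All (item_id, cluster_id, distance) tuples; math.dist needs Python 3.8+
    dists = [(i, k, math.dist(v, c))
             for i, v in enumerate(items)
             for k, c in enumerate(centroids)]
    dists.sort(key=lambda t: t[2])  # closest pairs first

    clusts = [None] * N  # cluster chosen for each item
    counts = [0] * K     # current size of each cluster

    for item_id, cluster_id, _ in dists:
        if counts[cluster_id] >= S:
            continue  # cluster is already full
        if clusts[item_id] is None:
            clusts[item_id] = cluster_id
            counts[cluster_id] += 1
    return clusts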
```
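For larger inputs, the same greedy pass can be sketched with NumPy, so that the distance matrix and the sort are vectorized. This is only an illustration under assumptions: the function name `balanced_assign`, the Euclidean metric, and the stand-in centroids in the usage lines are not from the answer above.

```python
import numpy as np

def balanced_assign(points, centroids):
    """Greedy capacity-limited assignment (hypothetical helper name)."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    n, k = len(points), len(centroids)
    cap = -(-n // k)  # ceil(n / k), i.e. the maximum cluster size S

    # n x k matrix of Euclidean distances, built by broadcasting
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

    labels = np.full(n, -1)          # -1 marks "not assigned yet"
    counts = np.zeros(k, dtype=int)  # current size of each cluster

    # Visit every (item, cluster) pair from closest to farthest
    for flat in np.argsort(d, axis=None, kind="stable"):
        i, c = divmod(int(flat), k)  # row-major index -> (item, cluster)
        if labels[i] == -1 and counts[c] < cap:
            labels[i] = c
            counts[c] += 1
    return labels

# Usage: 10 random 2-D points, 3 stand-in centroids (assumed to come
# from an earlier k-means run; here simply the first three points).
rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 2))
cents = pts[:3]
labels = balanced_assign(pts, cents)
sizes = np.bincount(labels, minlength=3)
```

Note that the pass only caps each cluster at S items; when K does not divide N, some clusters end up smaller than S.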