Affinity Propagation preferences initialization

佛祖请我去吃肉 2021-02-20 05:18

I need to perform clustering without knowing the number of clusters in advance. The number of clusters may range from 1 to 5, since I may find cases where all the samples belong to…

3 answers
  •  半阙折子戏
    2021-02-20 05:46

    No, there is no flaw. AP does not work with distances; it requires you to specify a similarity. I don't know the scikit-learn implementation that well, but from what I have read, by default it computes the similarity matrix as negative squared Euclidean distances. If you set the input preference to the minimal Euclidean distance, you get a positive value, while all similarities are negative. This will typically produce as many clusters as you have samples (note: the higher the input preference, the more clusters).

    Instead, I suggest setting the input preference to the minimal negative squared distance, i.e. -1 times the square of the largest distance in the data set. This will give you a much smaller number of clusters, but not necessarily a single cluster.

    I don't know whether the preferenceRange() function also exists in the scikit-learn implementation. There is Matlab code for it on the AP homepage, and it is also implemented in the R package 'apcluster', which I maintain. This function lets you determine meaningful bounds for the input preference parameter. I hope that helps.
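    The suggestion above can be sketched in Python, assuming scikit-learn's `AffinityPropagation` with its default `affinity="euclidean"` (which builds similarities as negative squared Euclidean distances). The helper names `min_similarity_preference` and `cluster_few` are hypothetical, introduced only for illustration; this is a sketch, not a replacement for `preferenceRange()`.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import euclidean_distances


def min_similarity_preference(X):
    """Hypothetical helper: smallest negative squared Euclidean similarity in X.

    This equals -1 times the square of the largest pairwise distance.
    """
    # Same similarity scikit-learn uses for affinity="euclidean":
    # s(i, k) = -||x_i - x_k||^2 (the diagonal is 0, so it never wins the min).
    S = -euclidean_distances(X, squared=True)
    return S.min()


def cluster_few(X, random_state=0):
    """Hypothetical helper: run AP with the lowest 'natural' preference."""
    # A very low preference makes every point a poor exemplar candidate,
    # which tends to give few clusters (not necessarily exactly one).
    pref = min_similarity_preference(X)
    ap = AffinityPropagation(preference=pref, random_state=random_state)
    labels = ap.fit_predict(X)
    return pref, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))  # one artificially generated Gaussian cluster
    pref, labels = cluster_few(X)
    print("preference:", pref)
    print("clusters found:", len(np.unique(labels)))
```

    For comparison, leaving `preference=None` makes scikit-learn use the median of the similarities, which sits much higher and usually yields more clusters. If a run does not converge, scikit-learn emits a `ConvergenceWarning`; raising `damping` or `max_iter` is the usual first step.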
