Change in preference value does not affect the results of Affinity propagation Clustering

Submitted by 孤者浪人 on 2019-12-08 05:11:33

Question


Refer to the following code

import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn import metrics
from sklearn.datasets import make_blobs  # samples_generator was removed in newer scikit-learn

##############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5)

# Compute similarities
X_norms = np.sum(X ** 2, axis=1)
S = - X_norms[:, np.newaxis] - X_norms[np.newaxis, :] + 2 * np.dot(X, X.T)
p=[10 * np.median(S),np.mean(S,axis=1),np.mean(S,axis=0),100000,-100000]
##############################################################################

# Compute Affinity Propagation
for preference in p:
    af = AffinityPropagation().fit(S, preference)
    cluster_centers_indices = af.cluster_centers_indices_
    labels = af.labels_

    n_clusters_ = len(cluster_centers_indices)

    print('Estimated number of clusters: %d' % n_clusters_)
    print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
    print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
    print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
    print("Adjusted Rand Index: %0.3f" % \
          metrics.adjusted_rand_score(labels_true, labels))
    print("Adjusted Mutual Information: %0.3f" % \
          metrics.adjusted_mutual_info_score(labels_true, labels))
    D = (S / np.min(S))
    print("Silhouette Coefficient: %0.3f" %
          metrics.silhouette_score(D, labels, metric='precomputed'))

    ##############################################################################

    # Plot result
    import pylab as pl
    from itertools import cycle

    pl.close('all')
    pl.figure(1)
    pl.clf()

    colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
    for k, col in zip(range(n_clusters_), colors):
        class_members = labels == k
        cluster_center = X[cluster_centers_indices[k]]
        pl.plot(X[class_members, 0], X[class_members, 1], col + '.')
        pl.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
                markeredgecolor='k', markersize=14)
        for x in X[class_members]:
            pl.plot([cluster_center[0], x[0]], [cluster_center[1], x[1]], col)

    pl.title('Estimated number of clusters: %d' % n_clusters_)
    pl.show()

Although I am changing the preference value in the loop, I still get the same clusters. Why does changing the preference value not affect the clustering results?

Update

When I ran the code above, the outcome was as shown below.

When I tried the suggestion recommended by Agost and passed the preference to the constructor, I got the following output.


Answer 1:


The preference is a parameter of the AffinityPropagation constructor, not of the fit() method. You should change the fit call to:

af = AffinityPropagation(preference=preference).fit(S)
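A minimal sketch (assuming a recent scikit-learn, ≥ 0.23, which added the random_state parameter) showing that once preference is passed to the constructor, different values do produce different numbers of clusters:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AffinityPropagation

# Same three blobs as in the question, with a fixed seed for reproducibility
X, _ = make_blobs(n_samples=300, centers=[[1, 1], [-1, -1], [1, -1]],
                  cluster_std=0.5, random_state=0)

counts = {}
for preference in (-1000, -50, -1):
    # Lower (more negative) preference -> fewer exemplars -> fewer clusters
    af = AffinityPropagation(preference=preference, damping=0.9,
                             max_iter=1000, random_state=0).fit(X)
    counts[preference] = len(af.cluster_centers_indices_)
    print(preference, "->", counts[preference], "clusters")
```

The exact cluster counts depend on the data, but the counts now vary with the preference, unlike in the original code.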



Answer 2:


The sklearn implementation of AP appears to be quite fragile.

My suggestions for using it:

  • use verbose=True to see when it failed to converge
  • increase the maximum number of iterations to at least 1000
  • reduce the damping by choosing 0.9 instead of 0.5

The reason is that with the default parameters, sklearn's AP usually does not converge.

As @AgostBiro mentioned before, preference is a parameter of the constructor, not of the fit function, so your original code ignored the preference entirely: fit(X, y) discards y. (Keeping the dead y parameter is an awkward API choice, but sklearn does it so that the call signature matches the classification API.)
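This is easy to verify: since fit() silently discards its second argument, passing anything there changes nothing. A small demonstration (assuming scikit-learn ≥ 0.23 for random_state):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AffinityPropagation

X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

a = AffinityPropagation(random_state=0).fit(X)          # no second argument
b = AffinityPropagation(random_state=0).fit(X, 12345)   # y is silently ignored
print(np.array_equal(a.labels_, b.labels_))  # → True
```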



Source: https://stackoverflow.com/questions/56087793/change-in-preference-value-does-not-affect-the-results-of-affinity-propagation-c
