Performance Analysis of Clustering Algorithms

Submitted by 馋奶兔 on 2021-02-19 23:27:18

Question


I have been given 2 data sets and want to perform cluster analysis for the sets using KNIME.

Once I have completed the clustering, I wish to carry out a performance comparison of 2 different clustering algorithms.

With regard to performance analysis of clustering algorithms, would this be a measure of time (the algorithm's time complexity, the time taken to perform the clustering, etc.) or the validity of the clusters produced? (or both)

Is there any other angle one could look at to identify the performance (or lack thereof) of a clustering algorithm?

Many thanks in advance,

  • T

Answer 1:


It depends a lot on what data you have available.

A common way of measuring the performance is with respect to existing ("external") labels (albeit that would make more sense for classification than for clustering). There are around two dozen measures you can use for this.
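
For example, a minimal sketch of two common external measures, the adjusted Rand index and normalized mutual information (my own illustration using scikit-learn rather than KNIME; the toy labels are placeholders):

    from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

    labels_true = [0, 0, 1, 1, 2, 2]  # external ("ground truth") labels
    labels_pred = [1, 1, 0, 0, 2, 2]  # cluster ids produced by an algorithm

    # Both scores are invariant to the arbitrary numbering of clusters,
    # so this perfect (merely relabeled) clustering scores 1.0 on each.
    print(adjusted_rand_score(labels_true, labels_pred))
    print(normalized_mutual_info_score(labels_true, labels_pred))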

When using an "internal" quality measure, make sure that it is independent of the algorithms. For example, k-means optimizes such a measure, and will always come out best when evaluating with respect to this measure.
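
As a concrete illustration of that bias (a sketch on synthetic data; scikit-learn, the toy blob dataset, and the choice of agglomerative clustering as the comparison algorithm are mine, not part of the original answer): evaluating on k-means' own objective, the within-cluster sum of squares, favors k-means by construction, whereas an algorithm-neutral measure such as the silhouette coefficient gives a fairer comparison.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering, KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

    km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
    ag_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

    def sse(X, labels):
        # Within-cluster sum of squares: the quantity k-means itself minimizes,
        # so k-means tends to win on it regardless of actual cluster quality.
        return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                   for k in np.unique(labels))

    # Silhouette is not the objective of either algorithm, so it is a safer yardstick.
    for name, labels in [("k-means", km_labels), ("agglomerative", ag_labels)]:
        print(name, "SSE:", sse(X, labels), "silhouette:", silhouette_score(X, labels))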




Answer 2:


There are two categories of clustering evaluation methods, and the choice between them depends on whether a ground truth is available. The first category is the extrinsic methods, which require a ground truth; the other is the intrinsic methods. In general, extrinsic methods assign a score to a clustering given the ground truth, whereas intrinsic methods evaluate a clustering by examining how well separated and how compact the clusters are.

For extrinsic methods (remember, you need a ground truth available) one option is to use the BCubed precision and recall metrics. BCubed precision and recall differ from traditional precision and recall in that clustering is an unsupervised learning technique, so we do not know the labels of the clusters beforehand. For this reason, the BCubed metrics evaluate the precision and recall for every object in a clustering of a given dataset according to the ground truth. The precision of an example indicates how many other examples in the same cluster belong to the same category as that example. The recall of an example reflects how many examples of the same category are assigned to the same cluster. Finally, the two metrics can be combined into one using the F2 measure.
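
A small from-scratch sketch of the BCubed computation, following the per-item definitions above (my own illustration; the toy labels are placeholders, and the beta parameter is exposed so recall can be weighted more heavily, as in the F2 measure):

    from collections import Counter

    def bcubed(categories, clusters):
        # categories: ground-truth label per item; clusters: cluster id per item
        n = len(categories)
        cluster_sizes = Counter(clusters)
        category_sizes = Counter(categories)
        pair_sizes = Counter(zip(clusters, categories))  # items sharing both
        precision = recall = 0.0
        for c, t in zip(clusters, categories):
            same_both = pair_sizes[(c, t)]
            precision += same_both / cluster_sizes[c]  # share of e's cluster in e's category
            recall += same_both / category_sizes[t]    # share of e's category in e's cluster
        return precision / n, recall / n

    def f_beta(p, r, beta=2.0):
        # beta > 1 weights recall more heavily; beta = 2 gives the F2 measure
        return (1 + beta**2) * p * r / (beta**2 * p + r)

    p, r = bcubed([0, 0, 1, 1, 1], [0, 0, 0, 1, 1])
    print(p, r, f_beta(p, r))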

Sources:

  1. Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, and Jian Pei
  2. http://www.cs.utsa.edu/~qitian/seminar/Spring11/03_11_11/IR2009.pdf
  3. My own experience in evaluating the performance of clustering



Answer 3:


A simple approach for the extrinsic case, where a ground truth is available, is to use a distance metric between clusterings; the ground truth is then simply treated as a clustering itself. Two good measures are the Variation of Information by Meila and, in my humble opinion, the split-join distance, proposed by myself and also discussed by Meila. I do not recommend the Mirkin index or the Rand index; I've written more about this elsewhere on Stack Exchange.
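
For reference, a minimal sketch of the Variation of Information (my own illustration; scikit-learn's mutual_info_score returns mutual information in nats, matching the entropy computed below):

    import numpy as np
    from sklearn.metrics import mutual_info_score

    def entropy(labels):
        # Shannon entropy (in nats) of a clustering, from its cluster sizes
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))

    def variation_of_information(a, b):
        # VI(A, B) = H(A) + H(B) - 2 I(A; B); zero iff the clusterings
        # coincide (up to relabeling)
        return entropy(a) + entropy(b) - 2 * mutual_info_score(a, b)

    print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 2]))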

These metrics can be split into two constituent parts, each representing the distance of one of the clusterings to the largest common subclustering. It is worthwhile to consider both parts: if the part for the ground truth (its distance to the common subclustering) is very small, the tested clustering is close to a superclustering of the ground truth; if the other part is small, the tested clustering is close to the common subclustering and hence to a subclustering of the ground truth. In either case the clustering can be said to be compatible with the ground truth. For more information see the link above.
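
Here is a sketch of the split-join distance and its two one-sided parts, my own rendering of the construction described above (the function and variable names are mine; it assumes flat label arrays and uses scikit-learn's contingency_matrix):

    from sklearn.metrics.cluster import contingency_matrix

    def split_join_parts(labels_a, labels_b):
        # Contingency table: rows are clusters of A, columns are clusters of B.
        m = contingency_matrix(labels_a, labels_b)
        n = m.sum()
        # One-sided parts: elements that must be moved to project each
        # clustering onto the largest common subclustering of the two.
        d_a = n - m.max(axis=1).sum()
        d_b = n - m.max(axis=0).sum()
        return d_a, d_b  # the split-join distance is d_a + d_b

    print(split_join_parts([0, 0, 1, 1], [0, 1, 1, 1]))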




Answer 4:


There are several benchmarks for evaluating clustering algorithms with extrinsic quality measures (accuracy) and intrinsic measures (internal statistics of the formed clusters):

  • Clubmark demonstrated in ICDM'18
  • WebOCD, see description in the paper
  • Circulo
  • ParallelComMetric
  • CluSim
  • CoDAR (sources may be available from the paper's authors)

Selection of the appropriate benchmark depends on the kind of clustering algorithm (hard or soft clustering), the kind (pairwise relations, attributed datasets, or mixed) and size of the input data, the required evaluation metrics, and the admissible amount of supervision. The Clubmark paper describes these evaluation criteria in detail.

Clubmark was developed for fully automatic parallel evaluation of many clustering algorithms (processing input data specified by pairwise relations) on many large datasets (millions to billions of elements), mostly with accuracy metrics, while tracing resource consumption (execution time, peak resident memory, etc.).

But for a couple of algorithms on a couple of datasets, even manual evaluation is appropriate.
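
For instance, a minimal manual comparison along both axes the question asks about, runtime and cluster validity (a sketch on synthetic data; the two algorithms and the dataset are placeholders, and the same idea applies to cluster assignments exported from KNIME):

    import time
    from sklearn.cluster import AgglomerativeClustering, KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import adjusted_rand_score, silhouette_score

    X, y_true = make_blobs(n_samples=2000, centers=4, random_state=0)

    algorithms = [("k-means", KMeans(n_clusters=4, n_init=10, random_state=0)),
                  ("agglomerative", AgglomerativeClustering(n_clusters=4))]

    for name, algo in algorithms:
        start = time.perf_counter()
        labels = algo.fit_predict(X)
        elapsed = time.perf_counter() - start
        # ARI uses the ground truth (extrinsic); silhouette needs none (intrinsic).
        print(f"{name}: {elapsed:.3f}s "
              f"ARI={adjusted_rand_score(y_true, labels):.3f} "
              f"silhouette={silhouette_score(X, labels):.3f}")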



Source: https://stackoverflow.com/questions/9690706/performance-analysis-of-clustering-algorithms
