hierarchical-clustering

Colour the tick labels in a dendrogram to match the cluster colours

半腔热情 submitted on 2019-12-30 14:38:45
Question: How can I individually colour the labels of a dendrogram so that they match the colours of the clusters in MATLAB? Here is an example of the desired output, generated using the code in my answer below (note the labels are just the 50-character series 'A':'r'). If there is a more straightforward way to do this, please do post an answer, as I was unable to find a solution by googling. If not, the code is below for posterity. Answer 1: I could not find a definitive answer to this but I managed to …
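The original question and answer are in MATLAB. For readers working in Python, a minimal analogous sketch with scipy and matplotlib (the toy data is an assumption, not the poster's): recent SciPy versions (1.7+) return a leaves_color_list from dendrogram, which can be copied onto the tick labels.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt

    X = np.random.rand(50, 4)                      # assumed toy data
    Z = linkage(X, method='average')

    fig, ax = plt.subplots()
    ddata = dendrogram(Z, ax=ax)                   # leaf tick labels are observation indices

    # SciPy >= 1.7 reports the colour assigned to each leaf; reuse it for the tick labels
    for lbl, colour in zip(ax.get_xmajorticklabels(), ddata['leaves_color_list']):
        lbl.set_color(colour)

    plt.show()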

sklearn agglomerative clustering: dynamically updating the number of clusters

爷,独闯天下 submitted on 2019-12-30 11:28:18
Question: The documentation for sklearn.cluster.AgglomerativeClustering mentions that, when varying the number of clusters and using caching, it may be advantageous to compute the full tree. This seems to imply that it is possible to first compute the full tree and then quickly update the number of desired clusters as necessary, without recomputing the tree (with caching). However, this procedure for changing the number of clusters does not seem to be documented. I would like to do this but am unsure …
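A minimal sketch of the idea the documentation hints at (the cache path and toy data are assumptions): with memory pointing at a joblib cache directory and compute_full_tree=True, refitting with a different n_clusters reuses the cached tree-building step instead of recomputing it.

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, random_state=0)    # assumed toy data

    for k in (2, 3, 5, 8):
        model = AgglomerativeClustering(
            n_clusters=k,
            compute_full_tree=True,      # always build the whole tree
            memory='agglo_cache',        # joblib cache directory (assumed path)
        )
        labels = model.fit_predict(X)    # later fits reuse the cached tree computation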

Clustering - how to find the nearest to a cluster

喜夏-厌秋 submitted on 2019-12-25 18:44:36
Question: Hints I got on a different question puzzled me quite a bit. I have an exercise, actually part of a larger exercise: 1. cluster some data using hclust (done); 2. given a totally new vector, find out which of the clusters from step 1 it is nearest to. According to the exercise, this should be doable in quite a short time. However, after weeks I am puzzled whether this can be done at all, as apparently all I really get from hclust is a tree, and not, as I assumed, a number of clusters. As I suppose …
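The question is about R's hclust, but the underlying idea is language-neutral: cut the tree into flat clusters, summarise each cluster (for example by its centroid), and assign the new vector to the nearest summary. A rough sketch of that idea in Python with scipy (the data and the number of clusters are assumptions):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import cdist

    X = np.random.rand(100, 5)                        # assumed training data
    Z = linkage(X, method='average')
    labels = fcluster(Z, t=4, criterion='maxclust')   # cut the tree into 4 flat clusters

    # Summarise each cluster by its centroid, then pick the nearest one for a new vector
    ids = np.unique(labels)
    centroids = np.vstack([X[labels == k].mean(axis=0) for k in ids])
    new_vec = np.random.rand(1, 5)                    # assumed new observation
    nearest_cluster = ids[cdist(new_vec, centroids).argmin()]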

R gplots heatmap.2 - key is unstable using breaks parameter (warning: unsorted 'breaks' will be sorted before use)

岁酱吖の submitted on 2019-12-24 17:53:15
Question: I'm visualizing a data set with the heatmap.2 function from the gplots package in R. Basically, I'm performing a hierarchical clustering analysis on the original data while forcing the heatmap to display a limited version of the data (between -3 and +3), to limit the effect of outliers on the appearance of the heatmap while still retaining the original clustering. When I use the full data set (fullmousedatamat), it works just fine. However, when I use a partial data set (…

Hierarchical clustering of heatmap in Python

好久不见. submitted on 2019-12-24 14:49:52
Question: I have an NxM matrix with values that range from 0 to 20. I can easily get a heatmap by using Matplotlib and pcolor. Now I'd like to apply hierarchical clustering and a dendrogram using scipy. I'd like to re-order each dimension (rows and columns) to show which elements are similar (according to the clustering result). If the matrix were square (NxN), the code would be something like: clustering = linkage(matrix, method="average") dendrogram(clustering, orientation='right') How can I …
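One common way to handle a non-square matrix is to cluster the rows and the columns independently and reorder both axes by leaf order. A minimal sketch with scipy and matplotlib (the data is an assumed stand-in):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, leaves_list

    matrix = np.random.randint(0, 21, size=(30, 12))    # assumed NxM stand-in data

    # Cluster rows and columns independently, then reorder both axes by leaf order
    row_order = leaves_list(linkage(matrix, method='average'))
    col_order = leaves_list(linkage(matrix.T, method='average'))
    reordered = matrix[np.ix_(row_order, col_order)]

    plt.pcolor(reordered)
    plt.colorbar()
    plt.show()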

Plot the cluster members in R

梦想与她 submitted on 2019-12-21 06:21:06
Question: I use the DTW package in R, and I have finally finished hierarchical clustering, but I want to plot each time-series cluster separately, like the picture below. sc <- read.table("D:/handling data/confirm.csv", header=T, sep=",") rownames(sc) <- sc$STDR_YM_CD sc$STDR_YM_CD <- NULL col_n <- colnames(sc) hc <- hclust(dist(sc), method="average") plot(hc, main="") How can I do that? My data is at http://blogattach.naver.com/e772fb415a6c6ddafd1370417f96e494346a9725/20170207_141_blogfile/khm2963_1486442387926_THgZRt_csv

How to calculate the Silhouette Score of scipy's fcluster using scikit-learn's silhouette score?

回眸只為那壹抹淺笑 submitted on 2019-12-21 05:20:55
Question: I am using scipy.cluster.hierarchy.linkage as a clustering algorithm and pass the resulting linkage matrix to scipy.cluster.hierarchy.fcluster to get the flattened clusters for various thresholds. I would like to calculate the Silhouette score of the results and compare them to choose the best threshold, and I would prefer not to implement it myself but to use scikit-learn's sklearn.metrics.silhouette_score. How can I rearrange my clustering results as an input to sklearn.metrics.silhouette_score? Answer 1: …
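The flat labels returned by fcluster can be passed to silhouette_score directly; the main choice is whether to score against the raw feature matrix or against the same distances used for the linkage (metric='precomputed'). A minimal sketch with assumed data and candidate thresholds:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist, squareform
    from sklearn.metrics import silhouette_score

    X = np.random.rand(200, 6)                 # assumed data
    D = pdist(X)                               # condensed distance matrix
    Z = linkage(D, method='average')

    best = None
    for t in (0.5, 1.0, 1.5, 2.0):             # assumed candidate thresholds
        labels = fcluster(Z, t=t, criterion='distance')
        if len(np.unique(labels)) < 2:         # silhouette needs at least 2 clusters
            continue
        # Score against the same distances used for clustering
        score = silhouette_score(squareform(D), labels, metric='precomputed')
        if best is None or score > best[0]:
            best = (score, t)

    print(best)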

Pruning dendrogram in scipy (hierarchical clustering)

牧云@^-^@ submitted on 2019-12-21 03:52:54
Question: I have a distance matrix with about 5000 entries and use scipy's hierarchical clustering methods to cluster the matrix. The code I use for this is the following snippet: Y = fastcluster.linkage(D, method='centroid') # D is the distance matrix Z1 = sch.dendrogram(Y, truncate_mode='level', p=7, show_contracted=True) Since the dendrogram would become rather dense with all this data, I use truncate_mode to prune it a bit. All of this works, but I wonder how I can find out which of the original 5000 …
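One way to recover which original observations sit under each collapsed branch is to walk the linkage matrix and record the membership of every merge node; in a truncated dendrogram, scipy reports a leaf with an index >= n when it stands for a non-singleton cluster. A rough sketch with assumed stand-in data (scipy's linkage is used here, but fastcluster returns the same matrix format):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    X = np.random.rand(300, 4)                 # assumed stand-in for the real data
    Z = linkage(X, method='centroid')
    n = X.shape[0]

    # members[i] lists the original observations under node i
    members = {i: [i] for i in range(n)}       # singleton leaves
    for i, (a, b, _, _) in enumerate(Z):
        members[n + i] = members[int(a)] + members[int(b)]

    ddata = dendrogram(Z, truncate_mode='level', p=7, no_plot=True)

    # A leaf id < n is an original observation; an id >= n is a collapsed cluster
    for leaf in ddata['leaves']:
        print(leaf, members[leaf][:10])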

Using ELKI on custom objects and making sense of results

你离开我真会死。 submitted on 2019-12-20 05:00:24
Question: I am trying to use ELKI's SLINK implementation of hierarchical clustering in my program. I have a set of objects (of my own type) that need to be clustered. For that, I convert them to feature vectors before clustering. This is how I currently got it to run and produce some result (the code is in Scala): val clusterer = new SLINK(CosineDistanceFunction.STATIC, 3) val connection = new ArrayAdapterDatabaseConnection(featureVectors) val database = new StaticArrayDatabase(connection, null) database …