hierarchical-clustering

Colour the tick labels in a dendrogram to match the cluster colours

半腔热情 submitted on 2019-12-30 14:38:45
Question: How can I individually colour the labels of a dendrogram so that they match the colours of the clusters in MATLAB? Here is an example of the desired output, generated using the code in my answer below (note the labels are just the 50-character series 'A':'r'). If there is a more straightforward way to do this, please do post an answer, as I was unable to find a solution by googling. If not, the code is below for posterity. Answer 1: I could not find a definitive answer to this but I managed to …
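The original question and answer are in MATLAB. For readers working in Python, a minimal analogous sketch with scipy and matplotlib (the toy data is an assumption, not the poster's): recent SciPy versions (1.7+) return a leaves_color_list from dendrogram, which can be copied onto the tick labels.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt

    X = np.random.rand(50, 4)                      # assumed toy data
    Z = linkage(X, method='average')

    fig, ax = plt.subplots()
    ddata = dendrogram(Z, ax=ax)                   # leaf tick labels are observation indices

    # SciPy >= 1.7 reports the colour assigned to each leaf; reuse it for the tick labels
    for lbl, colour in zip(ax.get_xmajorticklabels(), ddata['leaves_color_list']):
        lbl.set_color(colour)

    plt.show()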

sklearn agglomerative clustering: dynamically updating the number of clusters

爷,独闯天下 submitted on 2019-12-30 11:28:18
Question: The documentation for sklearn.cluster.AgglomerativeClustering mentions that, when varying the number of clusters and using caching, it may be advantageous to compute the full tree. This seems to imply that it is possible to first compute the full tree and then quickly update the number of desired clusters as necessary, without recomputing the tree (with caching). However, this procedure for changing the number of clusters does not seem to be documented. I would like to do this but am unsure …
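A minimal sketch of the idea the documentation hints at (the cache path and toy data are assumptions): with memory pointing at a joblib cache directory and compute_full_tree=True, refitting with a different n_clusters reuses the cached tree-building step instead of recomputing it.

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, random_state=0)    # assumed toy data

    for k in (2, 3, 5, 8):
        model = AgglomerativeClustering(
            n_clusters=k,
            compute_full_tree=True,      # always build the whole tree
            memory='agglo_cache',        # joblib cache directory (assumed path)
        )
        labels = model.fit_predict(X)    # later fits reuse the cached tree computation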

Clustering - how to find the nearest to a cluster

喜夏-厌秋 submitted on 2019-12-25 18:44:36
Question: Hints I got on a different question puzzled me quite a bit. I have an exercise, actually part of a larger exercise: 1. cluster some data using hclust (done); 2. given a totally new vector, find out which of the clusters from step 1 it is nearest to. According to the exercise, this should be doable in quite a short time. However, after weeks I am puzzled whether this can be done at all, as apparently all I really get from hclust is a tree, and not, as I assumed, a number of clusters. As I suppose …
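The question is about R's hclust, but the underlying idea is language-neutral: cut the tree into flat clusters, summarise each cluster (for example by its centroid), and assign the new vector to the nearest summary. A rough sketch of that idea in Python with scipy (the data and the number of clusters are assumptions):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import cdist

    X = np.random.rand(100, 5)                        # assumed training data
    Z = linkage(X, method='average')
    labels = fcluster(Z, t=4, criterion='maxclust')   # cut the tree into 4 flat clusters

    # Summarise each cluster by its centroid, then pick the nearest one for a new vector
    ids = np.unique(labels)
    centroids = np.vstack([X[labels == k].mean(axis=0) for k in ids])
    new_vec = np.random.rand(1, 5)                    # assumed new observation
    nearest_cluster = ids[cdist(new_vec, centroids).argmin()]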

R gplots heatmap.2 - key is unstable using breaks parameter (warning: unsorted 'breaks' will be sorted before use)

岁酱吖の submitted on 2019-12-24 17:53:15
Question: I'm visualizing a data set with the heatmap.2 function from the gplots package in R. Basically, I'm performing a hierarchical clustering analysis on the original data while forcing the heatmap to display a limited version of the data (between -3 and +3), to limit the effect of outliers on the appearance of the heatmap while still retaining the original clustering. When I use the full data set (fullmousedatamat), it works just fine. However, when I use a partial data set (…

Hierarchical clustering of heatmap in Python

好久不见. submitted on 2019-12-24 14:49:52
Question: I have an NxM matrix with values that range from 0 to 20. I can easily get a heatmap by using Matplotlib and pcolor. Now I'd like to apply hierarchical clustering and a dendrogram using scipy. I'd like to re-order each dimension (rows and columns) to show which elements are similar (according to the clustering result). If the matrix were square (NxN), the code would be something like: clustering = linkage(matrix, method="average") dendrogram(clustering, orientation='right') How can I …
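One common way to handle a non-square matrix is to cluster the rows and the columns independently and reorder both axes by leaf order. A minimal sketch with scipy and matplotlib (the data is an assumed stand-in):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, leaves_list

    matrix = np.random.randint(0, 21, size=(30, 12))    # assumed NxM stand-in data

    # Cluster rows and columns independently, then reorder both axes by leaf order
    row_order = leaves_list(linkage(matrix, method='average'))
    col_order = leaves_list(linkage(matrix.T, method='average'))
    reordered = matrix[np.ix_(row_order, col_order)]

    plt.pcolor(reordered)
    plt.colorbar()
    plt.show()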

Plot the cluster members in R

梦想与她 submitted on 2019-12-21 06:21:06
Question: I use the DTW package in R, and I have finally finished hierarchical clustering, but I want to plot each time-series cluster separately, like the picture below. sc <- read.table("D:/handling data/confirm.csv", header=T, sep=",") rownames(sc) <- sc$STDR_YM_CD sc$STDR_YM_CD <- NULL col_n <- colnames(sc) hc <- hclust(dist(sc), method="average") plot(hc, main="") How can I do that? My data is at http://blogattach.naver.com/e772fb415a6c6ddafd1370417f96e494346a9725/20170207_141_blogfile/khm2963_1486442387926_THgZRt_csv

How to calculate the Silhouette Score of scipy's fcluster using scikit-learn's silhouette score?

回眸只為那壹抹淺笑 submitted on 2019-12-21 05:20:55
Question: I am using scipy.cluster.hierarchy.linkage as a clustering algorithm and pass the resulting linkage matrix to scipy.cluster.hierarchy.fcluster to get the flattened clusters for various thresholds. I would like to calculate the Silhouette score of the results and compare them to choose the best threshold, and I would prefer not to implement it myself but to use scikit-learn's sklearn.metrics.silhouette_score. How can I rearrange my clustering results as an input to sklearn.metrics.silhouette_score? Answer 1: …
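The flat labels returned by fcluster can be passed to silhouette_score directly; the main choice is whether to score against the raw feature matrix or against the same distances used for the linkage (metric='precomputed'). A minimal sketch with assumed data and candidate thresholds:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist, squareform
    from sklearn.metrics import silhouette_score

    X = np.random.rand(200, 6)                 # assumed data
    D = pdist(X)                               # condensed distance matrix
    Z = linkage(D, method='average')

    best = None
    for t in (0.5, 1.0, 1.5, 2.0):             # assumed candidate thresholds
        labels = fcluster(Z, t=t, criterion='distance')
        if len(np.unique(labels)) < 2:         # silhouette needs at least 2 clusters
            continue
        # Score against the same distances used for clustering
        score = silhouette_score(squareform(D), labels, metric='precomputed')
        if best is None or score > best[0]:
            best = (score, t)

    print(best)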

Pruning dendrogram in scipy (hierarchical clustering)

牧云@^-^@ submitted on 2019-12-21 03:52:54
Question: I have a distance matrix with about 5000 entries and use scipy's hierarchical clustering methods to cluster the matrix. The code I use for this is the following snippet: Y = fastcluster.linkage(D, method='centroid') # D is the distance matrix Z1 = sch.dendrogram(Y, truncate_mode='level', p=7, show_contracted=True) Since the dendrogram would become rather dense with all this data, I use truncate_mode to prune it a bit. All of this works, but I wonder how I can find out which of the original 5000 …
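One way to recover which original observations sit under each collapsed branch is to walk the linkage matrix and record the membership of every merge node; in a truncated dendrogram, scipy reports a leaf with an index >= n when it stands for a non-singleton cluster. A rough sketch with assumed stand-in data (scipy's linkage is used here, but fastcluster returns the same matrix format):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    X = np.random.rand(300, 4)                 # assumed stand-in for the real data
    Z = linkage(X, method='centroid')
    n = X.shape[0]

    # members[i] lists the original observations under node i
    members = {i: [i] for i in range(n)}       # singleton leaves
    for i, (a, b, _, _) in enumerate(Z):
        members[n + i] = members[int(a)] + members[int(b)]

    ddata = dendrogram(Z, truncate_mode='level', p=7, no_plot=True)

    # A leaf id < n is an original observation; an id >= n is a collapsed cluster
    for leaf in ddata['leaves']:
        print(leaf, members[leaf][:10])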

Using ELKI on custom objects and making sense of results

你离开我真会死。 submitted on 2019-12-20 05:00:24
Question: I am trying to use ELKI's SLINK implementation of hierarchical clustering in my program. I have a set of objects (of my own type) that need to be clustered. For that, I convert them to feature vectors before clustering. This is how I currently got it to run and produce some result (the code is in Scala): val clusterer = new SLINK(CosineDistanceFunction.STATIC, 3) val connection = new ArrayAdapterDatabaseConnection(featureVectors) val database = new StaticArrayDatabase(connection, null) database …