cluster-analysis

Subsets of a dataset as separate dendrograms, but in the same plot

有些话、适合烂在心里 submitted on 2019-12-23 03:37:16

Question: I know I can plot a dendrogram as follows:

    library(cluster)
    d <- mtcars
    d[,8:11] <- lapply(d[,8:11], as.factor)
    gdist <- daisy(d, metric = c("gower"), stand = FALSE)
    dendro <- hclust(gdist, method = "average")
    plot(as.dendrogram(dendro))

However, I have some groups identified (e.g. by an iterative classification method), given as the last column in d:

    G <- c(1,2,3,3,4,4,5,5,5,5,1,2,1,1,2,4,1,3,4,5,1,7,4,3,3,2,1,1,1,3,5,6)
    d$Group <- G
    head(d)
    mpg cyl disp hp drat wt qsec vs am gear carb Group
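A minimal sketch of the same idea in Python with SciPy (not the asker's R/Gower workflow): it assumes a numeric feature matrix and a pre-computed group label per row, uses Euclidean distance instead of the Gower metric, and simply draws one average-linkage dendrogram per group side by side.

    # Sketch with placeholder data: one dendrogram per pre-identified group.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 4))               # stand-in for the numeric features
    groups = rng.integers(1, 4, size=len(X))   # stand-in for the Group column

    fig, axes = plt.subplots(1, len(np.unique(groups)), figsize=(12, 4))
    for ax, g in zip(np.atleast_1d(axes), np.unique(groups)):
        Z = linkage(X[groups == g], method="average")
        dendrogram(Z, ax=ax)
        ax.set_title(f"Group {g}")
    plt.tight_layout()
    plt.show()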

K-means cluster plot [closed]

不打扰是莪最后的温柔 submitted on 2019-12-23 02:04:39

Question: I have a data matrix of 510x6 and want to perform K-means cluster analysis on it. I am having a problem plotting all the different clusters in 2 dimensions. Is it not possible to plot 6 different clusters in 2 dimensions?

Answer 1: Let's start by looking at some data which is 150x4 and try and split
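One common way to show clusters of higher-dimensional data in 2D is to project the points first, for example with PCA. The sketch below uses placeholder data of the same 510x6 shape and colours the PCA projection by K-means label.

    # Sketch with placeholder data: reduce 6-dimensional points to 2D with PCA,
    # then colour them by their K-means cluster label.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    X = np.random.default_rng(0).normal(size=(510, 6))   # stand-in for the 510x6 matrix
    labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

    X2 = PCA(n_components=2).fit_transform(X)            # 2D coordinates for plotting only
    plt.scatter(X2[:, 0], X2[:, 1], c=labels, cmap="tab10", s=10)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()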

Use sklearn DBSCAN model to classify new entries

て烟熏妆下的殇ゞ submitted on 2019-12-23 01:38:12

Question: I have a huge "dynamic" dataset and I'm trying to find interesting clusters in it. After running a lot of different unsupervised clustering algorithms, I have found a configuration of DBSCAN which gives coherent results. I would like to take the model that DBSCAN builds on my test data and apply it to other datasets without re-running the algorithm. I cannot run the algorithm over the whole dataset because it would run out of memory, and the model might not make sense to me
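scikit-learn's DBSCAN has no predict method; a common workaround, sketched below with placeholder data and parameters, is to label each new point like its nearest core sample when that sample lies within eps, and as noise otherwise.

    # Sketch: assign new points to the cluster of their nearest core sample
    # if it is within eps, otherwise mark them as noise (-1).
    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.neighbors import NearestNeighbors

    X_train = np.random.default_rng(0).normal(size=(1000, 5))   # placeholder training data
    eps = 1.0
    db = DBSCAN(eps=eps, min_samples=10).fit(X_train)

    core_points = db.components_                        # coordinates of the core samples
    core_labels = db.labels_[db.core_sample_indices_]   # their cluster labels
    nn = NearestNeighbors(n_neighbors=1).fit(core_points)

    def dbscan_predict(X_new):
        dist, idx = nn.kneighbors(X_new)
        labels = core_labels[idx.ravel()]
        labels[dist.ravel() > eps] = -1                 # too far from any core sample
        return labels

    print(dbscan_predict(np.random.default_rng(1).normal(size=(5, 5))))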

Find connected components in a graph in MATLAB

孤街醉人 submitted on 2019-12-22 13:47:46

Question: I have many 3D data points, and I wish to find 'connected components' in this graph. This is where clusters are formed that exhibit the following properties: each cluster contains points all of which are at most a given distance from another point in the cluster, and all points in two distinct clusters are at least that distance from each other. This problem is described in the question and answer here. Is there a MATLAB implementation of such an algorithm built-in or available on the FEX? Simple searches
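The clusters described are the connected components of the graph that joins any two points closer than the threshold. A sketch of that computation in Python (rather than MATLAB) with SciPy, using placeholder points and a placeholder threshold:

    # Sketch: build the "closer than threshold" graph with a KD-tree radius query,
    # then take its connected components as the clusters.
    import numpy as np
    from scipy.sparse.csgraph import connected_components
    from scipy.spatial import cKDTree

    points = np.random.default_rng(0).uniform(size=(200, 3))   # placeholder 3D points
    threshold = 0.1                                            # placeholder distance bound

    tree = cKDTree(points)
    graph = tree.sparse_distance_matrix(tree, max_distance=threshold,
                                        output_type="coo_matrix")
    n_clusters, labels = connected_components(graph, directed=False)
    print(n_clusters, labels[:10])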

Python K means clustering

☆樱花仙子☆ submitted on 2019-12-22 11:23:06

Question: I am trying to implement the code on this website to estimate what value of K I should use for my K-means clustering: https://datasciencelab.wordpress.com/2014/01/21/selection-of-k-in-k-means-clustering-reloaded/ However, I am not having any success; in particular, I am trying to get the f(k) vs. number of clusters k graph, which I can use to pick the ideal value of k. My data format is as follows: each of the coordinates has 5 dimensions/variables, i.e. they are data points that
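The f(K) curve in that post is the measure from Pham, Dimov & Nguyen (2005). A minimal sketch with scikit-learn and placeholder 5-dimensional data, where S_k is the total within-cluster sum of squares (KMeans .inertia_) and low f(k) values indicate good candidates for k:

    # Sketch of the f(K) measure: f(1)=1; a_2 = 1 - 3/(4*d); a_k = a_{k-1} + (1-a_{k-1})/6;
    # f(k) = S_k / (a_k * S_{k-1}), with S_k the K-means inertia for k clusters.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    X = np.random.default_rng(0).normal(size=(300, 5))   # placeholder 5-dimensional data
    d = X.shape[1]

    def f_of_k(X, k_max=10):
        fs, S_prev, a_prev = [], None, None
        for k in range(1, k_max + 1):
            S_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            if k == 1:
                f, a_k = 1.0, None
            else:
                a_k = 1 - 3.0 / (4 * d) if k == 2 else a_prev + (1 - a_prev) / 6.0
                f = S_k / (a_k * S_prev) if S_prev > 0 else 1.0
            fs.append(f)
            S_prev, a_prev = S_k, a_k
        return fs

    fs = f_of_k(X)
    plt.plot(range(1, len(fs) + 1), fs, marker="o")
    plt.xlabel("k")
    plt.ylabel("f(k)")
    plt.show()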

Calculating a Voronoi diagram for planes in 3D

给你一囗甜甜゛ submitted on 2019-12-22 10:53:52

Question: Is there a code/library that can calculate a Voronoi diagram for planes (parallelograms) in 3D? I checked Qhull and it seems it can only work with points in its examples; Voro++ works with different sizes of spheres, but I couldn't find anything for polygons. In this image (sample planes in 3D) the parallelograms are 3D since they have a thickness, but in this case the thickness will be zero.

Answer 1: Voronoi cells are not parallelograms. You are confused here by the image you posted. Voronoi cell
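A frequently used approximation (not a true generalised Voronoi computation) is to sample each planar site densely and assign space to the site whose samples are nearest. A rough sketch with placeholder parallelograms:

    # Sketch: approximate "nearest plane" queries by sampling each parallelogram
    # on a grid and doing a nearest-neighbour lookup over all samples.
    import numpy as np
    from scipy.spatial import cKDTree

    def sample_parallelogram(origin, u, v, n=20):
        s, t = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
        return origin + s.reshape(-1, 1) * u + t.reshape(-1, 1) * v

    planes = [  # placeholder sites: (origin, edge vector u, edge vector v)
        (np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])),
        (np.array([0.0, 0.0, 2.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])),
    ]
    samples = np.vstack([sample_parallelogram(*p) for p in planes])
    site_id = np.repeat(np.arange(len(planes)), 20 * 20)

    tree = cKDTree(samples)
    _, idx = tree.query(np.array([[0.5, 0.5, 0.4], [0.5, 0.1, 1.8]]))
    print(site_id[idx])   # index of the (approximately) nearest plane per query point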

Cluster unseen points using Spectral Clustering

…衆ロ難τιáo~ submitted on 2019-12-22 09:55:29

Question: I am using the Spectral Clustering method to cluster my data. The implementation seems to work properly. However, I have one problem: I have a set of unseen points (not present in the training set) and would like to cluster these based on the centroids derived by k-means (Step 5 in the paper). However, the k-means is computed on the k eigenvectors, and therefore the centroids are low-dimensional. Does anyone know a method that can be used to map an unseen point to a low dimension and compute
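One standard answer is a Nystrom-style out-of-sample extension. The sketch below simplifies heavily (RBF affinity, eigenvectors of the affinity matrix itself rather than a normalised Laplacian, placeholder data): a new point's affinities to the training points are projected onto the training eigenvectors and divided by the eigenvalues, after which it can be assigned to the nearest k-means centroid.

    # Sketch of a simplified Nystrom out-of-sample extension for spectral clustering.
    import numpy as np
    from scipy.linalg import eigh
    from sklearn.cluster import KMeans
    from sklearn.metrics.pairwise import rbf_kernel

    X_train = np.random.default_rng(0).normal(size=(200, 6))   # placeholder data
    gamma, k = 0.5, 3

    A = rbf_kernel(X_train, gamma=gamma)               # n x n affinity matrix
    eigvals, eigvecs = eigh(A)                          # ascending eigenvalues
    eigvals, eigvecs = eigvals[-k:], eigvecs[:, -k:]    # keep the top-k eigenpairs
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(eigvecs)

    def embed_new(X_new):
        # project affinities to the training set onto the eigenvectors
        a = rbf_kernel(X_new, X_train, gamma=gamma)
        return (a @ eigvecs) / eigvals

    X_new = np.random.default_rng(1).normal(size=(4, 6))
    print(km.predict(embed_new(X_new)))                 # cluster labels for unseen points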

drawing heatmap with dendrogram along with sample labels

牧云@^-^@ submitted on 2019-12-22 08:16:58

Question: Using the heatplot function of made4, I made this heatmap dendrogram from the example file:

    data(khan)
    heatplot(khan$train[1:30,], lowcol="blue", highcol="red")

How can I add a panel of labels for the samples on the edges of the heatmap, like in this figure? The labels in this case are the squares adjacent to the heatmap's first column and top row, used to denote a label for each sample so that one can see whether the labels correspond with the clustering shown by the heatmap/dendrogram. In
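For comparison, the same kind of label strip can be drawn in Python with seaborn's clustermap via its row_colors / col_colors arguments; the data and class labels below are placeholders.

    # Sketch: a clustered heatmap with a colour strip marking each sample's class.
    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    data = pd.DataFrame(rng.normal(size=(30, 20)))          # placeholder expression matrix
    classes = rng.choice(["A", "B", "C", "D"], size=20)     # placeholder sample labels

    palette = dict(zip(np.unique(classes), sns.color_palette("Set2", 4)))
    col_colors = pd.Series(classes, index=data.columns).map(palette)

    sns.clustermap(data, cmap="coolwarm", col_colors=col_colors)
    plt.show()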

Clustering problem

蹲街弑〆低调 submitted on 2019-12-22 08:13:32

Question: I've been tasked with finding the N clusters containing the most points for a certain data set, given that the clusters are bounded by a certain size. Currently, I am attempting to do this by loading my data into a kd-tree, iterating over the data and finding each point's nearest neighbor, and then merging the points if the cluster they make does not exceed a limit. I'm not sure this approach will give me a global solution, so I'm looking for ways to tweak it. If you can tell me what type of problem this
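A rough sketch of the greedy scheme described, under the assumption that "size" means a bound on a cluster's spatial diameter (placeholder data and bound); being greedy, it illustrates the approach rather than a globally optimal solution.

    # Sketch: grow a cluster around each unassigned point from its nearest
    # neighbours, merging only while the cluster's diameter stays within the bound,
    # then report the most populous clusters.
    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.spatial.distance import pdist

    points = np.random.default_rng(0).uniform(size=(100, 3))   # placeholder data
    max_diameter = 0.2                                          # placeholder size bound

    tree = cKDTree(points)
    cluster_of = np.full(len(points), -1)
    clusters = []

    for i in range(len(points)):
        if cluster_of[i] != -1:
            continue
        members = [i]
        cand = [j for j in tree.query_ball_point(points[i], r=max_diameter)
                if j != i and cluster_of[j] == -1]
        for j in sorted(cand, key=lambda j: np.linalg.norm(points[j] - points[i])):
            trial = members + [j]
            if pdist(points[trial]).max() <= max_diameter:   # merged cluster still fits
                members = trial
        cluster_of[members] = len(clusters)
        clusters.append(members)

    largest = sorted(clusters, key=len, reverse=True)[:5]   # the N most populous clusters
    print([len(c) for c in largest])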

How to pick the T1 and T2 threshold values for Canopy Clustering?

谁都会走 submitted on 2019-12-22 04:27:12

Question: I am trying to implement the Canopy clustering algorithm along with K-means. I've done some searching online that says to use Canopy clustering to get the initial starting points to feed into K-means. The problem is that in Canopy clustering you need to specify two threshold values for the canopy, T1 and T2, where points within the inner threshold are strongly tied to that canopy and points within the wider threshold are less tied to it. How are these thresholds, or distances from the canopy
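A minimal sketch of canopy clustering under the usual convention that T1 (loose) is larger than T2 (tight), with placeholder thresholds and data; the canopy centres could then seed K-means.

    # Sketch: points within T1 of a randomly picked centre join its canopy;
    # points within T2 are removed from further consideration.
    import numpy as np

    def canopy(X, t1, t2, seed=0):
        assert t1 > t2, "T1 (loose) must be larger than T2 (tight)"
        rng = np.random.default_rng(seed)
        remaining = list(range(len(X)))
        canopies = []
        while remaining:
            c = remaining[rng.integers(len(remaining))]   # random remaining point as centre
            dist = np.linalg.norm(X[remaining] - X[c], axis=1)
            canopies.append((c, [p for p, d in zip(remaining, dist) if d < t1]))
            remaining = [p for p, d in zip(remaining, dist) if d >= t2]
        return canopies

    X = np.random.default_rng(1).normal(size=(200, 2))    # placeholder data
    result = canopy(X, t1=1.5, t2=0.7)
    print(len(result), "canopies found; their centres could initialise KMeans")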