cluster-analysis

How to identify group membership in (hierarchical) cluster analysis?

删除回忆录丶 submitted on 2019-12-03 20:37:14
I have a problem with groups in (hierarchical) cluster analysis. As an example, this is the dendrogram of complete linkage on the Iris data set. After that I ran:

```r
> table(cutree(hc, 3), iris$Species)

    setosa versicolor virginica
  1     50          0         0
  2      0         23        49
  3      0         27         1
```

I have read on one statistics website that object 1 in the data always belongs to group/cluster 1. From the output above, we know that setosa is in group 1. Then, how am I going to know about the other two species? How do they fall into either group 2 or 3? How did it happen? Perhaps there is a calculation I need to know? I'm …
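For readers working in Python, here is a minimal analogue of the cutree() call using scipy (complete linkage on the iris measurements). The cluster IDs, like cutree's, are arbitrary integers, so it is the cross-tabulation against the known species that reveals which ID holds which group:

```python
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage, fcluster
import pandas as pd

iris = load_iris()
Z = linkage(iris.data, method="complete")        # complete-linkage tree, like hclust
groups = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters, like cutree(hc, 3)

# Cross-tabulate cluster ID against the true species, like table(cutree(hc, 3), iris$Species)
species = pd.Categorical.from_codes(iris.target, iris.target_names)
print(pd.crosstab(groups, species))
```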

K-means Plotting for 3-Dimensional Data

一个人想着一个人 submitted on 2019-12-03 20:33:42
I'm working with k-means in MATLAB. I am trying to create the plot/graph, but my data is a three-dimensional array. Here is my k-means code:

```matlab
clc
clear all
close all
load cobat.txt;                    % read the file
k = input('Enter a number: ');     % determine the number of clusters
isRand = 0;                        % 0 -> sequential initialization
                                   % 1 -> random initialization
[maxRow, maxCol] = size(cobat);
if maxRow <= k,
    y = [m, 1:maxRow];             % note: `m` is undefined at this point
elseif k > 7
    h = msgbox('cant more than 7');
else
    % initial value of centroid
    if isRand,
        p = randperm(size(cobat,1));  % random initialization
        for i = 1:k
            c(i,:) = cobat(p(i),:);
        end
    else
        for i = 1:k
            c(i,:) = cobat(i,:);
            % …
```
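Not MATLAB, but the plotting idea itself is easy to sketch in Python: run k-means on three-column data and colour a 3-D scatter by cluster label (the toy data below is hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy 3-D data: 300 points pulled toward 4 offsets along the diagonal
X = rng.normal(size=(300, 3)) + 3 * rng.integers(0, 4, size=(300, 1))

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=km.labels_, s=15)       # points by cluster
ax.scatter(*km.cluster_centers_.T, c="red", marker="x", s=100)  # centroids
plt.show()
```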

Is my python implementation of the Davies-Bouldin Index correct?

☆樱花仙子☆ submitted on 2019-12-03 17:03:43
I'm trying to calculate the Davies-Bouldin Index in Python. Here are the steps the code below tries to reproduce, in five steps:

1. For each cluster, compute the Euclidean distances between each point and the centroid.
2. For each cluster, compute the average of these distances.
3. For each pair of clusters, compute the Euclidean distance between their centroids.
4. Then, for each pair of clusters, sum the average distances to their respective centroids (computed at step 2) and divide it by the distance separating them (computed at step 3).
5. Finally, compute the mean of all these divisions (= all indexes) to …
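A minimal NumPy sketch of those five steps (assuming `X` is an `(n, d)` data array and `labels` holds the cluster assignments; note that the textbook Davies-Bouldin index takes, for each cluster, the worst ratio against any other cluster rather than a plain mean over all pairs):

```python
import numpy as np

def davies_bouldin(X, labels):
    ids = np.unique(labels)
    centroids = np.array([X[labels == i].mean(axis=0) for i in ids])
    # Steps 1-2: average distance of each cluster's points to its centroid
    scatter = np.array([
        np.linalg.norm(X[labels == i] - c, axis=1).mean()
        for i, c in zip(ids, centroids)
    ])
    k = len(ids)
    ratios = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                # Steps 3-4: (S_i + S_j) / distance between centroids i and j
                d = np.linalg.norm(centroids[i] - centroids[j])
                ratios[i, j] = (scatter[i] + scatter[j]) / d
    # Step 5: the standard index takes the maximum ratio per cluster,
    # then averages those maxima over all clusters.
    return ratios.max(axis=1).mean()
```

For checking a hand-rolled version, scikit-learn also ships `sklearn.metrics.davies_bouldin_score`.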

Computing F-measure for clustering

点点圈 submitted on 2019-12-03 16:28:29
Can anyone help me calculate the F-measure collectively? I know how to calculate recall and precision, but I don't know how, for a given algorithm, to calculate one F-measure value. As an example, suppose my algorithm creates m clusters, but I know there are n clusters for the same data (as created by another benchmark algorithm). I found one PDF, but it is not useful, since the collective value I got is greater than 1. The reference of the PDF is "F Measure explained". Specifically, I have read some research papers in which the authors compare two algorithms on the basis of F-measure; they got collectively …
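One standard way to get a single, bounded value is the pairwise F-measure: count every pair of points as a positive if both partitions place the two points in the same group. A sketch (this is one common convention, not necessarily the one in that PDF):

```python
from itertools import combinations

def pairwise_f_measure(pred, truth):
    """Pairwise F-measure between a predicted and a reference partition."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        tp += same_pred and same_true        # together in both partitions
        fp += same_pred and not same_true    # together only in the prediction
        fn += same_true and not same_pred    # together only in the reference
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Example: m = 2 predicted clusters vs. n = 2 reference clusters
print(pairwise_f_measure([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.4
```

Because precision and recall are both in [0, 1], the resulting F-measure cannot exceed 1, which makes it a useful sanity check against the situation described above.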

Python Clustering Algorithms

Deadly submitted on 2019-12-03 16:26:53
Question: I've been looking around scipy and sklearn for clustering algorithms for a particular problem I have. I need some way of characterizing a population of N particles into k groups, where k is not necessarily known; in addition to this, no a priori linking lengths are known (similar to this question). I've tried kmeans, which works well if you know how many clusters you want. I've tried dbscan, which does poorly unless you tell it a characteristic length scale on which to stop looking (or …
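When k is unknown, one common workaround is to scan a range of candidate k values and keep the one with the best internal score, for example the silhouette. A sketch (function name and range are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k(X, k_range=range(2, 11)):
    """Return the k whose KMeans labels score best on the silhouette."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better
    return max(scores, key=scores.get), scores

# Toy data: three shifted blobs, so the scan should favour k = 3
X = np.vstack([np.random.default_rng(i).normal(loc=3 * i, size=(50, 2))
               for i in range(3)])
best_k, scores = pick_k(X)
print(best_k, scores)
```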

Cluster quality measures

不想你离开。 submitted on 2019-12-03 14:36:28
Does Matlab provide any facility for evaluating clustering methods (cluster compactness, cluster separation, ...)? Or is there any toolbox for it?

Not in Matlab, but ELKI (Java) provides a dozen or so cluster quality measures for evaluation.

Matlab provides the Silhouette index, and there is a toolbox, CVAP: Cluster Validity Analysis Platform for Matlab, which includes the following validity indexes:

- Davies-Bouldin
- Calinski-Harabasz
- Dunn index
- R-squared index
- Hubert-Levin (C-index)
- Krzanowski-Lai index
- Hartigan index
- Root-mean-square standard deviation (RMSSTD) index
- Semi-partial R-squared (SPR) …
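For completeness, a few of these internal indexes are also available out of the box in Python's scikit-learn (a sketch, not a MATLAB answer):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))         # compactness vs. separation, in [-1, 1]
print(calinski_harabasz_score(X, labels))  # Calinski-Harabasz, higher is better
print(davies_bouldin_score(X, labels))     # Davies-Bouldin, lower is better
```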

How to change dendrogram labels in R

为君一笑 submitted on 2019-12-03 14:29:53
I have a dendrogram in R, based on hierarchical clustering using hclust. I am colouring labels that are different in different colours, but when I try changing the labels of my dendrogram (to the rows of the dataframe the cluster is based on) using

```r
dendrogram = dendrogram %>% set("labels", dataframe$column)
```

the labels are replaced, but in the wrong positions. As an example, my dendrogram looks like this:

```
 ___|___
 |     _|_
 |    |   |
 1    0   2
```

When I now try changing the labels as specified above, the labels are changed, but they are applied from left to right in their order in the dataframe. If we …
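The underlying problem is that `set("labels", ...)` assigns labels in left-to-right leaf order, not in data-row order, so the label vector should first be permuted, along the lines of `set("labels", dataframe$column[order.dendrogram(dendrogram)])`. scipy's API illustrates the mapping that is needed; its `dendrogram(..., labels=...)` takes labels in the original row order and applies the leaf permutation itself:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(2).normal(size=(5, 3))
row_labels = ["a", "b", "c", "d", "e"]   # one label per data row, in row order

Z = linkage(X, method="complete")
d = dendrogram(Z, labels=row_labels)     # scipy reorders the labels internally
print(d["ivl"])                          # labels in leaf (left-to-right) order
plt.show()
```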

Clustering image segments in OpenCV

心已入冬 submitted on 2019-12-03 13:56:41
Question: I am working on motion detection with a non-static camera using OpenCV. I am using a pretty basic background-subtraction-and-thresholding approach to get a broad sense of all that's moving in a sample video. After thresholding, I enlist all separable "patches" of white pixels, store them as independent components, and colour them randomly with red, green, or blue. The image below shows this for a football video, where all such components are visible. I create rectangles over these detected …
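The patch-extraction step described here maps naturally onto OpenCV's connected-components API. A sketch (the file names are hypothetical, and `mask` is assumed to be the thresholded foreground image):

```python
import cv2
import numpy as np

# `motion_mask.png` is a hypothetical file holding the thresholded foreground
mask = cv2.imread("motion_mask.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

# Each separable white "patch" becomes one labelled component
n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)

out = np.zeros((*mask.shape, 3), np.uint8)
rng = np.random.default_rng(0)
for i in range(1, n):                    # label 0 is the background
    out[labels == i] = rng.integers(0, 256, 3, dtype=np.uint8)  # random colour
    x, y, w, h, area = stats[i]          # bounding box of the patch
    cv2.rectangle(out, (int(x), int(y)), (int(x + w), int(y + h)),
                  (255, 255, 255), 1)

cv2.imwrite("components.png", out)
```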

How to add k-means predicted clusters as a column to a dataframe in Python

旧街凉风 submitted on 2019-12-03 13:19:31
Question: I have a question about k-means clustering in Python. I did the analysis this way:

```python
from sklearn.cluster import KMeans

km = KMeans(n_clusters=12, random_state=1)
new = data._get_numeric_data().dropna(axis=1)
km.fit(new)   # the original called kmeans.fit(new), but the object is named km
predict = km.predict(new)
```

How can I add the column with the cluster results to my first dataframe "data" as an additional column? Thanks!

Answer 1: Assuming the column length is the same as each column in your dataframe df, all you need to do is this: df['NEW_COLUMN'] = Series …
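A sketch completing that answer with a toy frame (the column names are stand-ins): `fit_predict` returns one label per row of `new`, and assigning through a Series built on `new`'s index keeps rows aligned with `data` even when rows or columns were dropped. `select_dtypes("number")` is used here in place of the private `_get_numeric_data()`:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for the asker's `data` frame
data = pd.DataFrame({"a": np.random.rand(100),
                     "b": np.random.rand(100),
                     "name": ["x"] * 100})

new = data.select_dtypes("number").dropna(axis=1)  # numeric columns only
km = KMeans(n_clusters=12, random_state=1)
data["cluster"] = pd.Series(km.fit_predict(new), index=new.index)
print(data.head())
```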

PCA multiplot in R

﹥>﹥吖頭↗ submitted on 2019-12-03 12:34:17
Question: I have a dataset that looks like this:

```
India        China        Brasil       Russia       SAfrica      Kenya        States       Indonesia    States       Argentina    Chile        Netherlands  HongKong
0.0854026763 0.1389383234 0.1244184371 0.0525460881 0.2945586244 0.0404562539 0.0491597968 0            0            0.0618342901 0.0174891774 0.0634064181 0
0.0519483159 0.0573851759 0.0756806292 0.0207164181 0.0409872092 0.0706355932 0.0664503936 0.0775285039 0.008545575  0.0365674701 0.026595575  0.064280902  0.0338135148
0            0            0            0            0            0            0            0            0            0            0            0            0
0.0943708876 0            0            0.0967733329 …
```
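Not R, but the basic PCA-and-plot idea sketched in Python for reference (a toy stand-in for the table above; the column subset and values are illustrative):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Toy stand-in for the country table above
countries = ["India", "China", "Brasil", "Russia", "SAfrica"]
df = pd.DataFrame(np.random.default_rng(3).random((40, 5)), columns=countries)

scores = PCA(n_components=2).fit_transform(df)  # PCA centres the data itself
plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```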