cluster-analysis | 易学教程

Hierarchical clusterization heuristics

Question: I want to explore relations between data items in a large array. Every data item is represented by a multidimensional vector. First of all, I've decided to use clustering. I'm interested in finding hierarchical relations between clusters (groups of data vectors). I'm able to calculate the distance between my vectors, so as a first step I find the minimum spanning tree. After that I need to group the data vectors according to the links in my spanning tree. But at this step I'm stuck: how to combine
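One common way to turn a minimum spanning tree into groups, offered here only as a sketch, is to build the tree with Kruskal's algorithm and stop before the k-1 longest tree edges are added; the remaining connected components are the clusters, and the order in which components merge gives a single-linkage hierarchy. The function name `mst_clusters`, the `euclidean` helper, and the sample points are hypothetical, and the all-pairs edge list is only practical for small inputs.

```python
def mst_clusters(points, dist, k):
    """Return a cluster label per point by running Kruskal's MST
    construction and stopping once exactly k components remain."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (O(n^2) memory: sketch only).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

    components = n
    for _, i, j in edges:
        if components == k:
            break  # the k-1 longest MST edges are never added
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge two components along an MST edge
            components -= 1

    # Relabel the component roots as 0, 1, 2, ... in order of appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical sample data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
labels = mst_clusters(points, euclidean, k=3)
print(labels)
```

Recording each merge (the two roots and the edge length) instead of only counting components would yield the full hierarchy rather than a single flat cut.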

Interpreting output from mahout clusterdumper

Question: I ran a clustering test on crawled pages (more than 25K docs; a personal data set) and then ran clusterdump:

$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-1/ --output clusteranalyze.txt

The cluster dumper output contains 25 elements of the form "VL-xxxxx {}":

VL-24130{n=1312 c=[0:0.017, 10:0.007, 11:0.005, 14:0.017, 31:0.016, 35:0.006, 41:0.010, 43:0.008, 52:0.005, 59:0.010, 68:0.037, 72:0.056, 87:0.028, ... ] r=[0:0.442, 10:0.271, 11:0.198, 14:0.369, 31:0.421, ... ]} ..

How to do clustering using the matrix of correlation coefficients?

Question: I have a correlation coefficient matrix (n*n). How do I do clustering using the correlation coefficient matrix? Can I use the linkage and fcluster functions in SciPy? The linkage function needs an n*m matrix (according to the tutorial), but I want to use an n*n matrix. My code is:

corre = mp_N.corr()  # mp_N is raw data (m*n matrix)
Z = linkage(corre, method='average')  # 'corre' is correlation coefficient matrix
fcluster(Z,2,'distance')

Is this code right? If this code is wrong, how can I do clustering with
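A sketch of one way to approach this, under stated assumptions: SciPy's `linkage` treats a 2-D array as a table of observations, so a square correlation matrix passed directly is not read as pairwise distances. A common workaround is to convert correlation to a dissimilarity and pass it in condensed form via `squareform`. The choice of `1 - corr` as the dissimilarity, the synthetic data, and the 0.5 threshold are all assumptions made for illustration; `1 - abs(corr)` is another option when strongly negative correlation should count as similar.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical data: 6 variables, the first three driven by one signal
# and the last three by another, plus a little noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
noise = 0.1 * rng.normal(size=(200, 6))
data = np.column_stack([base[:, 0]] * 3 + [base[:, 1]] * 3) + noise

corr = np.corrcoef(data, rowvar=False)  # 6 x 6 correlation matrix

# Assumed dissimilarity: 1 - correlation (0 = identical, 2 = opposite).
dist = 1.0 - corr
dist = (dist + dist.T) / 2.0   # remove tiny floating-point asymmetry
np.fill_diagonal(dist, 0.0)    # self-distance must be exactly zero

# linkage expects a condensed (upper-triangle) distance vector here.
condensed = squareform(dist, checks=False)
Z = linkage(condensed, method="average")

# Cut the tree where the average dissimilarity exceeds 0.5 (assumed).
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)
```

With a pandas DataFrame, `corr.to_numpy()` could stand in for the NumPy matrix used here.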

I have 2,000,000 points in 100 dimensionality space. How can I cluster them to K (e.g., 1000) clusters?

The problem is as follows. I have M images and extract N features from each image, and each feature has dimensionality L. Thus, I have M*N features (2,000,000 in my case), each of dimensionality L (100 in my case). I need to cluster these M*N features into K clusters. How can I do it? Thanks.

Do you want 1000 clusters of images, or of features, or of (image, feature) pairs? In any case, it sounds as though you'll have to reduce the data and use simpler methods. One possibility is two-pass K-cluster: a) split the 2 million data points into 32 clusters, b) split each of

Algorithm for clustering people with similar interests

I want to cluster people into groups based on their interests. For example, people who like machine learning and graphs may be placed in one group, and people who are interested in mathematics and economics may be placed in a different group. The algorithm should decide which people have the most closely matching interests and create clusters accordingly. It should also be able to report the other people in the group in which a particular person is placed.

This does not sound like a particularly difficult clustering problem, and any of the off-the-shelf clustering

Is there any kind of subspace clustering package available in scikit-learn

Are there any subspace clustering packages available in scikit-learn? Source: https://stackoverflow.com/questions/33483160/is-there-any-kind-of-subspace-clustering-package-available-in-scikit-learn

Kmeans matlab “Empty cluster created at iteration 1” error

Question: I'm using this script to cluster a set of 3D points with the MATLAB kmeans function, but I always get the error "Empty cluster created at iteration 1". The script I'm using:

[G,C] = kmeans(XX, K, 'distance','sqEuclidean', 'start','sample');

XX can be found at this link (XX value), and K is set to 3. Could anyone advise me why this is happening?

Answer 1: It is simply telling you that during the assign-recompute iterations, a cluster became empty (lost all assigned points). This is
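To make the assign-recompute loop concrete, here is a small Python sketch (not the MATLAB implementation) in which a cluster goes empty on the first iteration and is re-seeded instead of raising an error. The re-seeding rule, which moves the empty centroid onto the point farthest from its current centroid, is modeled on the 'singleton' behavior that MATLAB documents for the 'EmptyAction' option of kmeans; the function name `kmeans_reseed`, the data, and the starting centroids are hypothetical.

```python
import numpy as np


def kmeans_reseed(X, centroids, n_iter=20):
    """Lloyd's iterations that re-seed any empty cluster with the point
    currently farthest from its assigned centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        nearest = d.min(axis=1)

        # Update step: recompute each centroid from its members.
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
            else:
                # Empty cluster: move it onto the worst-fitted point.
                far = nearest.argmax()
                new_centroids[j] = X[far]
                nearest[far] = 0.0  # do not reuse the same point

        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: the third starting centroid is far from every
# point, so its cluster is empty after the first assignment step.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
start = [[0, 0.5], [10, 0.5], [100, 100]]
labels, centroids = kmeans_reseed(X, start)
print(labels)
```

An empty cluster usually means K is large for the data or the starting centroids are poorly placed, so trying several starts is another reasonable response.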

Spectral Clustering a graph in python

Question: I'd like to cluster a graph in Python using spectral clustering. Spectral clustering is a more general technique which can be applied not only to graphs but also to images or any sort of data; however, it's considered an exceptional graph clustering technique. Sadly, I can't find examples of spectral clustering of graphs in Python online. Scikit Learn has two spectral clustering methods documented: SpectralClustering and spectral_clustering, which seem like they're not aliases. Both of those
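As a minimal illustration of the idea on a graph, and not of scikit-learn's internals, the sketch below performs spectral bisection with NumPy alone: it builds the unnormalized graph Laplacian from an adjacency matrix and splits the nodes by the sign of the eigenvector belonging to the second-smallest eigenvalue (the Fiedler vector). The edge list is a hypothetical example of two triangles joined by a single edge.

```python
import numpy as np

# Hypothetical graph: triangle {0,1,2}, triangle {3,4,5}, bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6

# Symmetric adjacency matrix of the undirected graph.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Unnormalized graph Laplacian: degree matrix minus adjacency matrix.
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in ascending order for symmetric matrices;
# column 1 is the eigenvector of the second-smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]

# Two-way split by sign (which side gets 0 or 1 is arbitrary).
labels = (fiedler > 0).astype(int)
print(labels)
```

For more than two clusters, the usual extension is to run k-means on the first k eigenvectors. With scikit-learn, passing the adjacency matrix to `SpectralClustering` with `affinity='precomputed'` is the corresponding route.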

Node labels on circular phylogenetic tree

I am trying to create a circular phylogenetic tree. I have this part of the code:

fit <- hclust(dist(Data[,-4]), method = "complete", members = NULL)
nclus = 3
color = c('red','blue','green')
color_list = rep(color, nclus/length(color))
clus = cutree(fit, nclus)
plot(as.phylo(fit), type='fan', tip.color=color_list[clus], label.offset=0.2, no.margin=TRUE, cex=0.70, show.node.label = TRUE)

And this is the result: (image not included)

I am also trying to show a label for each node and to color the branches. Any suggestions on how to do that? Thanks!

When you say "color branches" I assume you mean color the edges. This seems to work, but I have to

DBSCAN code in C# or vb.net , for Cluster Analysis

Question: I would appreciate advice on a library or code in VB.NET or C# that applies DBSCAN to build density-based clusters of data. I have GPS data, and I want to find stay points using the DBSCAN algorithm. However, I do not understand much of the technical part of the algorithm.

Answer 1: I'm not sure what you're looking for, because the algorithm is very well explained on Wikipedia. Do you want an explanation of the algorithm or a translation (or good library) of it in C#? You can
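For readers who want to see the mechanics, here is a compact DBSCAN sketch in Python rather than C# or VB.NET, with a brute-force neighbor search that is only suitable for small inputs. The function name `dbscan`, the `euclidean` helper, the parameter values, and the sample points are hypothetical; the points are assumed to be already projected to planar coordinates in metres, whereas raw latitude and longitude would need a great-circle distance instead.

```python
def dbscan(points, eps, min_pts, dist):
    """Label each point with a cluster index (0, 1, ...) or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Brute-force range query; includes the point itself.
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be claimed later as a border point
            continue
        cluster += 1           # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)   # j is also core: keep expanding
    return labels


def euclidean(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


# Hypothetical planar positions in metres: two dense spots and a stray fix.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (50, 50), (51, 50), (50, 51),
          (200, 200)]
labels = dbscan(points, eps=2.0, min_pts=3, dist=euclidean)
print(labels)
```

In a stay-point setting, each resulting cluster would correspond to a place where many position fixes accumulate, and noise points to fixes recorded while moving.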