cluster-analysis

Hierarchical clusterization heuristics

谁都会走 submitted on 2019-12-04 13:17:33
Question: I want to explore relations between data items in a large array. Every data item is represented by a multidimensional vector. First of all, I've decided to use clustering. I'm interested in finding hierarchical relations between clusters (groups of data vectors). I'm able to calculate the distance between my vectors, so as a first step I find the minimal spanning tree. After that I need to group the data vectors according to the links in my spanning tree. But at this step I'm stuck: how to combine
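A known fact that may help here: removing the k-1 longest edges of a minimum spanning tree leaves exactly the single-linkage clusters, and adding MST edges in increasing order of length replays the single-linkage merge hierarchy. The sketch below runs Kruskal's algorithm and stops when `k` components remain. It is pure Python, builds all pairwise edges (so it assumes a small data set), and the function name `single_linkage_via_mst` is hypothetical. With tied distances the exact merge order may differ from other implementations.

```python
import math
from itertools import combinations


def single_linkage_via_mst(points, k):
    """Group `points` into `k` clusters by running Kruskal's MST algorithm and
    stopping while k components are left (= dropping the k-1 longest MST edges).

    Returns (merges, labels): `merges` lists (distance, root_a, root_b) in the
    order components are joined, i.e. the single-linkage merge order (pass k=1
    to record the full hierarchy); `labels` gives a cluster id per point.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pairwise edge, shortest first (O(n^2) memory: fine for a sketch).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    merges = []
    components = n
    for d, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so it belongs to the MST
            parent[ri] = rj
            merges.append((d, ri, rj))
            components -= 1

    # Relabel component roots as 0, 1, ... in order of first appearance.
    ids = {}
    labels = [ids.setdefault(find(i), len(ids)) for i in range(n)]
    return merges, labels
```

For example, the 1-D points 0, 1, 2, 10, 11, 20 with `k=3` split into {0, 1, 2}, {10, 11} and {20}.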

Interpreting output from mahout clusterdumper

最后都变了- submitted on 2019-12-04 12:35:52
Question: I ran a clustering test on crawled pages (more than 25K docs; a personal data set). I ran a clusterdump:
    $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-1/ --output clusteranalyze.txt
The output of the cluster dumper shows 25 elements of the form "VL-xxxxx {}":
    VL-24130{n=1312 c=[0:0.017, 10:0.007, 11:0.005, 14:0.017, 31:0.016, 35:0.006, 41:0.010, 43:0.008, 52:0.005, 59:0.010, 68:0.037, 72:0.056, 87:0.028, ... ] r=[0:0.442, 10:0.271, 11:0.198, 14:0.369, 31:0.421, ... ]} ..
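Reading the sample line, each cluster appears to carry `n` (the number of points, here documents, assigned to it), `c` (the centroid, written as a sparse vector of `index:weight` pairs) and `r` (the radius, a per-dimension spread around the centroid). That is an interpretation of the fields, not something the post confirms, and the indices refer to whatever dictionary was used to vectorize the documents. A minimal, hypothetical parser for one such line, for example to list the heaviest centroid dimensions per cluster, could look like this. The regular expression is inferred from the sample above and assumes the real output has no `...` elisions.

```python
import re

# Matches e.g. "VL-24130{n=1312 c=[0:0.017, 10:0.007] r=[0:0.442]}".
# Layout inferred from the sample output, not from Mahout's documentation.
CLUSTER_RE = re.compile(
    r"(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+)\s+c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]\}"
)


def parse_sparse(text):
    """Turn "0:0.017, 10:0.007" into {0: 0.017, 10: 0.007}."""
    vec = {}
    for item in text.split(","):
        item = item.strip()
        if item:
            idx, val = item.split(":")
            vec[int(idx)] = float(val)
    return vec


def parse_cluster(line):
    """Parse one clusterdump cluster line into a dict (hypothetical helper)."""
    m = CLUSTER_RE.search(line)
    if m is None:
        raise ValueError("not a clusterdump cluster line: " + line[:40])
    return {
        "id": m.group("id"),
        "n": int(m.group("n")),
        "centroid": parse_sparse(m.group("c")),
        "radius": parse_sparse(m.group("r")),
    }


def top_terms(cluster, k=3):
    """Indices of the k largest centroid weights, heaviest first."""
    c = cluster["centroid"]
    return sorted(c, key=c.get, reverse=True)[:k]
```

Looking up the returned indices in the dictionary used for vectorization would then give the dominant terms of each cluster.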

How to do clustering using the matrix of correlation coefficients?

百般思念 submitted on 2019-12-04 12:35:25
Question: I have a correlation coefficient matrix (n*n). How can I do clustering using the correlation coefficient matrix? Can I use the linkage and fcluster functions in SciPy? The linkage function needs an n*m matrix (according to the tutorial), but I want to use an n*n matrix. My code is:
    corre = mp_N.corr()  # mp_N is raw data (m*n matrix)
    Z = linkage(corre, method='average')  # 'corre' is correlation coefficient matrix
    fcluster(Z, 2, 'distance')
Is this code right? If this code is wrong, how can I do clustering with
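SciPy's `linkage` interprets a 2-D array as observation vectors (one per row), not as a distance matrix, so the code above clusters the rows of the correlation matrix as if they were feature vectors. A common alternative is to turn correlations into dissimilarities, for example `1 - r`, and pass the condensed form: `linkage(squareform(1 - corre.values, checks=False), method='average')`. Note also that with `1 - r` every distance lies in [0, 2], so `fcluster(Z, 2, 'distance')` would put everything into one cluster; `fcluster(Z, 2, 'maxclust')` asks for at most two clusters instead. The dependency-free sketch below shows the same average-linkage (UPGMA) idea with a distance cutoff. The function name and the `1 - r` choice are assumptions; `1 - |r|` is another option if strong negative correlation should count as similar.

```python
def average_linkage_clusters(corr, threshold):
    """Average-linkage clustering of variables from a correlation matrix.

    Distance between variables i and j is taken as 1 - corr[i][j]. The two
    closest clusters keep merging while their average pairwise distance is
    <= threshold, roughly what fcluster(Z, threshold, 'distance') cuts at.
    """
    n = len(corr)
    dist = [[1.0 - corr[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg(a, b):
        # UPGMA: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min(
            (avg(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return sorted(sorted(c) for c in clusters)
```

With two strongly correlated pairs of variables and a cutoff of 0.5, the pairs come out as two clusters; raising the cutoff to 1.0 merges everything.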

I have 2,000,000 points in 100 dimensionality space. How can I cluster them to K (e.g., 1000) clusters?

旧巷老猫 submitted on 2019-12-04 12:22:02
The problem is as follows. I have M images and extract N features from each image, and each feature has dimensionality L. Thus I have M*N features (2,000,000 in my case), each of dimensionality L (100 in my case). I need to cluster these M*N features into K clusters. How can I do it? Thanks.
Answer: Do you want 1000 clusters of images, of features, or of (image, feature) pairs? In any case, it sounds as though you'll have to reduce the data and use simpler methods. One possibility is two-pass K-cluster: a) split the 2 million data points into 32 clusters, b) split each of
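The two-pass idea in the answer can be sketched as a coarse split followed by a split inside each coarse cluster: with 32 × 32 you get up to 1024 leaf clusters, close to the K = 1000 asked for. The sketch below assumes plain Lloyd's k-means as the splitting step (the answer only says "K-cluster") and is pure Python for readability; at 2,000,000 × 100 you would want a vectorized or mini-batch implementation, but the control flow stays the same. Function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means returning one label per point (small data only)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to its members' mean (unchanged if empty).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_pass_kmeans(points, k1, k2, seed=0):
    """Coarse k1-way split, then a k2-way split inside each coarse cluster.

    Yields at most k1 * k2 leaf clusters, labelled 0, 1, ... Each second-pass
    run only sees its own coarse cluster, which keeps the cost down; the price
    is that points can never move between coarse clusters.
    """
    coarse = kmeans(points, k1, seed=seed)
    labels = [None] * len(points)
    next_id = 0
    for c in range(k1):
        idx = [i for i, l in enumerate(coarse) if l == c]
        if not idx:
            continue
        fine = kmeans([points[i] for i in idx], k2, seed=seed)
        remap = {}
        for i, f in zip(idx, fine):
            if f not in remap:
                remap[f] = next_id
                next_id += 1
            labels[i] = remap[f]
    return labels
```

For the question's numbers this would be `two_pass_kmeans(features, 32, 32)`, giving at most 1024 clusters.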

Algorithm for clustering people with similar interests

狂风中的少年 submitted on 2019-12-04 12:10:35
I want to cluster people into groups based on their interests. For example, people who like machine learning and graphs may be placed in one group, and people who are interested in mathematics and economics may be placed in a different group. The algorithm should decide which people have the most closely matching interests and create clusters accordingly. It should also be able to list the other people in the group a particular person is placed in.
Answer: This does not sound like a particularly difficult clustering problem, and any of the off-the-shelf clustering
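One simple approach, swapped in here because the answer is cut off: represent each person as a set of interests, score pairs with Jaccard similarity, and put two people in the same group whenever a chain of sufficiently similar pairs connects them (single-link grouping, i.e. connected components). The `min_sim=0.3` default and all names are illustrative assumptions. `groupmates` covers the second requirement, listing the other people in a given person's group.

```python
def jaccard(a, b):
    """|A & B| / |A | B| for two interest sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_interests(people, min_sim=0.3):
    """Map each person to a group id; people are linked when their Jaccard
    similarity is >= min_sim, and groups are the connected components."""
    names = sorted(people)
    group_of = {}
    next_gid = 0
    for start in names:
        if start in group_of:
            continue
        group_of[start] = next_gid
        stack = [start]
        # Depth-first search over the "similar enough" graph.
        while stack:
            cur = stack.pop()
            for other in names:
                if other not in group_of and jaccard(people[cur], people[other]) >= min_sim:
                    group_of[other] = next_gid
                    stack.append(other)
        next_gid += 1
    return group_of


def groupmates(person, group_of):
    """Everyone else placed in the same group as `person`."""
    return sorted(p for p, g in group_of.items() if g == group_of[person] and p != person)
```

Because the grouping is single-link, a long chain of moderately similar people can end up in one group; a higher `min_sim` or a different linkage trades that off.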

Kmeans matlab “Empty cluster created at iteration 1” error

拜拜、爱过 submitted on 2019-12-04 10:49:26
Question: I'm using this script to cluster a set of 3D points with the MATLAB kmeans function, but I always get the error "Empty cluster created at iteration 1". The script I'm using:
    [G,C] = kmeans(XX, K, 'distance','sqEuclidean', 'start','sample');
XX can be found at this link (XX value), and K is set to 3. Could anyone please advise me why this is happening?
Answer 1: It is simply telling you that during the assign-recompute iterations, a cluster became empty (lost all assigned points). This is
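A minimal Python illustration of what the message means and of one way to recover. One plausible trigger with `'start','sample'` is that two of the sampled seeds are identical points: if ties go to the first center (as they do in this sketch), the second center receives no points on the first assignment. MATLAB's `kmeans` accepts an `'EmptyAction'` option (values include `'error'`, `'drop'` and `'singleton'`); the sketch below mimics the `'singleton'` idea of re-seeding an empty cluster with the point farthest from its centroid. It is not MATLAB's implementation, and the function names are hypothetical.

```python
import math


def nearest(p, centers):
    """Index of the closest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def kmeans_singleton(points, centers, iters=10):
    """Lloyd's k-means that re-seeds empty clusters instead of raising an error.

    Returns (labels, centers) from the last iteration; every cluster is non-empty.
    """
    k = len(centers)
    if len(points) < k:
        raise ValueError("need at least as many points as clusters")
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            if c in labels:
                continue
            # Cluster c is empty: give it the point farthest from its own center,
            # taken from a cluster that keeps at least one other member.
            sizes = [labels.count(j) for j in range(k)]
            donors = [i for i in range(len(points)) if sizes[labels[i]] > 1]
            far = max(donors, key=lambda i: math.dist(points[i], centers[labels[i]]))
            labels[far] = c
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Seeding with two copies of `(0, 0)` reproduces the empty cluster on the first iteration; the re-seeding step hands it the point `(5, 6)` and the run finishes with three non-empty clusters.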

Spectral Clustering a graph in python

时光怂恿深爱的人放手 submitted on 2019-12-04 10:08:56
Question: I'd like to cluster a graph in Python using spectral clustering. Spectral clustering is a more general technique that can be applied not only to graphs but also to images or any other sort of data; however, it's considered an exceptional graph clustering technique. Sadly, I can't find examples of spectral clustering of graphs in Python online. scikit-learn documents two spectral clustering methods, SpectralClustering and spectral_clustering, which don't seem to be aliases. Both of those
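For a graph, scikit-learn's `SpectralClustering` can be given the adjacency matrix directly with `affinity='precomputed'` and then `fit_predict`. To show what happens underneath, here is a dependency-free sketch of the simplest variant, unnormalized spectral bisection: compute the Fiedler vector of the graph Laplacian L = D - A and split the nodes by its sign. This is a two-way split only, not the k-way algorithm scikit-learn implements; it assumes a graph with at least one edge, and the function name and power-iteration approach are choices made for this sketch.

```python
import math


def fiedler_bisection(n, edges, iters=500):
    """Split nodes 0..n-1 into two groups by the sign of the Fiedler vector
    (eigenvector of the second-smallest eigenvalue of L = D - A).

    Power iteration on M = c*I - L with c = 2 * max degree (so M has no negative
    eigenvalues) converges to M's top eigenvector; projecting out the all-ones
    vector (L's eigenvector for eigenvalue 0) makes it converge to the Fiedler one.
    """
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    c = 2.0 * max(deg)

    def mul(x):
        # (c*I - L) x  =  c*x - D x + A x
        return [c * x[i] - deg[i] * x[i] + sum(adj[i][j] * x[j] for j in range(n))
                for i in range(n)]

    def deflate_normalize(x):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant component
        norm = math.sqrt(sum(xi * xi for xi in x))
        return [xi / norm for xi in x]

    # Deterministic non-constant start; a random start is more robust in general.
    x = deflate_normalize([float(i) for i in range(n)])
    for _ in range(iters):
        x = deflate_normalize(mul(x))
    return [0 if xi < 0 else 1 for xi in x]
```

On two triangles joined by a single edge, the split falls on the bridging edge. Repeating the bisection, or using several eigenvectors followed by k-means as scikit-learn does, extends this to more than two clusters.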

Node labels on circular phylogenetic tree

我与影子孤独终老i submitted on 2019-12-04 09:59:47
I am trying to create a circular phylogenetic tree. I have this code:
    library(ape)  # as.phylo() comes from the ape package
    fit <- hclust(dist(Data[,-4]), method = "complete", members = NULL)
    nclus <- 3
    color <- c('red','blue','green')
    color_list <- rep(color, nclus/length(color))
    clus <- cutree(fit, nclus)
    plot(as.phylo(fit), type='fan', tip.color=color_list[clus], label.offset=0.2, no.margin=TRUE, cex=0.70, show.node.label = TRUE)
And this is the result: (image not included)
I am also trying to show a label for each node and to color the branches. Any suggestions on how to do that? Thanks!
Answer: When you say "color branches" I assume you mean color the edges. This seems to work, but I have to

DBSCAN code in C# or VB.NET, for Cluster Analysis

半世苍凉 submitted on 2019-12-04 09:41:34
Question: Could you please recommend a library or code in VB.NET or C# that applies DBSCAN to build density-based clusters of data? I have GPS data, and I want to find stay points using the DBSCAN algorithm, but I do not understand much of the technical part of the algorithm.
Answer 1: Not sure that's what you're looking for, because the algorithm is explained very well on Wikipedia. Do you want an explanation of the algorithm, or a translation of it (or a good library) in C#? You can
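DBSCAN itself is short enough to port by hand. Below is a Python sketch meant to be translated to C# or VB.NET (the names are hypothetical). It uses haversine distance in metres, so `eps` reads as "points within this many metres of each other". Stay-point detection often also considers time spent at a location, which this sketch ignores.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in metres between (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def dbscan(points, eps, min_pts, dist=haversine_m):
    """Classic DBSCAN: returns 0, 1, ... per point for clusters and -1 for noise.

    A point is a core point if at least `min_pts` points (itself included) lie
    within `eps`; clusters grow outward from core points. O(n^2) neighbour search.
    """
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not expanded
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels
```

For GPS fixes, `eps` in the tens of metres and `min_pts` of a few fixes are plausible starting values to tune; each resulting cluster is a candidate stay point, and its mean coordinate can serve as the stay location.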