cluster-analysis

How to save cluster assignments in output file using Weka clustering XMeans?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-24 16:04:37
Question: Context: I want to use the Weka clustering algorithm XMeans, but I cannot figure out how to obtain cluster assignments from the Weka GUI. At the moment I can only see a list of cluster IDs along with the percentage of entries assigned to each cluster. Question: Is there any way to save the cluster assignments for each entry, e.g. in CSV format? Answer 1: Do everything in the "Preprocess" panel. This is one way to do it: Load the data file. Remove any classification attribute or identifiers. Choose Preprocess /
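The answer above works through Weka's GUI. As an analogous sketch of the same end result in code (scikit-learn has no XMeans, so plain KMeans stands in; the data and column names here are made up for illustration), the idea is simply to append each row's cluster ID as an extra column and write the result out as CSV:

```python
import csv
import io

import numpy as np
from sklearn.cluster import KMeans

# toy data standing in for the real dataset
X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.1], [7.9, 8.3]])

# cluster (KMeans here; Weka's XMeans chooses k itself, KMeans needs it fixed)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# write one CSV row per entry, with its cluster assignment appended
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["x", "y", "cluster"])
for row, c in zip(X, labels):
    w.writerow([row[0], row[1], int(c)])
print(buf.getvalue())
```

In Weka itself the equivalent trick is the AddCluster filter in the Preprocess panel, which adds the assignment as a new attribute that the GUI can then save.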

I am having a hard time understanding the concept of Ordering in OPTICS Clustering algorithm

Submitted by 醉酒当歌 on 2019-12-24 11:45:51
Question: I am having a hard time understanding the concept of ordering in the OPTICS clustering algorithm. I would be grateful if someone could give a logical and intuitive explanation of the ordering, explain what res$order does in the following code, and explain the reachability plot (which can be obtained with the command plot(res)). library(dbscan) set.seed(2) n <- 400 x <- cbind( x = runif(4, 0, 1) + rnorm(n, sd=0.1), y = runif(4, 0, 1) + rnorm(n, sd=0.1) ) plot(x, col=rep(1:4, time = 100)) res <-
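The same idea can be shown with scikit-learn's OPTICS (a sketch analogous to the R dbscan example above, not the asker's exact code): OPTICS visits points in a density-based order, and `res$order` is that visit order. Plotting each point's reachability distance *in that order* is the reachability plot: valleys are clusters, peaks are the jumps between them.

```python
import numpy as np
from sklearn.cluster import OPTICS

# four blobs, mirroring the R example's runif(4) centers + noise
rng = np.random.default_rng(2)
centers = rng.uniform(0, 1, size=(4, 2))
X = np.vstack([c + rng.normal(scale=0.05, size=(100, 2)) for c in centers])

opt = OPTICS(min_samples=10).fit(X)

# opt.ordering_ is the visit order (the counterpart of res$order);
# indexing the reachability distances by it gives the reachability plot.
reach_plot = opt.reachability_[opt.ordering_]
print(reach_plot[:5])
```

The first value is infinite by construction (the first visited point has no predecessor), which is why reachability plots start with a spike.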

String clustering in Python

Submitted by 别等时光非礼了梦想. on 2019-12-24 01:54:54
Question: I have a list of strings and I want to classify them using clustering in Python. list = ['String1', 'String2', 'String3',...] I want to use Levenshtein distance, so I used the jellyfish library. Given two strings, I know their distance can be found this way: jellyfish.levenshtein_distance('string1', 'string2') My problem is that I don't know how to use scipy.cluster.hierarchy to get a list of each cluster in Python. I have also tried using the linkage function: linkage(y[, method, metric]) But
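One way to finish the job (a sketch, with an inlined pure-Python Levenshtein so it does not depend on jellyfish, and toy strings standing in for the asker's list): build the pairwise distance matrix, convert it to the condensed form that `linkage` expects, and cut the dendrogram with `fcluster`:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

strings = ["apple", "appel", "apples", "banana", "bananna", "cherry"]
n = len(strings)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = levenshtein(strings[i], strings[j])

# linkage wants the condensed (upper-triangle) distance vector
Z = linkage(squareform(dist), method="average")
clusters = fcluster(Z, t=3, criterion="distance")
print(dict(zip(strings, clusters)))
```

Here `fcluster` with `criterion="distance"` groups strings whose merged distance stays below 3, giving one cluster per label; grouping string indices by label then yields the per-cluster lists the asker wanted.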

Group variables by clusters on heatmap in R

Submitted by 為{幸葍}努か on 2019-12-24 01:33:33
Question: I am trying to reproduce the first figure of this paper on graph clustering. Here is a sample of my adjacency matrix: data=cbind(c(48,0,0,0,0,1,3,0,1,0),c(0,75,0,0,3,2,1,0,0,1),c(0,0,34,1,16,0,3,0,1,1),c(0,0,1,58,0,1,3,1,0,0),c(0,3,16,0,181,6,6,0,2,2),c(1,2,0,1,6,56,2,1,0,1),c(3,1,3,3,6,2,129,0,0,1),c(0,0,0,1,0,1,0,13,0,1),c(1,0,1,0,2,0,0,0,70,0),c(0,1,1,0,2,1,1,1,0,85)) colnames(data)=letters[1:nrow(data)] rownames(data)=colnames(data) And with these commands I obtain the following heatmap:
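The underlying trick (shown here as an analogous Python sketch with a synthetic block-structured matrix, not the paper's exact method or the asker's R code) is to hierarchically cluster the adjacency matrix and permute both rows and columns by the dendrogram leaf order, so that tightly connected variables end up adjacent on the heatmap:

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# toy symmetric adjacency matrix with two obvious 5-node blocks
rng = np.random.default_rng(0)
blocks = np.kron(np.eye(2), np.ones((5, 5))) * 10 + rng.random((10, 10))
adj = (blocks + blocks.T) / 2

# cluster the rows, then reorder rows AND columns by leaf order
order = leaves_list(linkage(adj, method="average"))
reordered = adj[np.ix_(order, order)]  # pass this to imshow/heatmap
print(order)
```

In R the same reordering is what heatmap() does by default with its row/column dendrograms; the manual version is useful when you want to draw the cluster boundaries yourself.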

Plot the sklearn clusters in python

Submitted by 那年仲夏 on 2019-12-24 01:18:17
Question: I have the following sklearn clusters, obtained using affinity propagation. import sklearn.cluster import numpy as np sims = np.array([[0, 17, 10, 32, 32], [18, 0, 6, 20, 15], [10, 8, 0, 20, 21], [30, 16, 20, 0, 17], [30, 15, 21, 17, 0]]) affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed", damping=0.5) affprop.fit(sims) cluster_centers_indices = affprop.cluster_centers_indices_ labels = affprop.labels_ #number of clusters n_clusters_ = len(cluster_centers_indices) Now I want

Analyzing octopus catches with LinearK function in R [closed]

Submitted by ≡放荡痞女 on 2019-12-24 00:54:56
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 5 months ago. I hope you can help me with a problem I can't find a way to overcome. Sorry if I make some mistakes while writing this post; my English is a bit rusty right now. Here is the question. I have .shp data that I want to analyze in R. The .shp can be either lines that represent lines of traps we set to catch

how to set Spark Kmeans initial centers

Submitted by 烈酒焚心 on 2019-12-24 00:10:19
Question: I'm using Spark ML to run KMeans. I have a bunch of data and three existing centers; for example, the three centers are [1.0,1.0,1.0], [5.0,5.0,5.0], and [9.0,9.0,9.0]. How can I indicate that the KMeans centers should be the above three vectors? I saw that the KMeans object has a seed parameter, but the seed parameter is a long, not an array. So how can I tell Spark KMeans to use only the existing centers for clustering? Or, put another way, I don't understand what seed means in Spark KMeans; I suppose the seeds should
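To clear up the confusion in the question: `seed` only seeds the random initialisation; it is not a way to pass centers, and Spark ML's KMeans exposes no public setter for explicit initial centers. As an analogous sketch of what the asker wants, scikit-learn's KMeans *does* accept explicit starting centers via `init=` (the data here is synthetic, generated around the three centers from the question):

```python
import numpy as np
from sklearn.cluster import KMeans

# the three known centers from the question
init_centers = np.array([[1.0, 1.0, 1.0],
                         [5.0, 5.0, 5.0],
                         [9.0, 9.0, 9.0]])

# synthetic data scattered around those centers
rng = np.random.default_rng(0)
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 3)) for c in init_centers])

# init= takes an (n_clusters, n_features) array; n_init=1 since init is fixed
km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(X)
print(km.cluster_centers_.round(1))
```

If the centers must not move at all (pure assignment rather than clustering), skip k-means entirely and assign each point to its nearest fixed center.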

Cluster Analysis in R with missing data

Submitted by 拟墨画扇 on 2019-12-23 18:01:45
Question: I spent a good amount of time trying to find the answer on how to do this. The only answer I have found so far is here: How to perform clustering without removing rows where NA is present in R. Unfortunately, that is not working for me. Here is an example of my data (d in this example):
Q9Y6X2 NA -6.350055943 -5.78314068
Q9Y6X3 NA NA -5.78314068
Q9Y6X6 0.831273549 4.875151493 0.78671493
Q9Y6Y8 4.831273549 0.457298979 5.59406985
Q9Y6Z4 4.831273549 4.875151493 NA
Here is what I tried: >
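A sketch of one common workaround (shown in Python rather than the asker's R): compute pairwise distances that simply skip NaN coordinates, then cluster on the precomputed matrix instead of dropping rows. The fill value for pairs with no overlapping features is an assumption made for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics.pairwise import nan_euclidean_distances

# the sample rows from the question (IDs dropped)
d = np.array([[np.nan, -6.35, -5.78],
              [np.nan, np.nan, -5.78],
              [0.83, 4.88, 0.79],
              [4.83, 0.46, 5.59],
              [4.83, 4.88, np.nan]])

dist = nan_euclidean_distances(d)  # NaN coordinates ignored, result rescaled
# pairs sharing no observed feature come out NaN; assume they are far apart
dist[np.isnan(dist)] = np.nanmax(dist) * 2
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The alternative route, imputing the NaNs first (e.g. with per-column means) and clustering normally, is simpler but can pull incomplete rows toward the global center.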

k-means using signature matrix generated from minhash

Submitted by 与世无争的帅哥 on 2019-12-23 12:14:42
Question: I have used minhash on documents and their shingles to generate a signature matrix from these documents. I have verified that the signature matrices are good: comparing Jaccard distances of known similar documents (say, two articles about the same sports team or two articles about the same world event) gives correct readings. My question is: does it make sense to use this signature matrix to perform k-means clustering? I've tried using the signature vectors of documents and calculating the
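The core issue: k-means averages raw hash values under Euclidean distance, which is meaningless for minhash signatures; what a signature pair actually encodes is an estimated Jaccard similarity, namely the fraction of matching components. A sketch (with synthetic signatures standing in for real documents) of clustering on that estimate directly, via hierarchical linkage on a precomputed distance matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# synthetic 128-slot signatures: two near-duplicates plus one unrelated doc
rng = np.random.default_rng(1)
base = rng.integers(0, 2**32, size=128)
sigs = np.array([
    base,
    np.where(rng.random(128) < 0.1, rng.integers(0, 2**32, 128), base),
    rng.integers(0, 2**32, size=128),
])

n = len(sigs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        jaccard_est = np.mean(sigs[i] == sigs[j])  # matching minhash slots
        dist[i, j] = dist[j, i] = 1.0 - jaccard_est

labels = fcluster(linkage(squareform(dist), "average"), t=0.5,
                  criterion="distance")
print(labels)
```

k-medoids over the same precomputed matrix is another option when the document count makes an all-pairs linkage too expensive.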

Markov Clustering

Submitted by ≯℡__Kan透↙ on 2019-12-23 10:58:08
Question: I have two questions, to be precise. Firstly, I would like to know if there is an easy way to adapt the Markov clustering algorithm so that I can specify in advance how many clusters I would like to have at the end. If not, which similar algorithm would you recommend? And secondly, how should overlapping clusters be dealt with in the Markov world? Answer 1: 1) There is no easy way to adapt the MCL algorithm (note: its name is 'Markov cluster algorithm', without the 'ing'. Many people verbalise it
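Why the cluster count can't be fixed in advance becomes clearer from the algorithm itself: MCL alternates expansion (matrix powering, which spreads flow) and inflation (elementwise powering plus column normalisation, which concentrates it), and the number of clusters falls out of where the flow settles; only the inflation parameter steers it indirectly. A minimal sketch on a toy graph with two components (a bare-bones illustration, not a production MCL with pruning):

```python
import numpy as np

def mcl(adj, expansion=2, inflation=2.0, iters=50):
    # add self-loops and column-normalise to get a column-stochastic matrix
    M = adj + np.eye(adj.shape[0])
    M = M / M.sum(axis=0)
    for _ in range(iters):
        M = np.linalg.matrix_power(M, expansion)  # expansion: flow spreads
        M = M ** inflation                        # inflation: strong flows win
        M = M / M.sum(axis=0)
    return M

# two obvious groups: a triangle {0,1,2} and an edge {3,4}
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)

M = mcl(adj)
# read clusters off the converged matrix: each surviving row's nonzero
# columns are one cluster
clusters = {tuple(int(k) for k in np.nonzero(M[i] > 1e-6)[0])
            for i in range(5) if M[i].sum() > 1e-6}
print(clusters)
```

Raising `inflation` tends to produce more, smaller clusters; lowering it merges them, which is the usual substitute for specifying a count directly.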