cluster-analysis

How to save cluster assignments in output file using Weka clustering XMeans?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-24 16:04:37
Question: Context: I want to use the Weka clustering algorithm XMeans, but I cannot figure out how to obtain cluster assignments from the Weka GUI. At the moment I can only see a list of cluster IDs along with the percentage of entries assigned to each cluster. Question: Is there any way to save the cluster assignments for each entry, e.g. in CSV format? Answer 1: Do everything in the "Preprocess" panel. This is one way to do it: Load the data file. Remove any classification attribute or identifiers. Choose Preprocess /
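The answer above works through Weka's GUI. As an analogous sketch of the same end result in code (scikit-learn has no XMeans, so plain KMeans stands in; the data and column names here are made up for illustration), the idea is simply to append each row's cluster ID as an extra column and write the result out as CSV:

```python
import csv
import io

import numpy as np
from sklearn.cluster import KMeans

# toy data standing in for the real dataset
X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.1], [7.9, 8.3]])

# cluster (KMeans here; Weka's XMeans chooses k itself, KMeans needs it fixed)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# write one CSV row per entry, with its cluster assignment appended
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["x", "y", "cluster"])
for row, c in zip(X, labels):
    w.writerow([row[0], row[1], int(c)])
print(buf.getvalue())
```

In Weka itself the equivalent trick is the AddCluster filter in the Preprocess panel, which adds the assignment as a new attribute that the GUI can then save.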

I am having a hard time understanding the concept of Ordering in OPTICS Clustering algorithm

Submitted by 醉酒当歌 on 2019-12-24 11:45:51
Question: I am having a hard time understanding the concept of ordering in the OPTICS clustering algorithm. I would be grateful if someone could give a logical and intuitive explanation of the ordering, explain what res$order does in the following code, and explain the reachability plot (which can be obtained with the command plot(res)). library(dbscan) set.seed(2) n <- 400 x <- cbind( x = runif(4, 0, 1) + rnorm(n, sd=0.1), y = runif(4, 0, 1) + rnorm(n, sd=0.1) ) plot(x, col=rep(1:4, time = 100)) res <-
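The same idea can be shown with scikit-learn's OPTICS (a sketch analogous to the R dbscan example above, not the asker's exact code): OPTICS visits points in a density-based order, and `res$order` is that visit order. Plotting each point's reachability distance *in that order* is the reachability plot: valleys are clusters, peaks are the jumps between them.

```python
import numpy as np
from sklearn.cluster import OPTICS

# four blobs, mirroring the R example's runif(4) centers + noise
rng = np.random.default_rng(2)
centers = rng.uniform(0, 1, size=(4, 2))
X = np.vstack([c + rng.normal(scale=0.05, size=(100, 2)) for c in centers])

opt = OPTICS(min_samples=10).fit(X)

# opt.ordering_ is the visit order (the counterpart of res$order);
# indexing the reachability distances by it gives the reachability plot.
reach_plot = opt.reachability_[opt.ordering_]
print(reach_plot[:5])
```

The first value is infinite by construction (the first visited point has no predecessor), which is why reachability plots start with a spike.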

String clustering in Python

Submitted by 别等时光非礼了梦想. on 2019-12-24 01:54:54
Question: I have a list of strings and I want to classify them using clustering in Python. list = ['String1', 'String2', 'String3',...] I want to use Levenshtein distance, so I used the jellyfish library. Given two strings, I know their distance can be found this way: jellyfish.levenshtein_distance('string1', 'string2') My problem is that I don't know how to use scipy.cluster.hierarchy to get a list of each cluster in Python. I have also tried using the linkage function: linkage(y[, method, metric]) But
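One way to finish the job (a sketch, with an inlined pure-Python Levenshtein so it does not depend on jellyfish, and toy strings standing in for the asker's list): build the pairwise distance matrix, convert it to the condensed form that `linkage` expects, and cut the dendrogram with `fcluster`:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

strings = ["apple", "appel", "apples", "banana", "bananna", "cherry"]
n = len(strings)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = levenshtein(strings[i], strings[j])

# linkage wants the condensed (upper-triangle) distance vector
Z = linkage(squareform(dist), method="average")
clusters = fcluster(Z, t=3, criterion="distance")
print(dict(zip(strings, clusters)))
```

Here `fcluster` with `criterion="distance"` groups strings whose merged distance stays below 3, giving one cluster per label; grouping string indices by label then yields the per-cluster lists the asker wanted.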

Group variables by clusters on heatmap in R

Submitted by 為{幸葍}努か on 2019-12-24 01:33:33
Question: I am trying to reproduce the first figure of this paper on graph clustering. Here is a sample of my adjacency matrix: data=cbind(c(48,0,0,0,0,1,3,0,1,0),c(0,75,0,0,3,2,1,0,0,1),c(0,0,34,1,16,0,3,0,1,1),c(0,0,1,58,0,1,3,1,0,0),c(0,3,16,0,181,6,6,0,2,2),c(1,2,0,1,6,56,2,1,0,1),c(3,1,3,3,6,2,129,0,0,1),c(0,0,0,1,0,1,0,13,0,1),c(1,0,1,0,2,0,0,0,70,0),c(0,1,1,0,2,1,1,1,0,85)) colnames(data)=letters[1:nrow(data)] rownames(data)=colnames(data) And with these commands I obtain the following heatmap:
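The underlying trick (shown here as an analogous Python sketch with a synthetic block-structured matrix, not the paper's exact method or the asker's R code) is to hierarchically cluster the adjacency matrix and permute both rows and columns by the dendrogram leaf order, so that tightly connected variables end up adjacent on the heatmap:

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# toy symmetric adjacency matrix with two obvious 5-node blocks
rng = np.random.default_rng(0)
blocks = np.kron(np.eye(2), np.ones((5, 5))) * 10 + rng.random((10, 10))
adj = (blocks + blocks.T) / 2

# cluster the rows, then reorder rows AND columns by leaf order
order = leaves_list(linkage(adj, method="average"))
reordered = adj[np.ix_(order, order)]  # pass this to imshow/heatmap
print(order)
```

In R the same reordering is what heatmap() does by default with its row/column dendrograms; the manual version is useful when you want to draw the cluster boundaries yourself.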

Plot the sklearn clusters in python

Submitted by 那年仲夏 on 2019-12-24 01:18:17
Question: I have the following sklearn clusters, obtained using affinity propagation. import sklearn.cluster import numpy as np sims = np.array([[0, 17, 10, 32, 32], [18, 0, 6, 20, 15], [10, 8, 0, 20, 21], [30, 16, 20, 0, 17], [30, 15, 21, 17, 0]]) affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed", damping=0.5) affprop.fit(sims) cluster_centers_indices = affprop.cluster_centers_indices_ labels = affprop.labels_ #number of clusters n_clusters_ = len(cluster_centers_indices) Now I want

Analyzing octopus catches with LinearK function in R [closed]

Submitted by ≡放荡痞女 on 2019-12-24 00:54:56
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 5 months ago. I hope you can help me with a problem I can't find a way to overcome. Sorry if I make some mistakes while writing this post; my English is a bit rusty right now. Here is the question. I have .shp data that I want to analyze in R. The .shp can be either lines that represent lines of traps we set to catch

how to set Spark Kmeans initial centers

Submitted by 烈酒焚心 on 2019-12-24 00:10:19
Question: I'm using Spark ML to run KMeans. I have a bunch of data and three existing centers; for example, the three centers are [1.0,1.0,1.0], [5.0,5.0,5.0], and [9.0,9.0,9.0]. How can I indicate that the KMeans centers should be the above three vectors? I saw that the KMeans object has a seed parameter, but the seed parameter is a long, not an array. So how can I tell Spark KMeans to use only the existing centers for clustering? Or, put another way, I don't understand what seed means in Spark KMeans; I suppose the seeds should
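To clear up the confusion in the question: `seed` only seeds the random initialisation; it is not a way to pass centers, and Spark ML's KMeans exposes no public setter for explicit initial centers. As an analogous sketch of what the asker wants, scikit-learn's KMeans *does* accept explicit starting centers via `init=` (the data here is synthetic, generated around the three centers from the question):

```python
import numpy as np
from sklearn.cluster import KMeans

# the three known centers from the question
init_centers = np.array([[1.0, 1.0, 1.0],
                         [5.0, 5.0, 5.0],
                         [9.0, 9.0, 9.0]])

# synthetic data scattered around those centers
rng = np.random.default_rng(0)
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 3)) for c in init_centers])

# init= takes an (n_clusters, n_features) array; n_init=1 since init is fixed
km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(X)
print(km.cluster_centers_.round(1))
```

If the centers must not move at all (pure assignment rather than clustering), skip k-means entirely and assign each point to its nearest fixed center.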

Cluster Analysis in R with missing data

Submitted by 拟墨画扇 on 2019-12-23 18:01:45
Question: I spent a good amount of time trying to find the answer on how to do this. The only answer I have found so far is here: How to perform clustering without removing rows where NA is present in R. Unfortunately, that is not working for me. Here is an example of my data (d in this example):
Q9Y6X2 NA -6.350055943 -5.78314068
Q9Y6X3 NA NA -5.78314068
Q9Y6X6 0.831273549 4.875151493 0.78671493
Q9Y6Y8 4.831273549 0.457298979 5.59406985
Q9Y6Z4 4.831273549 4.875151493 NA
Here is what I tried: >
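A sketch of one common workaround (shown in Python rather than the asker's R): compute pairwise distances that simply skip NaN coordinates, then cluster on the precomputed matrix instead of dropping rows. The fill value for pairs with no overlapping features is an assumption made for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics.pairwise import nan_euclidean_distances

# the sample rows from the question (IDs dropped)
d = np.array([[np.nan, -6.35, -5.78],
              [np.nan, np.nan, -5.78],
              [0.83, 4.88, 0.79],
              [4.83, 0.46, 5.59],
              [4.83, 4.88, np.nan]])

dist = nan_euclidean_distances(d)  # NaN coordinates ignored, result rescaled
# pairs sharing no observed feature come out NaN; assume they are far apart
dist[np.isnan(dist)] = np.nanmax(dist) * 2
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The alternative route, imputing the NaNs first (e.g. with per-column means) and clustering normally, is simpler but can pull incomplete rows toward the global center.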

k-means using signature matrix generated from minhash

Submitted by 与世无争的帅哥 on 2019-12-23 12:14:42
Question: I have used minhash on documents and their shingles to generate a signature matrix from these documents. I have verified that the signature matrices are good: comparing Jaccard distances of known similar documents (say, two articles about the same sports team or two articles about the same world event) gives correct readings. My question is: does it make sense to use this signature matrix to perform k-means clustering? I've tried using the signature vectors of documents and calculating the
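The core issue: k-means averages raw hash values under Euclidean distance, which is meaningless for minhash signatures; what a signature pair actually encodes is an estimated Jaccard similarity, namely the fraction of matching components. A sketch (with synthetic signatures standing in for real documents) of clustering on that estimate directly, via hierarchical linkage on a precomputed distance matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# synthetic 128-slot signatures: two near-duplicates plus one unrelated doc
rng = np.random.default_rng(1)
base = rng.integers(0, 2**32, size=128)
sigs = np.array([
    base,
    np.where(rng.random(128) < 0.1, rng.integers(0, 2**32, 128), base),
    rng.integers(0, 2**32, size=128),
])

n = len(sigs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        jaccard_est = np.mean(sigs[i] == sigs[j])  # matching minhash slots
        dist[i, j] = dist[j, i] = 1.0 - jaccard_est

labels = fcluster(linkage(squareform(dist), "average"), t=0.5,
                  criterion="distance")
print(labels)
```

k-medoids over the same precomputed matrix is another option when the document count makes an all-pairs linkage too expensive.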

Markov Clustering

Submitted by ≯℡__Kan透↙ on 2019-12-23 10:58:08
Question: I have two questions, to be precise. Firstly, I would like to know if there is an easy way to adapt the Markov clustering algorithm so that I can specify in advance how many clusters I would like to have at the end. If not, which similar algorithm would you recommend? And secondly, how should overlapping clusters be dealt with in the Markov world? Answer 1: 1) There is no easy way to adapt the MCL algorithm (note: its name is 'Markov cluster algorithm', without the 'ing'. Many people verbalise it
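Why the cluster count can't be fixed in advance becomes clearer from the algorithm itself: MCL alternates expansion (matrix powering, which spreads flow) and inflation (elementwise powering plus column normalisation, which concentrates it), and the number of clusters falls out of where the flow settles; only the inflation parameter steers it indirectly. A minimal sketch on a toy graph with two components (a bare-bones illustration, not a production MCL with pruning):

```python
import numpy as np

def mcl(adj, expansion=2, inflation=2.0, iters=50):
    # add self-loops and column-normalise to get a column-stochastic matrix
    M = adj + np.eye(adj.shape[0])
    M = M / M.sum(axis=0)
    for _ in range(iters):
        M = np.linalg.matrix_power(M, expansion)  # expansion: flow spreads
        M = M ** inflation                        # inflation: strong flows win
        M = M / M.sum(axis=0)
    return M

# two obvious groups: a triangle {0,1,2} and an edge {3,4}
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)

M = mcl(adj)
# read clusters off the converged matrix: each surviving row's nonzero
# columns are one cluster
clusters = {tuple(int(k) for k in np.nonzero(M[i] > 1e-6)[0])
            for i in range(5) if M[i].sum() > 1e-6}
print(clusters)
```

Raising `inflation` tends to produce more, smaller clusters; lowering it merges them, which is the usual substitute for specifying a count directly.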