cluster-analysis

Where to find a reliable K-medoid (not k-means) open source software/tool? [closed]

非 Y 不嫁゛ submitted on 2019-12-21 07:44:41

Question: [Closed. This question is off-topic and is not currently accepting answers. Closed 4 years ago.] I am learning the K-medoids algorithm, so I apologize if I ask inappropriate questions. As I understand it, the K-medoids algorithm performs K-means-style clustering but uses actual data points as cluster centers (medoids) instead of mathematically computed means. Googling around, I found a lot of k-means tools such as GenePattern,
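Since the excerpt stops mid-list, here is a minimal NumPy-only sketch of the PAM-style alternating update the question describes (assign each point to its nearest medoid, then swap each medoid for the cluster member that minimizes total in-cluster distance). The function name and toy data are mine; production-quality implementations exist in, e.g., R's cluster package (pam) and ELKI:

```python
import numpy as np

def k_medoids(dist, k, max_iter=100, seed=0):
    """Basic PAM-style k-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:
                continue
            # Update step: the new medoid is the member with the
            # smallest total distance to the rest of its cluster.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break  # converged: medoid set unchanged
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy 1-D data: two well-separated groups.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(points[:, None] - points[None, :])
medoids, labels = k_medoids(dist, k=2)
```

Because the update only ever picks existing points as centers, the algorithm works with any precomputed dissimilarity matrix, not just Euclidean distances.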


Clustering text in MATLAB

烂漫一生 submitted on 2019-12-21 05:43:11

Question: I want to do hierarchical agglomerative clustering on texts in MATLAB. Say I have four sentences: "I have a pen." "I have a paper." "I have a pencil." "I have a cat." I want to cluster these four sentences to see which are more similar. I know the Statistics Toolbox has commands like pdist to measure pairwise distances, linkage to compute cluster similarity, etc. A simple snippet like X = [1 2; 2 3; 1 4]; Y = pdist(X, 'euclidean'); Z = linkage(Y, 'single'); H = dendrogram(Z) works fine and returns a
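The MATLAB pipeline above (pdist → linkage → dendrogram) maps almost one-to-one onto SciPy. A sketch, assuming SciPy is available and using character-trigram bag vectors as a stand-in for a proper text representation (the trigrams helper and the vocabulary construction are mine):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

sentences = ["I have a pen.", "I have a paper.",
             "I have a pencil.", "I have a cat."]

def trigrams(s):
    """Set of overlapping character trigrams of a lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

# Binary trigram-occurrence vectors over the shared vocabulary.
vocab = sorted(set().union(*(trigrams(s) for s in sentences)))
X = np.array([[1.0 if g in trigrams(s) else 0.0 for g in vocab]
              for s in sentences])

Y = pdist(X, metric="cosine")      # pairwise distances (like MATLAB pdist)
Z = linkage(Y, method="single")    # single-linkage merge tree
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
```

With this representation "pen" and "pencil" share the most trigrams and end up in the same cluster, while "cat" is the odd one out, which is the kind of grouping the question is after.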

Running clustering algorithms in ELKI

元气小坏坏 submitted on 2019-12-21 05:41:24

Question: I need to run a k-medoids clustering algorithm using ELKI programmatically. I have a similarity matrix that I wish to feed to the algorithm. Is there any code snippet available showing how to run ELKI algorithms? I basically need to know how to create Database and Relation objects, create a custom distance function, and read the algorithm output. Unfortunately the ELKI tutorial (http://elki.dbs.ifi.lmu.de/wiki/Tutorial) focuses on the GUI version and on implementing new algorithms, and trying

Computing F-measure for clustering

99封情书 submitted on 2019-12-21 05:41:15

Question: Can anyone help me calculate an overall F-measure? I know how to calculate recall and precision, but I don't know how to compute a single F-measure value for a given algorithm. As an example, suppose my algorithm creates m clusters, but I know there are n clusters for the same data (as created by another benchmark algorithm). I found one PDF, but it is not useful since the collective value I got is greater than 1. The PDF reference is "F Measure explained." Specifically I have read some research
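One common way to get a single value that cannot exceed 1 is the pair-counting F-measure: treat "these two points share a cluster" as the positive class over all point pairs, compute precision and recall on those pairs, then take their harmonic mean. A sketch (the function name is mine):

```python
from itertools import combinations

def pair_f_measure(labels_true, labels_pred):
    """Pairwise F-measure between two clusterings; always in [0, 1]."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_pred and same_true:
            tp += 1          # pair together in both clusterings
        elif same_pred and not same_true:
            fp += 1          # together in prediction only
        elif same_true and not same_pred:
            fn += 1          # together in reference only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because it compares pairs rather than trying to match cluster labels directly, it works even when the two clusterings have different numbers of clusters (m vs n), and a value above 1 is impossible by construction.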

Cosine distance as vector distance function for k-means

醉酒当歌 submitted on 2019-12-21 03:41:32

Question: I have a graph of N vertices where each vertex represents a place. I also have vectors, one per user, each of N coefficients, where a coefficient's value is the duration in seconds spent at the corresponding place, or 0 if that place was not visited. E.g., for the graph, the vector v1 = {100, 50, 0, 30, 0} would mean that we spent 100 secs at vertex 1, 50 secs at vertex 2, and 30 secs at vertex 4 (vertices 3 & 5 were not visited, hence the 0s). I want to run a k-means clustering and I've
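A frequently used workaround for this situation is to L2-normalize the duration vectors and then run ordinary k-means: for unit vectors, ||a − b||² = 2·(1 − cos(a, b)), so Euclidean distance becomes a monotone function of cosine distance and the mean-based update stays valid. A quick numeric check of the identity (v2 is a hypothetical second user with the same visit profile but shorter stays):

```python
import numpy as np

# Duration vectors (seconds per place); v1 is from the question,
# with its missing comma fixed: {100, 50, 0, 30, 0}.
v1 = np.array([100.0, 50.0, 0.0, 30.0, 0.0])
v2 = np.array([10.0, 5.0, 0.0, 3.0, 0.0])  # same profile, 10x shorter

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)

a, b = unit(v1), unit(v2)
cos_sim = a @ b                   # cosine similarity of the originals
sq_euclid = np.sum((a - b) ** 2)  # squared distance after normalizing
# On unit vectors: sq_euclid == 2 * (1 - cos_sim).
```

Here the two users have identical visit *profiles*, so after normalization they coincide exactly, which is usually the desired behavior when clustering by "where" rather than "how long".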

Is my python implementation of the Davies-Bouldin Index correct?

给你一囗甜甜゛ submitted on 2019-12-21 02:45:37

Question: I'm trying to calculate the Davies-Bouldin index in Python. Here are the steps the code below tries to reproduce, in 5 steps: (1) for each cluster, compute the Euclidean distance between each point and the centroid; (2) for each cluster, compute the average of these distances; (3) for each pair of clusters, compute the Euclidean distance between their centroids; then (4) for each pair of clusters, sum the average distances to their respective centroids (computed at step 2) and divide it by the distance
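The steps above can be sketched directly in NumPy (the function name is mine; lower values indicate tighter, better-separated clusters):

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index following the steps in the question."""
    ks = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    # Steps 1-2: mean distance of each cluster's points to its centroid.
    s = np.array([np.mean(np.linalg.norm(X[labels == k] - c, axis=1))
                  for k, c in zip(ks, centroids)])
    n = len(ks)
    db = 0.0
    for i in range(n):
        # Steps 3-4: (s_i + s_j) / d(centroid_i, centroid_j) per pair;
        # final step: take the worst ratio per cluster, then average.
        ratios = [(s[i] + s[j])
                  / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(n) if j != i]
        db += max(ratios)
    return db / n

# Two tight clusters, far apart: the index should be small.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = davies_bouldin(X, labels)
```

For this toy layout each cluster scatter is 0.5 and the centroids are 10 apart, so the index works out to exactly (0.5 + 0.5) / 10 = 0.1, an easy value to validate an implementation against.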

Is Triangle inequality necessary for kmeans?

你。 submitted on 2019-12-21 02:44:34

Question: I wonder if the triangle inequality is necessary for the distance measure used in k-means. Answer 1: k-means is designed for Euclidean distance, which happens to satisfy the triangle inequality. Using other distance functions is risky, as the algorithm may stop converging. The reason, however, is not the triangle inequality, but that the mean might not minimize the distance function. (The arithmetic mean minimizes the sum of squares, not arbitrary distances!) There are faster methods for k-means that exploit the triangle
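The answer's key claim, that the mean minimizes the sum of squares but not arbitrary distances, is easy to check numerically: under L1 (Manhattan) distance the median beats the mean, which is exactly why swapping the distance function can break the k-means update step:

```python
import numpy as np

# One skewed "cluster" of 1-D points.
x = np.array([0.0, 1.0, 10.0])
mean, median = x.mean(), np.median(x)

def sq_cost(c):
    """k-means objective for one cluster: sum of squared distances."""
    return float(np.sum((x - c) ** 2))

def abs_cost(c):
    """Objective under L1/Manhattan distance: sum of absolute distances."""
    return float(np.sum(np.abs(x - c)))

# The mean wins on the squared objective,
# but the median wins on the absolute objective.
```

So if you want a k-means-like algorithm under L1 distance, the consistent choice is k-medians (update with the median), and for arbitrary dissimilarities, k-medoids.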

Scikit K-means clustering performance measure

拟墨画扇 submitted on 2019-12-20 19:39:34

Question: I'm trying to do clustering with the K-means method, but I would like to measure the performance of my clustering. I'm not an expert, but I am eager to learn more about clustering. Here is my code: import pandas as pd from sklearn import datasets #loading the dataset iris = datasets.load_iris() df = pd.DataFrame(iris.data) #K-Means from sklearn import cluster k_means = cluster.KMeans(n_clusters=3) k_means.fit(df) #K-means training y_pred = k_means.predict(df) #We store the K-means results in a
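Since the excerpt is cut off, one standard internal measure worth knowing here is the silhouette coefficient, which scikit-learn exposes as sklearn.metrics.silhouette_score. A NumPy-only sketch of the same computation, assuming every cluster has at least two points (the function name is mine):

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient: near +1 means well-separated
    clusters, near 0 means overlapping ones."""
    n = len(X)
    # Full pairwise Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ks = np.unique(labels)
    scores = []
    for i in range(n):
        own = labels[i]
        # a: mean distance to the other points of i's own cluster.
        mask = (labels == own) & (np.arange(n) != i)
        a = D[i, mask].mean()
        # b: mean distance to the nearest *other* cluster.
        b = min(D[i, labels == k].mean() for k in ks if k != own)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two tight, well-separated clusters: score should be close to 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = silhouette(X, labels)
```

In the question's setting you would pass k_means.labels_ (or y_pred) as the labels; trying several n_clusters values and comparing silhouette scores is a common way to pick k when no ground truth is available.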

Getting Xmeans clusterer output programmatically in Weka

孤街醉人 submitted on 2019-12-20 14:42:40

Question: When using KMeans in Weka, one can call getAssignments() on the resulting model to get the cluster assignment for each instance. Here's a (truncated) Jython example: >>> import weka.clusterers.SimpleKMeans as kmeans >>> kmeans.buildClusterer(data) >>> assignments = kmeans.getAssignments() >>> assignments array('i', [14, 16, 0, 0, 0, 0, 16, ...]) The index of each cluster number corresponds to the instance. So, instance 0 is in cluster 14, instance 1 is in cluster 16, and so