k-means | 易学教程

how to use different distance formula other than euclidean distance in k means

Question: I am working with latitude/longitude data and have to build clusters based on the distance between two points. The distance between two points is =ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371. I want to use k-means in R. Is there any way I can override the distance calculation in that process?

Answer 1: K-means is not distance-based. It is based on variance minimization. The sum-of-variance formula equals the sum of squared Euclidean distances, but the converse, for other
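The spreadsheet-style formula in the question can be sketched in Python as below. This is only an illustration of the distance itself, not a way to plug it into k-means. It assumes the coordinates are given in degrees (the trigonometric functions need radians), and the name `great_circle_km` is hypothetical. The value 6371 is the Earth radius in kilometres taken from the question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # radius used in the question's formula


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines distance; inputs assumed to be in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Clamp against floating-point drift so acos never sees a value outside [-1, 1].
    c = max(-1.0, min(1.0, c))
    return math.acos(c) * EARTH_RADIUS_KM
```

A distance like this fits algorithms that only need pairwise distances, since averaging latitude and longitude values does not generally minimize it.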

Printing ClusterID and its elements using Spark KMeans algo.

Question: I have this program, which prints the MSSE of the k-means algorithm on Apache Spark. 20 clusters are generated. I am trying to print each cluster ID and the elements assigned to it. How do I loop over the cluster IDs to print the elements? Thank you!

val sc = new SparkContext("local", "KMeansExample", "/usr/local/spark/", List("target/scala-2.10/kmeans_2.10-1.0.jar"))
// Load and parse the data
val data = sc.textFile("kmeans.csv")
val parsedData = data.map( s =>
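The grouping the question asks for can be sketched in plain Python as follows. This is not Spark code: `group_by_cluster` is a hypothetical helper, the sample `points` and `centers` are made up, and in Spark the cluster ID would presumably come from the trained model (for example its `predict` method) instead of the nearest-center search used here.

```python
from collections import defaultdict


def group_by_cluster(points, centers):
    """Map each cluster ID to the list of points nearest to that center."""
    groups = defaultdict(list)
    for p in points:
        # Cluster ID = index of the center with the smallest squared Euclidean distance.
        cid = min(
            range(len(centers)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
        )
        groups[cid].append(p)
    return dict(groups)


# Made-up sample data, only to show the loop over cluster IDs.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
for cid, members in sorted(group_by_cluster(points, centers).items()):
    print(cid, members)
```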

Most mutually distant k elements (clustering?)

Question: I have a simple machine learning question. I have n (~110) elements and a matrix of all the pairwise distances. I would like to choose the 10 elements that are farthest apart. That is, I want to maximize the following: choose 10 different elements and return the minimum distance over all pairings within the 10. My distance metric is symmetric and respects the triangle inequality. What kind of algorithm can I use? My first instinct is to do the following: Cluster the n elements into 20 clusters. Replace each
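One common heuristic for this max-min objective is greedy farthest-first selection, sketched below. It is not the clustering approach the asker describes and it is not guaranteed to find the optimal subset. The names `farthest_first` and `min_pairwise` are hypothetical, and `dist` is assumed to be a full symmetric matrix given as a list of lists.

```python
def farthest_first(dist, k, start=0):
    """Greedily pick k indices, each time taking the point farthest from those chosen."""
    chosen = [start]
    # min_dist[i] = distance from element i to its nearest already-chosen element.
    min_dist = list(dist[start])
    while len(chosen) < k:
        nxt = max(range(len(dist)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, dist[nxt])]
    return chosen


def min_pairwise(dist, chosen):
    """Objective from the question: smallest distance among the chosen elements."""
    return min(dist[a][b] for i, a in enumerate(chosen) for b in chosen[i + 1:])
```

Because the result depends on `start`, one option is to run it from every starting element and keep the subset with the largest `min_pairwise` value.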

k-means clustering implementation in Javascript?

Question: I need a JavaScript implementation of the k-means clustering algorithm. I only have 1-dimensional data and rarely more than 100 items, so performance is not an issue. PS: I could only find one, but it seems extremely unstable, producing completely different clusters on virtually every call.

Answer 1: k-means in JavaScript:
http://code.google.com/p/hdict/source/browse/gae/files/kmeans.js
http://www.mymessedupmind.co.uk/index.php/javascript-k-mean-algorithm
Applet: http://www.math.le.ac
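The instability described in the question usually comes from random initialization. Below is a minimal sketch of 1-dimensional k-means with a deterministic start, written in Python, not JavaScript, and unrelated to the linked implementations. The function name `kmeans_1d` and the choice of evenly spaced order statistics as initial centers are assumptions made for illustration.

```python
def kmeans_1d(values, k, max_iter=100):
    """1-D k-means (Lloyd iterations) with a deterministic start; returns the centers."""
    data = sorted(values)
    n = len(data)
    # Assumed initialization: evenly spaced order statistics of the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[j].append(x)
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

With no randomness involved, repeated calls on the same input return the same centers.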

Where to find a reliable K-medoid(Not k-means) open source software/tool? [closed]

Question: I am learning the k-medoids algorithm, so I apologize if my questions are inappropriate. As far as I know, the k-medoids algorithm works like k-means clustering but uses actual data points as the cluster centers instead of mathematically computed means. Searching online, I found a lot of k-means tools such as GenePattern,
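To make the difference from k-means concrete, here is a minimal k-medoids sketch that works on a precomputed distance matrix. It uses simple alternating assignment and medoid updates, not the classic PAM swap procedure, and it is not taken from any of the tools mentioned. The name `k_medoids` and the choice of the first k points as initial medoids are assumptions.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a distance matrix; returns sorted medoid indices."""
    n = len(dist)
    medoids = list(range(k))  # assumed initialization: the first k points
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest total
        # distance to the other members, so it is always an actual data point.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep the old medoid if its cluster is empty
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(dist[c][j] for j in members))
            )
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)
```

Since only `dist` is consulted, any dissimilarity measure can be used without computing a mean.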

Running clustering algorithms in ELKI

Question: I need to run a k-medoids clustering algorithm using ELKI programmatically. I have a similarity matrix that I want to pass to the algorithm. Is there a code snippet showing how to run ELKI algorithms? I basically need to know how to create Database and Relation objects, create a custom distance function, and read the algorithm output. Unfortunately, the ELKI tutorial (http://elki.dbs.ifi.lmu.de/wiki/Tutorial) focuses on the GUI version and on implementing new algorithms, and trying

Cosine distance as vector distance function for k-means

Question: I have a graph of N vertices where each vertex represents a place. I also have vectors, one per user, each with N coefficients, where a coefficient's value is the duration in seconds spent at the corresponding place, or 0 if that place was not visited. E.g., for the graph, the vector v1 = {100, 50, 0, 30, 0} would mean that we spent 100 s at vertex 1, 50 s at vertex 2, and 30 s at vertex 4 (vertices 3 and 5 were not visited, hence the 0s). I want to run a k-means clustering and I've
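The title asks about cosine distance between such duration vectors. A minimal sketch of that distance follows, using the common definition of one minus the cosine similarity. The names `cosine_distance` and `normalize` are hypothetical, and raising an error for an all-zero vector (a user who visited nothing) is an assumption about how to treat that case.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); undefined for all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def normalize(u):
    """Scale u to unit length (assumes u is not all zeros)."""
    n = math.sqrt(sum(a * a for a in u))
    return [a / n for a in u]
```

For unit-length vectors, the squared Euclidean distance equals twice the cosine distance, so normalizing the vectors first and then running ordinary k-means is one way to approximate clustering by direction.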

Is Triangle inequality necessary for kmeans?

Question: I wonder whether the triangle inequality is necessary for the distance measure used in k-means.

Answer 1: k-means is designed for Euclidean distance, which happens to satisfy the triangle inequality. Using other distance functions is risky, as the algorithm may stop converging. The reason, however, is not the triangle inequality but that the mean might not minimize the distance function. (The arithmetic mean minimizes the sum of squares, not arbitrary distances!) There are faster methods for k-means that exploit the triangle
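The point about the mean can be checked numerically. The sketch below uses made-up 1-dimensional data and two hypothetical helper functions to compare the arithmetic mean and the median as cluster centers under squared and absolute deviations.

```python
from statistics import mean, median


def sum_squares(center, xs):
    """Sum of squared deviations from center."""
    return sum((x - center) ** 2 for x in xs)


def sum_abs(center, xs):
    """Sum of absolute deviations (Manhattan distance in 1-D) from center."""
    return sum(abs(x - center) for x in xs)


xs = [1, 2, 3, 4, 100]       # made-up data with one outlier
m, med = mean(xs), median(xs)

# The mean is the better center for squared deviations ...
print(sum_squares(m, xs) < sum_squares(med, xs))
# ... but the median is the better center for absolute deviations.
print(sum_abs(med, xs) < sum_abs(m, xs))
```

With a distance other than squared Euclidean, recomputing centers as means can therefore increase the objective, which is why convergence is no longer assured.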

k-means empty cluster

Question: I am trying to implement k-means as a homework assignment. My exercise sheet gives the following remark regarding empty centers: "During the iterations, if any of the cluster centers has no data points associated with it, replace it with a random data point." That confuses me a bit. Firstly, Wikipedia and the other sources I read do not mention that at all. I also read about the problem of 'choosing a good k for your data'. How is my algorithm supposed to converge if I start setting new centers for
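The rule quoted from the exercise sheet can be sketched as part of the center-update step. This is one possible reading of the remark, not a reference solution. The name `update_centers` is hypothetical, and `labels` is assumed to hold the cluster index already assigned to each point.

```python
import random


def update_centers(points, labels, k, rng=random):
    """Recompute centers as means; an empty cluster gets a random data point."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    centers = []
    for j in range(k):
        if counts[j] == 0:
            # Empty cluster: re-seed it with a randomly chosen data point.
            centers.append(list(rng.choice(points)))
        else:
            centers.append([s / counts[j] for s in sums[j]])
    return centers
```

Re-seeding only happens when a cluster is empty, so once every cluster keeps at least one point the iterations proceed exactly as in ordinary k-means.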