k-means

Implementing the Elbow Method for finding the optimum number of clusters for K-Means Clustering in R [closed]

烂漫一生 submitted on 2019-12-04 05:48:26
Closed. This question is off-topic and is not currently accepting answers. Closed 4 years ago.

I want to use k-means clustering on my dataset, using the kmeans() function in R:

k <- kmeans(data, centers = 3)
plotcluster(m, k$cluster)

However, I am not sure what the correct value of k is for this function. I want to try the elbow method. Are there any packages in R that perform clustering and find the optimum number of clusters with the elbow method? Andy
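Although the question asks about R (packages such as NbClust, or factoextra's fviz_nbclust, implement this kind of selection), the elbow logic itself is simple: run k-means over a range of k, record the total within-cluster sum of squares (WSS), and look for the k where the curve bends. A minimal, dependency-free Python sketch; the synthetic blob data and function names are illustrative, not from the question:

```python
import random

def kmeans_once(points, k, iters=50, seed=0):
    """One run of Lloyd's algorithm on 2-D points; returns total within-cluster SS."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest center
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        centers = [(sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
                   if cl else centers[i] for i, cl in enumerate(clusters)]
    return sum(min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)
               for p in points)

def wss(points, k, restarts=20):
    """Best (lowest) WSS over several random restarts, like kmeans(..., nstart = 20)."""
    return min(kmeans_once(points, k, seed=s) for s in range(restarts))

# three tight, well-separated blobs: the elbow should appear at k = 3
pts = [(cx + dx, cy + dy)
       for (cx, cy) in [(0, 0), (5, 5), (10, 0)]
       for dx in (0.0, 0.1, 0.2) for dy in (0.0, 0.1)]
wss_by_k = {k: wss(pts, k) for k in range(1, 7)}
```

Plotting wss_by_k against k and picking the bend is the elbow heuristic; in R, looping over `kmeans(data, centers = k, nstart = 20)$tot.withinss` gives the same curve.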

Sklearn.KMeans : how to avoid Memory or Value Error?

痞子三分冷 submitted on 2019-12-04 05:16:57
I'm working on an image classification problem and building a bag-of-words model. To do that, I extracted the SIFT descriptors of all my images, and I have to use the KMeans algorithm to find the centers to use as my bag of words. Here is the data I have:

Number of images: 1584
Number of SIFT descriptors (vectors of 32 elements): 571685
Number of centers: 15840

So I ran a KMeans algorithm to compute my centers:

dico = pickle.load(open('./dico.bin', 'rb'))  # np.shape(dico) = (571685, 32)
k = np.size(os.listdir(img_path)) * 10        # = 1584 * 10
kmeans = KMeans(n_clusters=k, n_init=1, verbose=1).fit
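With 15,840 clusters over 571,685 samples, plain KMeans allocates large intermediate arrays and can exhaust memory. A common workaround (my suggestion, not from the question) is MiniBatchKMeans, which fits on small random batches. The array sizes below are scaled-down stand-ins for the real data:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# stand-in for the 571685 x 32 SIFT descriptor matrix
X = rng.standard_normal((5000, 32)).astype(np.float32)

k = 100  # stand-in for the real 15840 centers
mbk = MiniBatchKMeans(n_clusters=k, batch_size=1024, n_init=3, random_state=0)
labels = mbk.fit_predict(X)

codebook = mbk.cluster_centers_  # shape (k, 32): the bag-of-words vocabulary
```

Because only one batch is processed at a time, peak memory scales with batch_size rather than with the full dataset, at the cost of slightly less precise centers.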

How to display the row name in K means cluster plot in R?

旧时模样 submitted on 2019-12-04 05:16:03
Question: I am trying to plot the k-means clusters. Below is the code I use:

library(cluster)
library(fpc)
data(iris)
dat <- iris[, -5]  # without known classification
# k-means cluster analysis
clus <- kmeans(dat, centers = 3)
clusplot(dat, clus$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 0)

I get the picture below. Instead of the row numbers, I want it displayed with row names in characters. I understand this picture would be produced if the data looked like the below: Sepal.Length Sepal.Width Petal.Length

Can K-means be used to help in pixel-value based separation of an image?

折月煮酒 submitted on 2019-12-04 03:22:09
I'm trying to separate a grey-level image based on pixel value: say pixels from 0 to 60 in one bin, 60 to 120 in another, 120 to 180, and so on up to 255. The ranges are roughly equispaced in this case. However, by using k-means clustering, would it be possible to get more realistic measures of what my pixel-value ranges should be? The goal is to group similar pixels together and not waste bins where there is a lower concentration of pixels. EDITS (to include obtained results): k-means with number of clusters = 5. Of course k-means can be used for color quantization. It's very handy for that. Let's see
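Running k-means on the intensity values alone (a 1-D problem) does exactly this: it places bin centers where pixels concentrate, and the bin boundaries fall at midpoints between adjacent centers. A small dependency-free sketch; the sample intensities are made up for illustration:

```python
def kmeans_1d(values, k, iters=30):
    """1-D Lloyd's algorithm; returns sorted centers and the bin boundaries between them."""
    mn, mx = min(values), max(values)
    # deterministic start: centers evenly spaced over the intensity range
    centers = [mn + (i + 0.5) * (mx - mn) / k for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda j: abs(v - centers[j]))
            buckets[j].append(v)
        centers = [sum(b) / len(b) if b else centers[i] for i, b in enumerate(buckets)]
    cs = sorted(centers)
    bounds = [(cs[i] + cs[i + 1]) / 2 for i in range(k - 1)]
    return cs, bounds

# intensities concentrated near 30, 127 and 220 instead of spread uniformly
vals = [30, 32, 28, 29] * 10 + [127, 130, 125] * 10 + [220, 218, 223] * 10
centers, bounds = kmeans_1d(vals, k=3)
```

The resulting boundaries (about 79 and 174 here) replace fixed cut points like 60/120/180, so each bin tracks an actual mode of the histogram rather than an arbitrary equispaced range.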

R - 'princomp' can only be used with more units than variables

ε祈祈猫儿з submitted on 2019-12-04 03:21:24
I am using R (R Commander) to cluster my data. I have a smaller subset of my data containing 200 rows and about 800 columns. I am getting the following error when trying a k-means cluster and plot on a graph: "'princomp' can only be used with more units than variables". I then created a test document of 10 rows and 10 columns, which plots fine, but when I add an extra column I get the error again. Why is this? I need to be able to plot my clusters. When I view my dataset after performing k-means on it, I can see the extra results column which shows which cluster each row belongs to. Is there anything I
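The error typically comes from the plotting step, where princomp projects the data onto two components: covariance-based PCA needs more observations than variables, because with n rows the sample covariance matrix has rank at most n − 1. A quick numpy illustration of why 10 × 10 works but 10 × 11 cannot (the random matrix is a stand-in for the real data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 11                  # one more variable than units, like the failing 10 x 11 case
X = rng.standard_normal((n, p))

cov = np.cov(X, rowvar=False)  # p x p sample covariance matrix
rank = np.linalg.matrix_rank(cov)
# rank is at most n - 1 = 9, so this 11 x 11 covariance matrix is singular,
# and an eigendecomposition-based PCA with more variables than units is ill-posed
```

In R the usual fixes are prcomp (SVD-based, which tolerates p > n) or reducing the 800 columns before plotting.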

How to set k-Means clustering labels from highest to lowest with Python?

白昼怎懂夜的黑 submitted on 2019-12-04 02:40:09
I have a dataset of 38 apartments and their electricity consumption in the morning, afternoon, and evening. I am trying to cluster this dataset using the k-means implementation from scikit-learn, and am getting some interesting results. First clustering results: This is all very well, and with 4 clusters I obviously get 4 labels associated with each apartment: 0, 1, 2, and 3. Using the random_state parameter of the KMeans method, I can fix the seed with which the centroids are randomly initialized, so I consistently get the same labels attributed to the same apartments. However, as this specific
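A robust way to make the labels meaningful across runs, independent of the arbitrary order k-means happens to assign them, is to relabel the clusters by a statistic such as mean consumption. A dependency-free sketch; the consumption numbers and raw labels below are made up:

```python
def relabel_by_mean(values, labels):
    """Remap cluster labels so label 0 has the highest mean, 1 the next, and so on."""
    clusters = sorted(set(labels))
    means = {c: sum(v for v, l in zip(values, labels) if l == c) /
                sum(1 for l in labels if l == c)
             for c in clusters}
    order = sorted(clusters, key=lambda c: means[c], reverse=True)
    mapping = {old: new for new, old in enumerate(order)}
    return [mapping[l] for l in labels]

# hypothetical total consumptions and the raw labels k-means returned
cons = [10.0, 9.5, 3.0, 2.8, 6.0, 6.2]
raw = [2, 2, 0, 0, 1, 1]          # arbitrary label order from one k-means run
stable = relabel_by_mean(cons, raw)
```

After remapping, label 0 always means "highest consumers" no matter how the centroids were initialized, so labels stay comparable between runs and datasets.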

Spherical k-means implementation in Python

笑着哭i submitted on 2019-12-04 02:07:10
I've been using scipy's k-means for quite some time now, and I'm pretty happy with the way it works in terms of usability and efficiency. However, now I want to explore different k-means variants; more specifically, I'd like to apply spherical k-means to some of my problems. Do you know of any good Python implementation (i.e. similar to scipy's k-means) of spherical k-means? If not, how hard would it be to modify scipy's source code to adapt its k-means algorithm to be spherical? Thank you. In spherical k-means, you aim to guarantee that the centers are on the sphere, so you could adjust the
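The adjustment the answer starts to describe amounts to two changes to Lloyd's algorithm: normalize all inputs to unit length and assign by cosine similarity, and after each mean update project the centers back onto the unit sphere. A short numpy sketch, not scipy's implementation; the farthest-point initialization and toy data are my own choices for illustration:

```python
import numpy as np

def spherical_kmeans(X, k, iters=50):
    """k-means on the unit sphere: cosine-similarity assignment, renormalized means."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project data onto the sphere
    # deterministic farthest-point initialization
    centers = [X[0]]
    for _ in range(1, k):
        sims = X @ np.array(centers).T
        centers.append(X[sims.max(axis=1).argmin()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = (X @ centers.T).argmax(axis=1)       # nearest center by cosine similarity
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                centers[j] = m / np.linalg.norm(m)    # renormalize the mean direction
    return labels, centers

# two bundles of directions, near (1, 0) and near (0, 1)
X = np.array([[1.0, 0.01], [1.0, -0.02], [0.9, 0.03],
              [0.01, 1.0], [-0.02, 1.0], [0.03, 0.95]])
labels, centers = spherical_kmeans(X, 2)
```

The renormalization step is the only real difference from standard k-means, which is why adapting an existing implementation is usually straightforward.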

Algorithms: the k-means algorithm

孤人 submitted on 2019-12-04 02:02:23
1. The idea of clustering

A clustering algorithm automatically partitions a pile of unlabeled data into several groups; it is an unsupervised learning method, and it must ensure that data in the same group share similar features, as shown in the figure below. Based on the distance between samples, or in other words their similarity (affinity), the most similar samples with the smallest differences are grouped into one cluster, eventually forming multiple clusters, such that samples within one cluster are highly similar while different clusters differ strongly from each other.

2. The k-means clustering algorithm

Key concepts:
K value: the number of clusters to produce.
Centroid: the mean vector of each cluster, i.e. the average of its members in each dimension.
Distance measure: commonly Euclidean distance or cosine similarity (standardize first).

Algorithm steps:
1. First choose a value k, i.e. the number of sets we want the dataset partitioned into by clustering.
2. Randomly select k data points from the dataset as the initial centroids.
3. For every point in the dataset, compute its distance (e.g. Euclidean distance) to each centroid, and assign it to the set of whichever centroid is nearest.
4. Once all points are assigned, there are k sets; then recompute the centroid of each set.
5. If the distance between each newly computed centroid and the previous one is below a chosen threshold (meaning the recomputed centroids have barely moved and have stabilized, i.e. converged), we can consider the clustering to have reached the desired result, and the algorithm terminates.
6. If the new centroids have moved substantially from the old ones, repeat steps 3-5.

3. Mathematical intuition

The heuristic used by k-means is very simple and can be illustrated with the following sequence of figures. Figure (a) shows the initial dataset, assuming k = 2. In figure (b), we randomly pick the centroids corresponding to the two classes, namely the red centroid and the blue centroid in the figure
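The six steps above translate almost line by line into code. A minimal Python sketch with Euclidean distance on 2-D points; the toy data is illustrative:

```python
import math
import random

def k_means(points, k, tol=1e-6, max_iter=100, seed=0):
    """Direct transcription of steps 1-6 above for 2-D points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # step 2: random initial centroids
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:                              # step 3: assign to nearest centroid
            j = min(range(k),
                    key=lambda j: (p[0] - centroids[j][0]) ** 2
                                  + (p[1] - centroids[j][1]) ** 2)
            groups[j].append(p)
        new = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g
               else centroids[j]
               for j, g in enumerate(groups)]         # step 4: recompute each centroid
        shift = max(math.dist(c, m) for c, m in zip(centroids, new))
        centroids = new
        if shift < tol:                               # step 5: stop once centroids stabilize
            break                                     # step 6 is the loop itself
    return centroids, groups

# two obvious groups, so k = 2 should recover their means
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, groups = k_means(pts, k=2)
```

The threshold tol plays the role of the convergence criterion in step 5: once the largest centroid movement falls below it, the loop of steps 3-5 ends.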

Implementation of k-means clustering algorithm

情到浓时终转凉″ submitted on 2019-12-03 21:24:55
In my program, I'm taking k = 2 for the k-means algorithm, i.e. I want only 2 clusters. I have implemented it in a very simple and straightforward way, yet I'm unable to understand why my program gets into an infinite loop. Can anyone please guide me to where I'm making a mistake? For simplicity, I have hard-coded the input in the program itself. Here is my code:

import java.io.*;
import java.lang.*;

class Kmean {
    public static void main(String args[]) {
        int N = 9;
        int arr[] = {2, 4, 10, 12, 3, 20, 30, 11, 25}; // initial data
        int i, m1, m2, a, b, n = 0;
        boolean flag = true;
        float sum1 = 0, sum2 = 0;
        a = arr[0]; b = arr[1];
        m1 = a;
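The Java excerpt is cut off, so the exact bug is not visible, but the classic causes of this infinite loop are never resetting sum1/sum2 between iterations and never clearing flag once the means stop changing. For reference, here is the same computation, k = 2 on the same nine values, initialized from the first two elements as in the Java code, in Python with an explicit stopping test (it assumes both clusters stay non-empty, which holds for this data):

```python
def two_means(data, max_iter=100):
    m1, m2 = float(data[0]), float(data[1])  # same initialization as the Java code
    c1 = c2 = []
    for _ in range(max_iter):
        c1 = [x for x in data if abs(x - m1) <= abs(x - m2)]
        c2 = [x for x in data if abs(x - m1) > abs(x - m2)]
        new1 = sum(c1) / len(c1)             # sums are rebuilt each pass, not accumulated
        new2 = sum(c2) / len(c2)
        if new1 == m1 and new2 == m2:        # stop: the means did not move
            break
        m1, m2 = new1, new2
    return c1, c2, m1, m2

arr = [2, 4, 10, 12, 3, 20, 30, 11, 25]
c1, c2, m1, m2 = two_means(arr)
```

The crucial lines are the rebuilt sums and the equality test on the means; omitting either one in the Java version would keep the loop running forever.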

Running clustering algorithms in ELKI

无人久伴 submitted on 2019-12-03 21:02:40
I need to run a k-medoids clustering algorithm using ELKI programmatically. I have a similarity matrix that I wish to input to the algorithm. Is there any code snippet available showing how to run ELKI algorithms? I basically need to know how to create Database and Relation objects, create a custom distance function, and read the algorithm output. Unfortunately, the ELKI tutorial ( http://elki.dbs.ifi.lmu.de/wiki/Tutorial ) focuses on the GUI version and on implementing new algorithms, and trying to write code by looking at the Javadoc is frustrating. If someone is aware of any easy-to-use
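Independent of ELKI's API (not shown here), it helps to see what a k-medoids algorithm actually needs as input: just a precomputed dissimilarity matrix (a similarity matrix must first be converted to dissimilarities). A naive alternating k-medoids in plain Python, without PAM's full swap search; the line-of-points data is illustrative, and the sketch assumes no cluster empties, which holds here:

```python
def k_medoids(D, k, max_iter=100):
    """Naive alternating k-medoids on a precomputed distance matrix D (list of lists)."""
    n = len(D)
    medoids = list(range(k))          # deterministic start: the first k objects
    labels = [0] * n
    for _ in range(max_iter):
        # assign every object to its nearest medoid
        labels = [min(range(k), key=lambda j: D[i][medoids[j]]) for i in range(n)]
        new = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            # the new medoid is the member minimizing total distance to its cluster
            new.append(min(members, key=lambda m: sum(D[m][i] for i in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, labels

# six points on a line; D[i][j] = |pos[i] - pos[j]| stands in for the real matrix
pos = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in pos] for a in pos]
medoids, labels = k_medoids(D, k=2)
```

Whatever framework you end up using, this is the contract: a square matrix of pairwise distances in, medoid indices and cluster labels out.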