k-means | 易学教程

Clustering: K-Means

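1. What is K-Means?

Keywords for K-means clustering: K seeds, means.

The concept of clustering: a form of unsupervised learning in which the categories are not known in advance and similar objects are automatically grouped into the same cluster.

K-Means is a cluster analysis algorithm. It groups data by repeatedly assigning each point to the nearest seed point and then moving that seed to the mean of the points assigned to it.

The idea behind K-Means is simple: given a sample set, partition it into K clusters according to the distances between samples, so that the points within a cluster lie as close together as possible and the clusters lie as far apart as possible.

2. How k-Means works

Every distance is computed as the Euclidean distance.

[Figure: step diagram]

Summary of the steps:
(1) Select k objects from the data as the initial cluster centers.
(2) Compute the distance from each object to every cluster center and assign the object to the nearest one.
(3) Recompute each cluster center.
(4) Repeat steps 2 and 3 in a loop; stop once the maximum number of iterations is reached, otherwise continue.
(5) The centers obtained at the end are taken as the optimal cluster centers.

Main advantages:
(1) The principle is fairly simple, the implementation is easy, and convergence is fast.
(2) The clustering quality is fairly good.
(3) The algorithm is relatively easy to interpret.
(4) The only parameter that mainly needs tuning is the number of clusters k.

Main disadvantages:
(1) K has to be given in advance, and choosing it is very hard. Often it is not known beforehand how many categories a data set should be split into. (The ISODATA algorithm arrives at a more reasonable number of classes K by automatically merging and splitting classes.)
(2) K-Means starts from random initial seed points, and these seeds matter a great deal: different random seeds can lead to completely different results. (The K-Means++ algorithm can be used to address this problem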
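
The steps summarized above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the optional `init` argument, and the early stop when the centers no longer move are assumptions added here for clarity.

```python
import math
import random

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k objects as initial centers (or use the ones supplied).
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = []
    for _ in range(max_iter):                      # Step 4: bounded loop
        # Step 2: assign every point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Step 3: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])     # an empty cluster keeps its old center
        if new_centers == centers:                 # assumption: stop early once stable
            break
        centers = new_centers
    return centers, labels

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = kmeans(points, k=2, init=[(1.0, 1.0), (8.0, 8.0)])
print(labels)
print(centers)
```

Passing `init` makes the run reproducible; leaving it out falls back to random seeds, which is where the sensitivity to initialization mentioned above comes from.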

how to set initial centers of K-means openCV c++

I am trying to segment an image using OpenCV and k-means. The code that I have just implemented is the following:

    #include "opencv2/objdetect/objdetect.hpp"
    #include "opencv2/highgui/highgui.hpp"
    #include "opencv2/imgproc/imgproc.hpp"
    #include <iostream>
    #include <stdio.h>

    using namespace std;
    using namespace cv;

    int main(int, char** argv)
    {
        Mat src, Imagen2, Imagris, labels, centers, imgfondo;
        src = imread("C:/Users/Sebastian/Documents/Visual Studio 2015/Projects/ClusteringImage/data/leon.jpg");
        imgfondo = imread("C:/Users/Sebastian/Documents/Visual Studio 2015/Projects
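
As far as I know, OpenCV's kmeans does not take an explicit list of starting centers; its initialization flags are KMEANS_RANDOM_CENTERS, KMEANS_PP_CENTERS and KMEANS_USE_INITIAL_LABELS. One possible workaround is to turn the desired centers into initial labels and pass those with KMEANS_USE_INITIAL_LABELS. The sketch below shows only that conversion, in plain Python rather than OpenCV code; `labels_from_centers` and the sample pixel values are hypothetical.

```python
def labels_from_centers(pixels, centers):
    """Return, for each pixel, the index of its nearest center (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centers)), key=lambda j: sqdist(p, centers[j])) for p in pixels]

# Hypothetical BGR pixels and two hand-picked starting centers (dark, bright).
pixels = [(250, 250, 250), (10, 12, 8), (245, 240, 251), (0, 0, 5)]
initial_centers = [(0, 0, 0), (255, 255, 255)]
initial_labels = labels_from_centers(pixels, initial_centers)
print(initial_labels)  # [1, 0, 1, 0]
```

The resulting list has one integer per sample, which is the shape the label array is expected to have.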

11 K-Means: principles and a case study

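11 K-Means: principles and a case study

Unsupervised learning

In unsupervised learning there are only feature values and no target values. Clustering: the main method is k-means (K is the number of categories to split the data into).

K-Means steps

(1) Randomly place K points in the feature space as the initial cluster centers (red, green, blue); here k=3 (given).
(2) For every other point, compute the distance to each of the K centers and label the point with its nearest cluster center, forming 3 groups.
(3) Compute the mean of each of the 3 groups and compare the three means with the three previous centers. If they are the same, clustering ends; if not, take the three means as the new centers and repeat step 2.

K-Means performance metric

sc_i = (b_i - a_i) / max(b_i, a_i)

Note: for each point i in the clustered data, b_i is the mean distance from i to all samples of another cluster (the nearest one, when there are several), and a_i is the mean distance from i to the other samples of its own cluster. The final score is the mean silhouette coefficient over all sample points.

Values of sc_i: when b_i >> a_i, the external distance is far larger than the internal distance and the value approaches 1, the ideal case. When b_i << a_i, the internal distance is far larger than the external distance and the value approaches -1, the worst case. The range is therefore [-1, 1]; in practice, a value above 0 or 0.1 is already considered fairly good.

K-Means API

sklearn.cluster.KMeans
n_clusters=8 (the number of cluster centers to start with)
labels_: the assigned cluster labels (category indices, not values), which can be compared with the true classes.
sklearn.metrics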
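
The silhouette coefficient described above can be computed by hand on a toy data set. The following is a sketch in plain Python, assuming the usual definition in which b_i is taken over the nearest other cluster; the function name `silhouette_mean` and the sample points are made up for illustration, and the code assumes at least two clusters.

```python
import math

def silhouette_mean(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)          # common convention for single-point clusters
            continue
        # a_i: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_mean(points, [0, 0, 1, 1])   # tight, well-separated clusters
bad = silhouette_mean(points, [0, 1, 0, 1])    # clusters that interleave
print(good, bad)
```

A labeling that keeps neighbors together scores close to 1, while the interleaved labeling scores below 0, matching the two limiting cases in the text.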

Data mining: K-means

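The K-Means method was proposed by MacQueen in 1967. Given a data set X and an integer K (≤ n), K-Means partitions X into K clusters so that the sum of the distances from every value to the center of its cluster is minimized.

The K-Means clustering method consists of the following steps:
[1] Choose initial centers for the K clusters; these are called the K means.
[2] Compute the distance between every object and every center.
[3] Assign each object to the cluster of its nearest center.
[4] Recompute the center of each cluster.
[5] Repeat steps 2, 3 and 4 until the algorithm converges.

The following figures illustrate these steps: [figures not included]

Below, a concrete example illustrates how the K-means algorithm is implemented.

Advantages and disadvantages of the K-means algorithm:

Advantages:
(1) It is scalable and efficient on large data sets. The complexity is O(tkn), where n is the number of objects, k is the number of clusters and t is the number of iterations; usually k, t << n.
(2) It reaches a local optimum; to look for the global optimum, simulated annealing or a genetic algorithm can be used.

Disadvantages:
(1) The number of clusters has to be fixed in advance, and in some applications it is not known beforehand.
(2) The K centers have to be set in advance, and for some categorical (string-valued) attributes it is hard to define a center.
(3) It cannot cope with noisy data.
(4) It cannot handle data with certain distributions (for example, concave shapes).

Variants of the K-Means method:
(1) K-Modes: handles categorical attributes
(2) K-Prototypes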
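
The quantity being minimized is usually written with squared Euclidean distances. That choice is an assumption here, since the excerpt only says "distance". The symbols C_k (the k-th cluster) and \mu_k (its center) are introduced for illustration:

```latex
J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x
```

Under this formulation neither the assignment step [3] nor the update step [4] can increase J, which is why the iteration settles into a local optimum. Each pass needs on the order of kn distance computations, which gives the O(tkn) cost over t iterations.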

I have 2,000,000 points in 100 dimensionality space. How can I cluster them to K (e.g., 1000) clusters?

Question: The problem is as follows. I have M images and extract N features from each image, and each feature has dimensionality L. Thus, I have M*N features (2,000,000 in my case), each of dimensionality L (100 in my case). I need to cluster these M*N features into K clusters. How can I do it? Thanks.

Answer 1: Do you want 1000 clusters of images, or of features, or of (image, feature) pairs? In any case, it sounds as though you'll have to reduce the data and use simpler methods. One
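
One way to keep the cost manageable at this scale is to update the centers from small random batches instead of the full data set. The sketch below is a simplified mini-batch variant written in plain Python to show the idea only; it is an assumption of this note, not the method the answer goes on to describe, and the function name and synthetic 2-D data are hypothetical. For 2,000,000 points in 100 dimensions a library implementation such as scikit-learn's MiniBatchKMeans would be the practical choice.

```python
import math
import random

def minibatch_kmeans(points, k, batch_size=50, n_batches=40, seed=0):
    """Simplified mini-batch k-means; returns k centers as lists of floats."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]   # start from k random samples
    counts = [0] * k                                     # points absorbed by each center
    for _ in range(n_batches):
        batch = rng.sample(points, min(batch_size, len(points)))
        for p in batch:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            counts[j] += 1
            eta = 1.0 / counts[j]                        # step size shrinks as a center matures
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers

# Synthetic stand-in for the real features: two Gaussian blobs in 2-D.
gen = random.Random(1)
points = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(500)]
points += [(gen.gauss(10, 1), gen.gauss(10, 1)) for _ in range(500)]
centers = minibatch_kmeans(points, k=2)
print(centers)
```

Each batch touches only `batch_size` points, so the work per update does not depend on the total number of samples.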

Show rows on clustered kmeans data

Hi, I was wondering: when you plot clustered data in a figure window, is there a way to show which row each data point belongs to when you hover over it? From the picture above, I was hoping there would be a way to select or hover over a point and see which row it belongs to. Here is the code:

    %% dimensionality reduction
    columns = 6
    [U,S,V]=svds(fulldata,columns);
    %% randomly select dataset
    rows = 1000;
    columns = 6;
    %# pick random rows
    indX = randperm( size(fulldata,1) );
    indX = indX(1:rows);
    %# pick random columns
    indY = randperm( size(fulldata,2) );
    indY = indY(1:columns)

How to vectorize json data for KMeans?

I have a number of questions and choices which users are going to answer. They have a format like this:

    question_id, text, choices

For each user I store the answered questions and the choice selected for each one as JSON in MongoDB:

    {user_id: "", "question_answers" : [{"question_id": "choice_id", ..}] }

Now I'm trying to use K-Means clustering and streaming to find the most similar users based on their choices, but I need to convert my user data into numeric vectors like the example in Spark's docs here. The kmeans data sample and my desired output:

    0.0 0.0 0.0
    0.1 0.1 0.1
    0.2 0.2 0
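
One possible encoding is a one-hot vector with one slot per (question, choice) pair that occurs in the data. The sketch below uses only the standard library and assumes the document shape quoted in the question; the sample users, question ids and choice ids are invented for illustration.

```python
import json

# Hypothetical documents following the shape described in the question.
docs = [
    '{"user_id": "u1", "question_answers": [{"q1": "a", "q2": "c"}]}',
    '{"user_id": "u2", "question_answers": [{"q1": "b", "q2": "c"}]}',
]
users = [json.loads(d) for d in docs]

def answers(user):
    """Merge the list of {question_id: choice_id} dicts into a single dict."""
    merged = {}
    for entry in user["question_answers"]:
        merged.update(entry)
    return merged

# One vector slot per (question_id, choice_id) pair seen in the data.
vocab = sorted({pair for u in users for pair in answers(u).items()})
index = {pair: i for i, pair in enumerate(vocab)}

def vectorize(user):
    vec = [0.0] * len(vocab)
    for pair in answers(user).items():
        vec[index[pair]] = 1.0
    return vec

vectors = {u["user_id"]: vectorize(u) for u in users}
for user_id, vec in vectors.items():
    print(user_id, " ".join(str(v) for v in vec))
```

Each printed row is a space-separated list of numbers, the same layout as the sample data above. With a streaming source the vocabulary would have to be fixed up front, since new pairs would otherwise change the vector length.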

R - cluster analysis on binary weblog data

I have web data that looks similar to the sample below. It simply has the user and a binary value for whether that user clicked on a particular link within a website. I want to cluster this data. My main goal is to find similar users based on their online behaviour. What is a good clustering algorithm for this? I have tried k-means, which does not work well with binary data. I have also tried spherical k-means with skmeans(). I wanted to do a sum of squared error scree plot, but I could not figure out how to get the SSE from skmeans.

    User  link1  link2  link3  link4
    abc1  0      1      1      1
    abc2  1      0      1      0
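
For binary click vectors, a dissimilarity that ignores links neither user clicked is often a more natural starting point than Euclidean distance. The sketch below computes the Jaccard distance for the two sample rows. It is offered as one option under that assumption, not as the accepted answer to the question, and `jaccard_distance` is a hypothetical helper.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| over the links each user clicked."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either

# The two rows from the sample table.
clicks = {
    "abc1": [0, 1, 1, 1],
    "abc2": [1, 0, 1, 0],
}
print(jaccard_distance(clicks["abc1"], clicks["abc2"]))  # 0.75
```

A matrix of such pairwise distances can be fed to methods that accept precomputed dissimilarities, such as hierarchical clustering.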

unstable result from scipy.cluster.kmeans

The following code gives different results on every run when clustering the data into 3 parts with the k-means method:

    from numpy import array
    from scipy.cluster.vq import kmeans,vq
    data = array([1,1,1,1,1,1,3,3,3,3,3,3,7,7,7,7,7,7])
    centroids = kmeans(data,3,100) #with 100 iterations
    print (centroids)

Three possible results obtained were:

    (array([1, 3, 7]), 0.0)
    (array([3, 7, 1]), 0.0)
    (array([7, 3, 1]), 0.0)

So only the order of the computed k means differs. But doesn't that make it unstable to decide which point belongs to which cluster? Any ideas?

That's because if you pass
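
Since the three results contain the same centroids in a different order, one way to get stable cluster numbers is to sort the centroids before assigning labels. The sketch below does this in plain Python for the 1-D data from the question; it does not call scipy, and `canonical_labels` is a hypothetical helper.

```python
def canonical_labels(data, centroids):
    """Sort the centroids, then label each value by its nearest sorted centroid."""
    ordered = sorted(centroids)
    labels = [min(range(len(ordered)), key=lambda j: abs(x - ordered[j])) for x in data]
    return ordered, labels

data = [1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 7, 7, 7, 7, 7, 7]
# The three centroid orderings reported in the question.
runs = [[1, 3, 7], [3, 7, 1], [7, 3, 1]]
results = [canonical_labels(data, run) for run in runs]
for ordered, labels in results:
    print(ordered, labels)
```

All three orderings end up with the same centroid list and the same labels, so the cluster index no longer depends on the order in which a run happened to return the means.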

Implementing the Elbow Method for finding the optimum number of clusters for K-Means Clustering in R [closed]

Question: I want to use K-Means clustering for my dataset. I am using the kmeans() function in R to do this.

    k<-kmeans(data,centers=3)
    plotcluster(m,k$cluster)

However, I am not sure what the correct value of K is for this function. I want to try using the Elbow Method for this. Are there any packages in R which
