k-means

What is the difference between SOM (Self Organizing Maps) and K-Means?

大城市里の小女人 posted on 2019-12-03 12:47:25
Question: There is only one question related to this on Stack Overflow, and it is more about which one is better. I just don't really understand the difference. Both work with vectors that are assigned randomly to clusters, and both use the centroids of the different clusters to determine the winning output node. So where exactly does the difference lie? Answer 1: In K-means the nodes (centroids) are independent from each other. The winning node gets the chance to adapt each
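
A minimal sketch of the distinction the answer starts to describe. It assumes a 1-D chain of nodes, a fixed learning rate `lr`, and a Gaussian neighborhood of width `sigma`; these names and values are illustrative choices, not taken from the post. In the online k-means step only the winning node moves, while in the SOM step the winner's grid neighbors move as well, scaled by their grid distance to the winner.

```python
import math

def winner(nodes, x):
    """Index of the node closest to sample x (squared Euclidean distance)."""
    return min(range(len(nodes)),
               key=lambda i: sum((n - v) ** 2 for n, v in zip(nodes[i], x)))

def kmeans_online_step(nodes, x, lr=0.5):
    """Online k-means: only the winning node moves toward x."""
    w = winner(nodes, x)
    return [[n + lr * (v - n) for n, v in zip(node, x)] if i == w else list(node)
            for i, node in enumerate(nodes)]

def som_step(nodes, x, lr=0.5, sigma=1.0):
    """SOM: every node moves toward x, weighted by its grid distance to the winner."""
    w = winner(nodes, x)
    updated = []
    for i, node in enumerate(nodes):
        # Neighborhood weight: 1 for the winner, smaller for nodes farther along the chain.
        h = math.exp(-((i - w) ** 2) / (2 * sigma ** 2))
        updated.append([n + lr * h * (v - n) for n, v in zip(node, x)])
    return updated

# Hypothetical nodes (positions 0, 1, 2 on the chain) and one sample.
nodes = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
x = [0.0, 2.0]
km = kmeans_online_step(nodes, x)
som = som_step(nodes, x)
```

Comparing `km` and `som` with `nodes` shows the point: in `km` only the node at index 1 changes, whereas in `som` all three nodes shift toward `x`.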

k-means|k-mode|k-prototype|PAM|AGNES|DIANA|Hierarchical cluster|DA|VIF|

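江枫思渺然 posted on 2019-12-03 12:21:55
Clustering algorithms. For numeric variables, use k-means. For example, with k = 4, choose 4 points that are not in the original data, compute the distance from every data point to each of these four points, and assign each data point to the class of its nearest point. After standardization the variables no longer differ in units, so they can be compared with one another.
For categorical variables, use k-mode. For mixed numeric and categorical variables, use k-prototype, where the weight K balances the continuous and categorical variables: K = 1 gives equal weight, K < 1 favors the categorical variables, and K > 1 favors the numeric variables.
PAM: order by two factors so that each point has coordinates (a, b). If k = 2, choose 2 points from the data itself (each found as the point with the shortest total distance to all points of its class), compute the distance from every data point to these two points, and assign each data point to the class of its nearest point. There is no directionality.
AGNES and DIANA: comparison between clusters. Various distance measures link the variables together and serve as the basis for clustering.
Hierarchical cluster: plot the factors (a, b, c, d, e, f, g) of each variable as a network, turn the network into a matrix (whose entries are the network weights, i.e. the distances), and turn the matrix into a dendrogram.
Discriminant functions. Regression: a continuous x explains a continuous y. Analysis of variance: a categorical x explains a continuous y. Discriminant analysis (DA): a continuous x explains a categorical y.
Prerequisites for using DA: the sample size is 4 to 5 times the number of factors; normality, i.e. the population is normally distributed; homogeneity of variance, i.e. the variance stays uniform across groups; independence, checked with the variance inflation factor (VIF). When a linear discriminant function is not sufficient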
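
The standardization and nearest-point assignment described for k-means can be sketched as follows. This is a minimal illustration under assumptions: the data, the two centers, and the helper names `standardize` and `nearest` are all hypothetical, and z-scoring is only one common way to standardize.

```python
import statistics

def standardize(rows):
    """Z-score each column so that features with different units become comparable."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]

def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))

# Hypothetical data: height in cm and income in currency units (very different scales).
rows = [[150.0, 20000.0], [160.0, 22000.0], [180.0, 90000.0], [190.0, 88000.0]]
z = standardize(rows)

# Two illustrative centers that are not data points, given in standardized units.
centers = [[-1.0, -1.0], [1.0, 1.0]]
labels = [nearest(p, centers) for p in z]
```

Without the standardization step, the income column would dominate every distance simply because its numbers are larger.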

Cosine distance as vector distance function for k-means

我只是一个虾纸丫 posted on 2019-12-03 11:52:16
I have a graph of N vertices where each vertex represents a place. I also have vectors, one per user, each with N coefficients, where a coefficient's value is the duration in seconds spent at the corresponding place, or 0 if that place was not visited. E.g., for the graph, the vector v1 = {100, 50, 0, 30, 0} would mean that we spent 100 seconds at vertex 1, 50 seconds at vertex 2, and 30 seconds at vertex 4 (vertices 3 and 5 were not visited, thus the 0s). I want to run a k-means clustering and I've chosen cosine_distance = 1 - cosine_similarity as the metric for the distances, where the formula for cosine
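
A small sketch of the metric named in the question, cosine_distance = 1 - cosine_similarity, applied to duration vectors. `v1` is the vector from the post; `v2` and `v3` are hypothetical users added only for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; undefined when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)

v1 = [100, 50, 0, 30, 0]    # from the question
v2 = [200, 100, 0, 60, 0]   # hypothetical: same places, twice the time
v3 = [0, 0, 40, 0, 10]      # hypothetical: only the places v1 never visited

d12 = cosine_distance(v1, v2)
d13 = cosine_distance(v1, v3)
```

Here `d12` is zero up to rounding, because cosine distance ignores vector length and looks only at direction, while `d13` is 1 because the two users share no visited place. One caveat worth keeping in mind: the usual k-means centroid update (the arithmetic mean) is tied to squared Euclidean distance, so a common workaround is to scale every vector to unit length first.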

How to Find Documents That are in the same Cluster with KMeans

谁都会走 posted on 2019-12-03 11:42:32
Question: I have clustered various articles together with the Scikit-learn framework. Below are the top 15 words in each cluster: Cluster 0: whales islands seaworld hurricane whale odile storm tropical kph mph pacific mexico orca coast cabos Cluster 1: ebola outbreak vaccine africa usaid foundation virus cdc gates disease health vaccines experimental centers obama Cluster 2: jones bobo sanford children carolina mississippi alabama lexington bodies crumpton mccarty county hyder tennessee sheriff Cluster
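
One way to answer the question in the title, sketched under assumptions: a fitted scikit-learn `KMeans` model exposes one cluster label per input document in its `labels_` attribute, in input order. The list below is a hypothetical stand-in for that attribute, and `group_by_cluster` is an illustrative helper, not part of scikit-learn.

```python
from collections import defaultdict

def group_by_cluster(labels):
    """Map each cluster label to the indices of the documents assigned to it."""
    groups = defaultdict(list)
    for doc_index, label in enumerate(labels):
        groups[label].append(doc_index)
    return dict(groups)

# Hypothetical labels; with scikit-learn these would come from km.labels_ after fitting.
labels = [0, 1, 0, 2, 1, 0]
groups = group_by_cluster(labels)

# All documents that share a cluster with document 0 (including document 0 itself).
same_as_doc0 = groups[labels[0]]
```

The indices in each group refer back to the order in which the documents were passed to the vectorizer, so they can be used to look up the original articles.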

Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

岁酱吖の posted on 2019-12-03 11:14:50
I have a data table ("norm") containing numeric (as far as I can see) normalized values of the following form: When I execute k <- kmeans(norm,center=3) I receive the following error: Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1) Can you help me? Thank you! kmeans cannot handle data that has NA values. The mean and variance are then no longer well defined, and you can no longer tell which center is closest. Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1) This error also occurs when non-numeric values are present in the table. all of
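
Both answers point to the same remedy: find the cells that are missing, infinite, or non-numeric before clustering. The check is sketched here in Python rather than R, purely as an illustration of the logic; the table contents and the helper `bad_cells` are hypothetical. In R itself, functions such as `is.na`, `is.finite`, and `na.omit` cover the same ground.

```python
import math

def bad_cells(rows):
    """Return (row, col) positions holding non-numeric, NaN, or infinite values."""
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not (is_number and math.isfinite(value)):
                bad.append((r, c))
    return bad

# Hypothetical table with one NaN, one string, and one infinite value.
norm = [[0.1, 0.5], [float("nan"), 0.2], [0.3, "n/a"], [0.9, float("inf")]]
problems = bad_cells(norm)

# Keep only the rows without any problem cell before running k-means.
bad_rows = {r for r, _ in problems}
clean = [row for r, row in enumerate(norm) if r not in bad_rows]
```

Dropping rows is the simplest option; whether imputing values is more appropriate depends on the data.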

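K-means Clustering Study

对着背影说爱祢 posted on 2019-12-03 10:32:21
With no supervision labels, only the feature values x and no y, there is nothing to predict and no way to verify whether a result is right or wrong. What can we do with such a dataset? Unsupervised machine learning. The common approaches are clustering and dimensionality reduction. What does clustering do? It uncovers the regularities present in a dataset: by grouping similar data together, it helps us explore how the samples can be partitioned, for example segmenting users so that each group gets a different marketing strategy. Clustering itself comprises many algorithms. The basic idea of clustering is that birds of a feather flock together: the similarity between samples is computed from their features.
K-means clustering:
Step 1: Choose a hyperparameter k, the number of clusters to group the samples into.
Step 2: Randomly select k of the samples (three in this example) as the initial cluster centers.
Step 3: For every point other than these centers, compute its distance to each center and find the center it is closest to.
Step 4: Assign every point to the cluster represented by its nearest center.
Step 5: All samples are now divided into k groups; compute the centroid of each of these k clusters.
Step 6: These centroids become the k new cluster centers; repeat steps 3 to 5 with them.
Step 7: Stop when (1) no sample changes its cluster assignment between iterations, or (2) the maximum number of iterations you set is reached, for example max_iter = 200.
Principle and implementation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html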
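
The seven steps above can be sketched in plain Python. This is a minimal illustration, not the scikit-learn implementation: the function name `kmeans`, the `seed` parameter, and the sample points are assumptions made for the example, and only `max_iter` comes from the text.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=200, seed=0):
    """Plain k-means following steps 1 to 7; returns (labels, centers)."""
    rng = random.Random(seed)
    # Step 2: pick k samples at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):  # Step 7 (2): cap on the number of iterations
        # Steps 3 and 4: assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        # Step 7 (1): stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Steps 5 and 6: each cluster's centroid becomes its new center.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical sample: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centers = kmeans(points, k=2)
```

Because the starting centers are random, k-means can settle in a local optimum; running it with several seeds and keeping the result with the smallest total within-cluster distance is a common precaution.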

how to set initial centers of K-means openCV c++

Anonymous (unverified) posted on 2019-12-03 10:24:21
Question: I am trying to segment an image using OpenCV and k-means; the code that I have just implemented is the following: #include "opencv2/objdetect/objdetect.hpp" #include "opencv2/highgui/highgui.hpp" #include "opencv2/imgproc/imgproc.hpp" #include <iostream> #include <stdio.h> using namespace std; using namespace cv; int main(int, char** argv) { Mat src, Imagen2, Imagris, labels, centers,imgfondo; src = imread("C:/Users/Sebastian/Documents/Visual Studio 2015/Projects/ClusteringImage/data/leon.jpg"); imgfondo = imread("C:/Users

Color quantization of an image using K-means clustering (using RGB features)

会有一股神秘感。 posted on 2019-12-03 08:56:19
Is it possible to cluster images on RGB + spatial features with MATLAB? NOTE: I want to use kmeans for clustering. Basically, I want to do one thing: I want to get this image from this. I think you are looking for color quantization. [imgQ,map]= rgb2ind(img,4,'nodither'); %change this 4 to the number of desired colors %in quantized image imshow(imgQ,map); Result: Using kmeans : %img is the original image imgVec=[reshape(img(:,:,1),[],1) reshape(img(:,:,2),[],1) reshape(img(:,:,3),[],1)]; [imgVecQ,imgVecC]=kmeans(double(imgVec),4); %4 colors imgVecQK=pdist2(imgVec,imgVecC); %choosing
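
The core of color quantization, replacing every pixel with its nearest representative color, can be sketched in Python. This illustrates the general idea only and is not a translation of the MATLAB answer; the pixel values and the two-color `palette` (standing in for centroids returned by k-means) are hypothetical.

```python
def quantize(pixels, palette):
    """Replace each RGB pixel with the closest palette color (squared Euclidean distance)."""
    def closest(px):
        return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(px, c)))
    return [closest(px) for px in pixels]

# Hypothetical flattened image: one (R, G, B) tuple per pixel.
pixels = [(250, 10, 10), (5, 5, 240), (200, 30, 40), (20, 10, 200)]

# Hypothetical palette; in practice these would be the k-means centroids.
palette = [(255, 0, 0), (0, 0, 255)]

quantized = quantize(pixels, palette)
```

To take spatial features into account as well, as the question asks, each pixel's coordinates could be appended to its feature vector before clustering, with some scaling to balance color against position.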

dtype mismatch in sklearn on k-means

Anonymous (unverified) posted on 2019-12-03 08:46:08
Question: I am attempting to run the first answer to this question, "Python Relating k-means cluster to instance"; however, I am getting the following error: Traceback (most recent call last): File "test.py", line 16, in <module> model = sklearn.cluster.k_means(a, clust_centers) File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-i686.egg/sklearn/cluster/k_means_.py", line 267, in k_means x_squared_norms=x_squared_norms, random_state=random_state) File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-i686.egg

Plotting the boundaries of cluster zone in Python with scikit package

杀马特。学长 韩版系。学妹 posted on 2019-12-03 08:42:26
Here is my simple example of clustering data with three attributes (x, y, value). Each sample represents its location (x, y) and the variable that belongs to it. My code is posted here: x = np.arange(100,200,1) y = np.arange(100,200,1) value = np.random.random(100*100) xx,yy = np.meshgrid(x,y) xx = xx.reshape(100*100) yy = yy.reshape(100*100) j = np.dstack((xx,yy,value))[0,:,:] fig = plt.figure(figsize =(12,4)) ax1 = plt.subplot(121) xi,yi = np.meshgrid(x,y) va = value.reshape(100,100) pc = plt.pcolormesh(xi,yi,va,cmap = plt.cm.Spectral) plt.colorbar(pc) ax2 = plt.subplot(122) y_pred = KMeans(n_clusters