k-means

OpenCV using k-means to posterize an image

Submitted by 倖福魔咒の on 2019-12-03 04:23:34
Question: I want to posterize an image with k-means and OpenCV in the C++ interface (the cv namespace), and I get weird results. I need it to reduce some noise. This is my code:

```cpp
#include "cv.h"
#include "highgui.h"

using namespace cv;

int main() {
    Mat imageBGR, imageHSV, planeH, planeS, planeV;
    imageBGR = imread("fruits.jpg");
    imshow("original", imageBGR);

    cv::Mat labels, data;
    cv::Mat centers(8, 1, CV_32FC1);
    imageBGR.convertTo(data, CV_32F);
    cv::kmeans(data, 8, labels, cv::TermCriteria(CV_TERMCRIT_ITER, 10, …
```
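For reference, the same posterization can be sketched with a plain NumPy k-means rather than the OpenCV call. This is not the author's code; the point it illustrates is that the pixels must first be reshaped into an (N, 3) float sample matrix, which the C++ snippet above skips (it converts the image to float but never reshapes it):

```python
import numpy as np

def posterize(image, k=8, iters=10, seed=0):
    """Quantize an H x W x 3 image to k colors with plain k-means.

    A minimal NumPy sketch of the idea behind cv::kmeans; the key step
    is reshaping the image into an (N, 3) float matrix before clustering.
    """
    h, w, c = image.shape
    data = image.reshape(-1, c).astype(np.float32)   # (N, 3) samples
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest center.
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned pixels.
        for j in range(k):
            if (labels == j).any():
                centers[j] = data[labels == j].mean(axis=0)
    # Replace every pixel with its cluster's mean color.
    return centers[labels].reshape(h, w, c).astype(image.dtype)
```

The output image then contains at most k distinct colors, which is the noise-reducing posterization the question is after.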

How to add k-means predicted clusters in a column to a dataframe in Python

Submitted by 一个人想着一个人 on 2019-12-03 03:29:27
Question: I have a question about k-means clustering in Python. I did the analysis this way:

```python
from sklearn.cluster import KMeans

km = KMeans(n_clusters=12, random_state=1)
new = data._get_numeric_data().dropna(axis=1)
km.fit(new)
predict = km.predict(new)
```

How can I add a column with the cluster results to my original dataframe "data" as an additional column? Thanks!

Answer: Assuming the predictions have the same length as each column in your dataframe df, all you need to do is this:

```python
from pandas import Series

df['NEW_COLUMN'] = Series(predict, index=df.index)
```

Source: https://stackoverflow.com/questions/38372188/how-to-add-k-means-predicted-clusters

Outlier detection with k-means algorithm

Submitted by Anonymous (unverified) on 2019-12-03 02:56:01
Question: I am hoping you can help me with my problem. I am trying to detect outliers using the k-means algorithm. First I run the algorithm and pick as possible outliers those objects that have a large distance to their cluster center. Instead of the absolute distance I want to use the relative distance, i.e. the ratio of the absolute distance of the object to its cluster center and the average distance of all objects of the cluster to their cluster center. The code for outlier detection based on absolute distance is the …
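The relative-distance rule described in the question can be sketched as follows. This is a minimal NumPy sketch, assuming `labels` and `centers` come from an already-run k-means; the function name and the threshold value are illustrative:

```python
import numpy as np

def relative_outliers(data, labels, centers, threshold=2.0):
    """Flag points whose distance to their cluster center exceeds
    `threshold` times the average distance within that cluster."""
    # Absolute distance of each point to its own cluster center.
    dist = np.linalg.norm(data - centers[labels], axis=1)
    # Average distance within each cluster.
    avg = np.array([dist[labels == j].mean() for j in range(len(centers))])
    # Relative distance, as defined in the question.
    rel = dist / avg[labels]
    return rel > threshold
```

A point sitting far from its center relative to its cluster's typical spread is flagged, which makes the rule scale-free across clusters of different sizes.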

What is the difference between SOM (Self Organizing Maps) and K-Means?

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-03 02:14:13
There is only one question related to this on Stack Overflow, and it is more about which one is better. I just don't really understand the difference. I mean, they both work with vectors, which are assigned randomly to clusters, and they both work with the centroids of the different clusters in order to determine the winning output node. Where exactly does the difference lie? In k-means the nodes (centroids) are independent of each other. The winning node gets the chance to adapt itself, and only that node. In SOM the nodes (centroids) are placed onto a grid, and so each node is considered to have …
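One way to make the contrast concrete is to put the two online update rules side by side. A minimal sketch, assuming a 1-D grid for the SOM and a Gaussian neighbourhood (learning rate and sigma are illustrative):

```python
import numpy as np

def kmeans_update(x, centers, lr=0.5):
    """Online k-means: only the winning centroid moves toward x."""
    win = np.argmin(np.linalg.norm(centers - x, axis=1))
    centers[win] += lr * (x - centers[win])
    return centers

def som_update(x, nodes, lr=0.5, sigma=1.0):
    """SOM: the winner AND its grid neighbours move toward x, weighted
    by how close they sit to the winner on the (here 1-D) grid."""
    win = np.argmin(np.linalg.norm(nodes - x, axis=1))
    grid = np.arange(len(nodes))
    h = np.exp(-((grid - win) ** 2) / (2 * sigma ** 2))  # neighbourhood
    nodes += lr * h[:, None] * (x - nodes)
    return nodes
```

The grid coupling in `som_update` is exactly what k-means lacks: in k-means the non-winning centroids stay put, while in SOM the neighbourhood function `h` drags them along, which is what gives the map its topological ordering.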

How to Find Documents That are in the same Cluster with KMeans

Submitted by 淺唱寂寞╮ on 2019-12-03 02:09:49
I have clustered various articles together with the scikit-learn framework. Below are the top 15 words in each cluster:

Cluster 0: whales islands seaworld hurricane whale odile storm tropical kph mph pacific mexico orca coast cabos
Cluster 1: ebola outbreak vaccine africa usaid foundation virus cdc gates disease health vaccines experimental centers obama
Cluster 2: jones bobo sanford children carolina mississippi alabama lexington bodies crumpton mccarty county hyder tennessee sheriff
Cluster 3: isis obama iraq syria president isil airstrikes islamic li strategy terror military war threat al…
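To actually pull out the documents that landed in one of the clusters above, it is enough to index by the fitted assignments. A small sketch, assuming `labels` is the `labels_` attribute of a fitted scikit-learn KMeans and `documents` is the parallel list of articles:

```python
import numpy as np

def docs_in_cluster(labels, cluster_id, documents):
    """Return the documents assigned to one cluster.

    `labels` holds one cluster id per document, in document order,
    e.g. `km.labels_` after `km.fit(tfidf_matrix)`.
    """
    idx = np.where(np.asarray(labels) == cluster_id)[0]
    return [documents[i] for i in idx]
```

Two documents are "in the same cluster" exactly when their entries in `labels` are equal, so this also answers the similarity question directly.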

How to visualize k-means centroids for each iteration?

Submitted by Anonymous (unverified) on 2019-12-03 01:38:01
Question: I would like to graphically demonstrate the behavior of k-means by plotting iterations of the algorithm, from a starting value for the initial clusters (at (3,5), (6,2), (8,3)) until the cluster centers settle. Each iteration may correspond to a single plot with centroids and clusters. Given:

```r
x <- c(3,6,8,1,2,2,6,6,7,7,8,8)
y <- c(5,2,3,5,4,6,1,8,3,6,1,7)
df <- data.frame(x, y)
dfCluster <- kmeans(df, centers = 3)  # with 3 centroids
```

I would like to use the first three tuples as my initial clusters and track the movement of the centroids.

Answer 1: Try to use tryCatch to …
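The question is about R, but the underlying idea (record the centroids after every Lloyd iteration, then plot each snapshot) can be sketched language-neutrally. A NumPy version under the question's setup, with (3,5), (6,2), (8,3) as the initial centers:

```python
import numpy as np

def kmeans_history(data, init_centers, iters=10):
    """Run Lloyd's algorithm and record the centroids after every
    iteration, so each snapshot can be plotted separately."""
    centers = np.asarray(init_centers, dtype=float).copy()
    history = [centers.copy()]
    for _ in range(iters):
        # Assignment step: nearest centroid per point.
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to its cluster mean.
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = data[labels == j].mean(axis=0)
        history.append(centers.copy())
    return history, labels
```

Each `history[t]` can then be scattered (e.g. with matplotlib) alongside the data to show the centroids' movement, one plot per iteration, which is exactly the demonstration the question asks for.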

k-means clustering in R on very large, sparse matrix?

Submitted by Anonymous (unverified) on 2019-12-03 01:23:02
Question: I am trying to do some k-means clustering on a very large matrix. The matrix is approximately 500000 rows x 4000 cols, yet very sparse (only a couple of "1" values per row). The whole thing does not fit into memory, so I converted it into a sparse ARFF file. But R obviously can't read the sparse ARFF file format. I also have the data as a plain CSV file. Is there any package available in R for loading such sparse matrices efficiently? I'd then use the regular k-means algorithm from the cluster package to proceed. Many thanks.

Answer 1: The …
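For comparison with the R setup in the question, the same pattern in Python would be a SciPy CSR matrix fed to scikit-learn's MiniBatchKMeans, which accepts sparse input directly. A hedged sketch with toy data; the triplet loader and its name are illustrative, not something from the question:

```python
from scipy.sparse import csr_matrix
from sklearn.cluster import MiniBatchKMeans

def load_sparse_triplets(triplets, shape):
    """Build a CSR matrix from (row, col, value) triplets, e.g. parsed
    line by line from a CSV dump, without densifying all
    500000 x 4000 cells."""
    rows, cols, vals = zip(*triplets)
    return csr_matrix((vals, (rows, cols)), shape=shape)

# Tiny illustrative data: a couple of "1" values per row, as in the question.
X = load_sparse_triplets([(0, 0, 1), (1, 3, 1), (2, 0, 1), (3, 3, 1)],
                         shape=(4, 4))
km = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0).fit(X)
```

MiniBatchKMeans also sidesteps the slow-convergence problem on large data by fitting on small random batches instead of the full sample set each iteration.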

Matlab: K-means clustering

Submitted by Anonymous (unverified) on 2019-12-03 01:05:01
Question: I have a matrix A (369x10) which I want to cluster into 19 clusters. I use this method:

```matlab
[idx, ctrs] = kmeans(A, 19)
```

which yields idx (369x1) and ctrs (19x10). I get the point up to here: all the rows of A are clustered into 19 clusters. Now I have an array B (49x10). I want to know which of the given 19 clusters the rows of B correspond to. How is this possible in MATLAB? Thank you in advance.

Answer 1: I can't think of a better way to do it than what you described. A built-in function would save one line, but I couldn't find one. Here's the code I would …
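What the answer describes, assigning each row of B to the nearest of the centroids already found, amounts to a single nearest-centroid step. A NumPy sketch of that computation (not the MATLAB answer's code):

```python
import numpy as np

def assign_to_centroids(B, ctrs):
    """Assign each row of B to the nearest existing centroid:
    all pairwise squared distances, then the argmin per row."""
    d = ((B[:, None, :] - ctrs[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)
```

With the question's shapes, B being 49x10 and ctrs 19x10, this returns a 49-vector of cluster indices, the direct analogue of idx for the new rows.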

More questions on “optimizing K-means algorithm”

Submitted by Anonymous (unverified) on 2019-12-03 00:44:02
Question: I want to implement the paper titled "An Optimized Version of the K-Means Clustering Algorithm". The paper is at this link: https://fedcsis.org/proceedings/2014/pliks/258.pdf . The paper is not obvious. I see on Stack Overflow that @Vpp Man asked some questions about it ( Optimizing K-means algorithm ), but because I have extra questions about it, I created a new question page. My questions: 1) Is algorithm 2 the full algorithm, or must I put it into part of algorithm 1 (in step 2 of algorithm 1)? 2) In step 2 of algorithm 2: what is the meaning of 'i' …

K-means, One of the Top Ten Machine Learning Algorithms

Submitted by Anonymous (unverified) on 2019-12-03 00:41:02
The K-means algorithm is also known as the K-average or K-mean algorithm. The goal of K-means clustering is to partition n points (each one observation or one instance of a sample) into k clusters, so that every point belongs to the cluster whose center is nearest to it (or, in other words, most similar to it).

1. From the definition, K-means clusters mainly by computing distances between the K centers and the points, so the main issues in K-means are the choice of K and the choice of distance (similarity measure).
2. Because every iteration computes the distance (similarity) between all samples and every centroid, K-means converges slowly on large-scale datasets.

The steps of the algorithm:

1. Choose the number of clusters k (when passing hyperparameters to a k-means implementation, only the maximum K needs to be set).
2. Generate k clusters arbitrarily and determine the cluster centers, or directly generate k centers.
3. Assign each point to its nearest cluster center.
4. Recompute each cluster's new center.
5. Repeat the steps above until the convergence criterion is met (usually the centers no longer change, or the loss function falls into the expected range).

As an example, take the following samples and split them into two classes with K-means:

| Sample | X | Y |
| ------ | - | - |
| p1 | 7 | 7 |
| p2 | 2 | 3 |
| p3 | 6 | 8 |
| p4 | 1 | 4 |
| p5 | 1 | 2 |
| p6 | 3 | 1 |
| p7 | 8 | 8 |
| p8 | 9 | 10 |
| p9 | 10 | 7 |
| p10 | 5 | 5 |
| p11 | 7 | 6 |
| p12 | 9 | 3 |
| p13 | 2 | 8 |
| p14 | 5 | 11 |
| p15 | 5 | 2 |

(Figure: scatter plot of the data points in the coordinate plane.)

1. Use K-means with two classes, K = 2.
2. Choose p1 and p2 as the two initial center points.
3. …
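The worked example above (the 15 points, K = 2, with p1 and p2 as the initial centers) can be reproduced with a short NumPy sketch of steps 3 to 5:

```python
import numpy as np

# The 15 sample points from the table above, p1 through p15.
pts = np.array([(7, 7), (2, 3), (6, 8), (1, 4), (1, 2),
                (3, 1), (8, 8), (9, 10), (10, 7), (5, 5),
                (7, 6), (9, 3), (2, 8), (5, 11), (5, 2)], dtype=float)

def lloyd(data, centers, iters=20):
    """Plain Lloyd iterations: assign each point to its nearest center
    (step 3), then move each center to the mean of its points (step 4),
    repeating until done (step 5)."""
    centers = centers.copy()
    for _ in range(iters):
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = data[labels == j].mean(axis=0)
    return labels, centers

# Step 2 of the example: p1 and p2 as the initial centers, K = 2.
labels, centers = lloyd(pts, pts[[0, 1]])
```

Running this separates the points near the upper-right (around p1) from those near the lower-left (around p2), matching the two-class split the example sets up.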