k-means

Scipy Kmeans exits with TypeError

一曲冷凌霜 submitted on 2020-01-15 01:50:25
Question: When running the code below, I'm getting a TypeError that says:

File "_vq.pyx", line 342, in scipy.cluster._vq.update_cluster_means
TypeError: type other than float or double not supported

from PIL import Image
import scipy, scipy.misc, scipy.cluster

NUM_CLUSTERS = 5
im = Image.open('d:/temp/test.jpg')
ar = scipy.misc.fromimage(im)
shape = ar.shape
ar = ar.reshape(scipy.product(shape[:2]), shape[2])
codes, dist = scipy.cluster.vq.kmeans(ar, NUM_CLUSTERS)
vecs, dist = scipy.cluster.vq.vq(ar,
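
The error comes from scipy.cluster.vq.kmeans, which only accepts float or double input, while a JPEG loaded this way yields a uint8 array. A minimal sketch of the usual fix is to cast the pixel array to float before clustering (np.asarray is used here because scipy.misc.fromimage has been removed from recent SciPy releases):

import numpy as np
import scipy.cluster
from PIL import Image

NUM_CLUSTERS = 5
im = Image.open('d:/temp/test.jpg')
ar = np.asarray(im, dtype=float)            # cast to float: kmeans rejects integer dtypes
ar = ar.reshape(-1, ar.shape[-1])           # flatten to (num_pixels, num_channels)
codes, dist = scipy.cluster.vq.kmeans(ar, NUM_CLUSTERS)
vecs, dists = scipy.cluster.vq.vq(ar, codes)
print(codes)                                # the NUM_CLUSTERS dominant colours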

Machine Learning: K-Means

余生颓废 submitted on 2020-01-13 05:13:05
Contents: Overview; Principle; Example; Sklearn implementation; Evaluating clustering quality; Several problems with KMeans; Initial centroid selection; Choosing K; Density-based clustering (DBSCAN); mini batch kmeans; References

Overview: k-means is a clustering algorithm. Clustering means discovering relationships between data objects and grouping the data: the greater the similarity within a group and the greater the difference between groups, the better the clustering. Clustering differs from classification in that it is unsupervised learning; put simply, classification assigns labels to things, while clustering puts similar things together. Clustering is typically used to find similar items, for example banks looking for high-value customers, credit-card fraud detection, or dividing a social network into communities.

Principle: The K in K-means, like the parameter K in KNN, is the number of groups the data is clustered into. The algorithm works as follows (a from-scratch sketch follows this excerpt):
1. Randomly pick K elements from the unlabeled set A to serve as the centroids of the K subsets.
2. Compute the distance from every remaining element to each of the K centroids and assign each element to the nearest subset (Euclidean distance or any other distance measure can be used).
3. From the resulting clusters, recompute each centroid as the arithmetic mean of its members in every dimension.
4. Re-assign all elements of A to the new centroids, i.e. re-cluster.
5. Repeat steps 3 and 4 until the cluster assignments no longer change.

Example: The steps may look confusing, so here is a simple example. 1. Suppose there are four points on a canvas. We want to cluster them into two groups. First we randomly pick two points, say A and B, as the centroids of the two clusters, then compute the distance from every element to these two centroids
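
A from-scratch sketch of these five steps in Python (the four 2-D points are made up for illustration, and the empty-cluster edge case that library implementations handle is ignored):

import numpy as np

def kmeans(A, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: pick K random elements of A as the initial centroids
    centroids = A[rng.choice(len(A), size=K, replace=False)]
    for _ in range(n_iter):
        # step 2: assign every element to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(A[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: recompute each centroid as the mean of its members
        new_centroids = np.array([A[labels == k].mean(axis=0) for k in range(K)])
        # step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids  # step 4: re-cluster against the new centroids
    return centroids, labels

# four toy points clustered into two groups, as in the example above
A = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids, labels = kmeans(A, K=2)
print(centroids)
print(labels)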

Sklearn: Mean Distance from Centroid of each cluster

隐身守侯 submitted on 2020-01-11 01:44:08
Question: How can I find the mean distance from the centroid to all the data points in each cluster? I am able to find the Euclidean distance of each point (in my dataset) from the centroid of each cluster. Now I want to find the mean distance from the centroid to all the data points in each cluster. What is a good way of calculating the mean distance from each centroid? So far I have done this:

def k_means(self):
    data = pd.read_csv('hdl_gps_APPLE_20111220_130416.csv', delimiter=',')
    combined_data = data
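
A sketch of one way to do it with sklearn.cluster.KMeans (a random array X stands in for the CSV data, and n_clusters=3 is an arbitrary choice): compute each point's distance to its own centroid, then average per cluster label.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)                       # placeholder for the GPS data from the CSV
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)

# distance from every point to the centroid of the cluster it belongs to
dist_to_centroid = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# mean distance per cluster
for k in range(kmeans.n_clusters):
    print(k, dist_to_centroid[kmeans.labels_ == k].mean())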

Scikit-learn: How to run KMeans on a one-dimensional array?

佐手、 submitted on 2020-01-09 19:08:20
Question: I have an array of 13,876 values between 0 and 1. I would like to apply sklearn.cluster.KMeans to only this vector to find the different clusters in which the values are grouped. However, it seems KMeans works with a multidimensional array and not with one-dimensional ones. I guess there is a trick to make it work but I don't know how. I saw that KMeans.fit() accepts "X : array-like or sparse matrix, shape=(n_samples, n_features)", but it wants the n_samples to be bigger than one
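
The usual trick is to treat every value as a sample with a single feature, i.e. reshape the vector to shape (n_samples, 1). A minimal sketch with random values standing in for the real array (n_clusters=3 is an assumption):

import numpy as np
from sklearn.cluster import KMeans

x = np.random.rand(13876)            # placeholder for the 13,876 values between 0 and 1
X = x.reshape(-1, 1)                 # shape (13876, 1): one feature per sample
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
print(kmeans.cluster_centers_.ravel())
print(kmeans.labels_[:10])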

Changing the k-means clustering distance metric to Canberra distance or any other distance metric in Python

青春壹個敷衍的年華 submitted on 2020-01-07 08:07:08
Question: How do I change the distance metric of k-means clustering to Canberra distance or any other distance metric? From my understanding, sklearn only supports Euclidean distance, and nltk doesn't seem to support Canberra distance, but I may be wrong. Thank you!

Answer 1:

from scipy.spatial import distance
from nltk.cluster.kmeans import KMeansClusterer

obj = KMeansClusterer(num_cluster, distance=distance.canberra)

Source: https://stackoverflow.com/questions/59554641/changing-k-mean-clustering-distance-metric
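
Expanded into a runnable sketch (the toy vectors and num_means=2 are assumptions for illustration; nltk's KMeansClusterer accepts any distance callable, so scipy's canberra can be plugged in):

import numpy as np
from scipy.spatial import distance
from nltk.cluster.kmeans import KMeansClusterer

vectors = [np.array(v, dtype=float) for v in ([1, 2], [1, 3], [8, 9], [9, 8])]  # toy data
clusterer = KMeansClusterer(2, distance=distance.canberra, repeats=10,
                            avoid_empty_clusters=True)
labels = clusterer.cluster(vectors, assign_clusters=True)
print(labels)              # cluster index assigned to each vector
print(clusterer.means())   # the learned centroids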

MiniBatchKMeans gives different centroids after subsequent iterations

巧了我就是萌 submitted on 2020-01-07 02:54:53
Question: I am using the MiniBatchKMeans model from the sklearn.cluster module in Anaconda. I am clustering a data set that contains approximately 75,000 points. It looks something like this:

data = np.array([8,3,1,17,5,21,1,7,1,26,323,16,2334,4,2,67,30,2936,2,16,12,28,1,4,190...])

I fit the data using the process below.

from sklearn.cluster import MiniBatchKMeans
kmeans = MiniBatchKMeans(batch_size=100)
kmeans.fit(data.reshape(-1, 1))

This is all well and okay, and I proceed to find the centroids of the
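
MiniBatchKMeans is stochastic: the initial centroids and the mini-batches are sampled randomly, so repeated fits can legitimately end up with different centroids. A common way to make the result reproducible, sketched here with placeholder data, is to fix random_state (n_clusters and n_init are set explicitly as assumptions):

import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
data = rng.integers(1, 3000, size=75_000)        # placeholder for the real ~75,000 points

kmeans = MiniBatchKMeans(n_clusters=8, batch_size=100, n_init=10, random_state=42)
kmeans.fit(data.reshape(-1, 1))

# sorting makes the printout independent of the arbitrary label order as well
print(np.sort(kmeans.cluster_centers_.ravel()))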

How to accurately order the clusters based on color in matlab [duplicate]

爷，独闯天下 submitted on 2020-01-07 02:18:14
Question: This question already exists: How to accurately classify leafs into its disease category using Matlab (closed 3 years ago). I have an image of a leaf that has mostly three colors: a black background, the green leaf, and brown diseased spots. Here is the image. The first time I cluster it, I get the brown spots in cluster 1, the green portion in cluster 2, and the black region in cluster 3 (for example). The second time I cluster it, I get the green portion in cluster 1, the brown spots in cluster 2, and the black region in cluster 3
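
The cluster indices that k-means returns are arbitrary and can change from run to run; a common remedy is to relabel the clusters by sorting their centroids on a stable criterion such as brightness. The question is about MATLAB, but the idea is language-agnostic; here is a sketch in Python with made-up pixel data:

import numpy as np
from sklearn.cluster import KMeans

pixels = np.random.rand(1000, 3)                 # placeholder for the leaf image's RGB pixels
kmeans = KMeans(n_clusters=3, random_state=0).fit(pixels)

# sort clusters by centroid brightness (mean of R, G, B), darkest first,
# so the black background, the leaf and the spots always get the same index
order = np.argsort(kmeans.cluster_centers_.mean(axis=1))
relabel = np.empty(3, dtype=int)
relabel[order] = np.arange(3)
stable_labels = relabel[kmeans.labels_]          # 0 = darkest cluster, 2 = brightest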