k-means

Scipy Kmeans exits with TypeError

一曲冷凌霜 submitted on 2020-01-15 01:50:25
Question: When running the code below, I'm getting a TypeError that says:

File "_vq.pyx", line 342, in scipy.cluster._vq.update_cluster_means
TypeError: type other than float or double not supported

from PIL import Image
import scipy, scipy.misc, scipy.cluster

NUM_CLUSTERS = 5
im = Image.open('d:/temp/test.jpg')
ar = scipy.misc.fromimage(im)
shape = ar.shape
ar = ar.reshape(scipy.product(shape[:2]), shape[2])
codes, dist = scipy.cluster.vq.kmeans(ar, NUM_CLUSTERS)
vecs, dist = scipy.cluster.vq.vq(ar,
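
The error comes from scipy.cluster.vq.kmeans, which only accepts float or double input, while a JPEG loaded this way yields a uint8 array. A minimal sketch of the usual fix is to cast the pixel array to float before clustering (np.asarray is used here because scipy.misc.fromimage has been removed from recent SciPy releases):

import numpy as np
import scipy.cluster
from PIL import Image

NUM_CLUSTERS = 5
im = Image.open('d:/temp/test.jpg')
ar = np.asarray(im, dtype=float)            # cast to float: kmeans rejects integer dtypes
ar = ar.reshape(-1, ar.shape[-1])           # flatten to (num_pixels, num_channels)
codes, dist = scipy.cluster.vq.kmeans(ar, NUM_CLUSTERS)
vecs, dists = scipy.cluster.vq.vq(ar, codes)
print(codes)                                # the NUM_CLUSTERS dominant colours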

Machine Learning: K-Means

余生颓废 submitted on 2020-01-13 05:13:05
Contents: Overview; Principle; Example; Sklearn implementation; Evaluating clustering quality; Several problems with KMeans; Initial centroid selection; Choosing K; Density-based clustering (DBSCAN); mini batch kmeans; References

Overview: k-means is a clustering algorithm. Clustering means discovering relationships between data objects and grouping the data: the greater the similarity within a group and the greater the difference between groups, the better the clustering. Clustering differs from classification in that it is unsupervised learning; put simply, classification assigns labels to things, while clustering puts similar things together. Clustering is typically used to find similar items, for example banks looking for high-value customers, credit-card fraud detection, or dividing a social network into communities.

Principle: The K in K-means, like the parameter K in KNN, is the number of groups the data is clustered into. The algorithm works as follows (a from-scratch sketch follows this excerpt):
1. Randomly pick K elements from the unlabeled set A to serve as the centroids of the K subsets.
2. Compute the distance from every remaining element to each of the K centroids and assign each element to the nearest subset (Euclidean distance or any other distance measure can be used).
3. From the resulting clusters, recompute each centroid as the arithmetic mean of its members in every dimension.
4. Re-assign all elements of A to the new centroids, i.e. re-cluster.
5. Repeat steps 3 and 4 until the cluster assignments no longer change.

Example: The steps may look confusing, so here is a simple example. 1. Suppose there are four points on a canvas. We want to cluster them into two groups. First we randomly pick two points, say A and B, as the centroids of the two clusters, then compute the distance from every element to these two centroids
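
A from-scratch sketch of these five steps in Python (the four 2-D points are made up for illustration, and the empty-cluster edge case that library implementations handle is ignored):

import numpy as np

def kmeans(A, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: pick K random elements of A as the initial centroids
    centroids = A[rng.choice(len(A), size=K, replace=False)]
    for _ in range(n_iter):
        # step 2: assign every element to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(A[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: recompute each centroid as the mean of its members
        new_centroids = np.array([A[labels == k].mean(axis=0) for k in range(K)])
        # step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids  # step 4: re-cluster against the new centroids
    return centroids, labels

# four toy points clustered into two groups, as in the example above
A = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids, labels = kmeans(A, K=2)
print(centroids)
print(labels)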

Sklearn: Mean Distance from Centroid of each cluster

隐身守侯 submitted on 2020-01-11 01:44:08
Question: How can I find the mean distance from the centroid to all the data points in each cluster? I am able to find the Euclidean distance of each point (in my dataset) from the centroid of each cluster. Now I want to find the mean distance from the centroid to all the data points in each cluster. What is a good way of calculating the mean distance from each centroid? So far I have done this:

def k_means(self):
    data = pd.read_csv('hdl_gps_APPLE_20111220_130416.csv', delimiter=',')
    combined_data = data
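
A sketch of one way to do it with sklearn.cluster.KMeans (a random array X stands in for the CSV data, and n_clusters=3 is an arbitrary choice): compute each point's distance to its own centroid, then average per cluster label.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)                       # placeholder for the GPS data from the CSV
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)

# distance from every point to the centroid of the cluster it belongs to
dist_to_centroid = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# mean distance per cluster
for k in range(kmeans.n_clusters):
    print(k, dist_to_centroid[kmeans.labels_ == k].mean())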

Scikit-learn: How to run KMeans on a one-dimensional array?

佐手、 submitted on 2020-01-09 19:08:20
Question: I have an array of 13,876 values between 0 and 1. I would like to apply sklearn.cluster.KMeans to only this vector to find the different clusters in which the values are grouped. However, it seems KMeans works with a multidimensional array and not with one-dimensional ones. I guess there is a trick to make it work but I don't know how. I saw that KMeans.fit() accepts "X : array-like or sparse matrix, shape=(n_samples, n_features)", but it wants the n_samples to be bigger than one
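
The usual trick is to treat every value as a sample with a single feature, i.e. reshape the vector to shape (n_samples, 1). A minimal sketch with random values standing in for the real array (n_clusters=3 is an assumption):

import numpy as np
from sklearn.cluster import KMeans

x = np.random.rand(13876)            # placeholder for the 13,876 values between 0 and 1
X = x.reshape(-1, 1)                 # shape (13876, 1): one feature per sample
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
print(kmeans.cluster_centers_.ravel())
print(kmeans.labels_[:10])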

Changing the k-means clustering distance metric to Canberra distance or any other distance metric in Python

青春壹個敷衍的年華 submitted on 2020-01-07 08:07:08
Question: How do I change the distance metric of k-means clustering to Canberra distance or any other distance metric? From my understanding, sklearn only supports Euclidean distance, and nltk doesn't seem to support Canberra distance, but I may be wrong. Thank you!

Answer 1:

from scipy.spatial import distance
from nltk.cluster.kmeans import KMeansClusterer

obj = KMeansClusterer(num_cluster, distance=distance.canberra)

Source: https://stackoverflow.com/questions/59554641/changing-k-mean-clustering-distance-metric
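
Expanded into a runnable sketch (the toy vectors and num_means=2 are assumptions for illustration; nltk's KMeansClusterer accepts any distance callable, so scipy's canberra can be plugged in):

import numpy as np
from scipy.spatial import distance
from nltk.cluster.kmeans import KMeansClusterer

vectors = [np.array(v, dtype=float) for v in ([1, 2], [1, 3], [8, 9], [9, 8])]  # toy data
clusterer = KMeansClusterer(2, distance=distance.canberra, repeats=10,
                            avoid_empty_clusters=True)
labels = clusterer.cluster(vectors, assign_clusters=True)
print(labels)              # cluster index assigned to each vector
print(clusterer.means())   # the learned centroids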

MiniBatchKMeans gives different centroids after subsequent iterations

巧了我就是萌 submitted on 2020-01-07 02:54:53
Question: I am using the MiniBatchKMeans model from the sklearn.cluster module in Anaconda. I am clustering a data set that contains approximately 75,000 points. It looks something like this:

data = np.array([8,3,1,17,5,21,1,7,1,26,323,16,2334,4,2,67,30,2936,2,16,12,28,1,4,190...])

I fit the data using the process below.

from sklearn.cluster import MiniBatchKMeans
kmeans = MiniBatchKMeans(batch_size=100)
kmeans.fit(data.reshape(-1, 1))

This is all well and okay, and I proceed to find the centroids of the
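
MiniBatchKMeans is stochastic: the initial centroids and the mini-batches are sampled randomly, so repeated fits can legitimately end up with different centroids. A common way to make the result reproducible, sketched here with placeholder data, is to fix random_state (n_clusters and n_init are set explicitly as assumptions):

import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
data = rng.integers(1, 3000, size=75_000)        # placeholder for the real ~75,000 points

kmeans = MiniBatchKMeans(n_clusters=8, batch_size=100, n_init=10, random_state=42)
kmeans.fit(data.reshape(-1, 1))

# sorting makes the printout independent of the arbitrary label order as well
print(np.sort(kmeans.cluster_centers_.ravel()))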

How to accurately order the clusters based on color in matlab [duplicate]

爷，独闯天下 submitted on 2020-01-07 02:18:14
Question: This question already exists: How to accurately classify leafs into its disease category using Matlab (closed 3 years ago). I have an image of a leaf that has mostly three colors: a black background, the green leaf, and brown diseased spots. Here is the image. The first time I cluster it, I get the brown spots in cluster 1, the green portion in cluster 2, and the black region in cluster 3 (for example). The second time I cluster it, I get the green portion in cluster 1, the brown spots in cluster 2, and the black region in cluster 3
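
The cluster indices that k-means returns are arbitrary and can change from run to run; a common remedy is to relabel the clusters by sorting their centroids on a stable criterion such as brightness. The question is about MATLAB, but the idea is language-agnostic; here is a sketch in Python with made-up pixel data:

import numpy as np
from sklearn.cluster import KMeans

pixels = np.random.rand(1000, 3)                 # placeholder for the leaf image's RGB pixels
kmeans = KMeans(n_clusters=3, random_state=0).fit(pixels)

# sort clusters by centroid brightness (mean of R, G, B), darkest first,
# so the black background, the leaf and the spots always get the same index
order = np.argsort(kmeans.cluster_centers_.mean(axis=1))
relabel = np.empty(3, dtype=int)
relabel[order] = np.arange(3)
stable_labels = relabel[kmeans.labels_]          # 0 = darkest cluster, 2 = brightest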