k-means

Optimizing K-means algorithm

£可爱£侵袭症+ submitted on 2019-12-10 10:56:41
Question: I am trying to follow a paper called "An Optimized Version of K-Means Algorithm". I understand how the K-means algorithm works: grouping the tuples/points into clusters and updating the centroids. I am trying to implement the method described in the paper. Their proposed algorithm is this: My doubt is about the second step. I don't understand what is being done there. The paper says that we group our data into wider intervals based on the value of e, so that later we …
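As background for what the paper is optimizing, the standard Lloyd iteration (assign, then update) can be sketched as follows. This is a minimal NumPy baseline, not the paper's interval-based optimization; all names here are my own.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: nearest centroid by Euclidean distance.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid (keep it if its cluster is empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

The paper's second step prunes the assignment step, which is the expensive part of this loop.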

K means clustering in MATLAB - output image

て烟熏妆下的殇ゞ submitted on 2019-12-10 10:47:25
Question: To perform K-means clustering with k = 3 (segments), I: 1) converted the RGB image into grayscale; 2) cast the original image into an n x 1 column matrix; 3) idx = kmeans(column_matrix); 4) output = idx, cast back into the same dimensions as the original image. My questions are: A) When I do imshow(output), I get a plain white image. However, when I do imshow(output, [0 5]), it shows the output image. I understand that 0 and 5 specify the display range, but why do I have to do this? B) Now the …
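The white image happens because MATLAB's kmeans returns labels 1..k, and imshow treats a double image as having display range [0, 1], so every label clips to white. Rescaling the label image to [0, 1] (what imshow(output, []) or mat2gray does) makes the segments visible. A small NumPy illustration of that rescaling, with a made-up label image:

```python
import numpy as np

# Hypothetical 2x3 label image standing in for the reshaped idx; real
# k-means labels start at 1, so every pixel is >= 1 and clips to white
# under a [0, 1] display range.
labels = np.array([[1.0, 2.0, 3.0],
                   [3.0, 2.0, 1.0]])

# Min-max rescale to [0, 1], the same normalization mat2gray performs.
rescaled = (labels - labels.min()) / (labels.max() - labels.min())
```

After rescaling, the smallest label maps to black and the largest to white, so each segment gets a distinct gray level.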

Getting an IOException when running a sample code in “Mahout in Action” on mahout-0.6

≯℡__Kan透↙ submitted on 2019-12-10 04:26:34
Question: I'm learning Mahout and reading "Mahout in Action". When I tried to run the sample code in chapter 7, SimpleKMeansClustering.java, an exception popped up: Exception in thread "main" java.io.IOException: wrong value class: 0.0: null is not class org.apache.mahout.clustering.WeightedPropertyVectorWritable at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1874) at SimpleKMeansClustering.main(SimpleKMeansClustering.java:95) This code ran successfully on mahout-0.5, but on mahout-0.6 I …

K means finding elbow when the elbow plot is a smooth curve

末鹿安然 submitted on 2019-12-10 04:13:23
Question: I am trying to plot the elbow of k-means using the code below: load CSDmat %mydata for k = 2:20 opts = statset('MaxIter', 500, 'Display', 'off'); [IDX1,C1,sumd1,D1] = kmeans(CSDmat,k,'Replicates',5,'options',opts,'distance','correlation'); % kmeans matlab [yy,ii] = min(D1'); %% assign points to nearest center distort = 0; distort_across = 0; clear clusts; for nn=1:k I = find(ii==nn); %% indices of points in cluster nn J = find(ii~=nn); %% indices of points not in cluster nn clusts{nn} = I; %% …
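When the distortion curve is smooth and has no sharp bend, a common heuristic is to pick the point farthest from the straight line joining the first and last points of the curve. A minimal NumPy sketch of that heuristic (my own helper, not from the question):

```python
import numpy as np

def elbow_point(ks, distortions):
    """Pick the elbow of a smooth distortion curve: the k whose point
    lies farthest from the chord connecting the curve's endpoints."""
    ks = np.asarray(ks, dtype=float)
    d = np.asarray(distortions, dtype=float)
    # Unit direction vector of the chord from first to last point.
    line = np.array([ks[-1] - ks[0], d[-1] - d[0]])
    line = line / np.linalg.norm(line)
    # Vectors from the first point to every point on the curve.
    vecs = np.stack([ks - ks[0], d - d[0]], axis=1)
    # Perpendicular distance of each point from the chord.
    proj = vecs @ line
    dists = np.linalg.norm(vecs - np.outer(proj, line), axis=1)
    return int(ks[np.argmax(dists)])
```

Feeding it the per-k total distortions computed by a loop like the one above gives a single suggested k even when no visually obvious elbow exists.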

OpenCV K-Means (kmeans2)

夙愿已清 submitted on 2019-12-10 03:29:01
Question: I'm using OpenCV's k-means implementation to cluster a large set of 8-dimensional vectors. They cluster fine, but I can't find any way to see the prototypes created by the clustering process. Is this even possible? OpenCV only seems to give access to the cluster indexes (or labels). If not, I guess it'll be time to write my own implementation! Answer 1: I can't say I've used OpenCV's implementation of k-means, but if you have access to the labels assigned to each instance, you can simply compute the centroids …
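The answer's suggestion works because the centroid of a cluster is just the mean of the samples assigned to it. If a binding only exposes labels, the prototypes can be recovered in a few lines of NumPy (this helper name is my own):

```python
import numpy as np

def centroids_from_labels(data, labels, k):
    """Recover the cluster prototypes as the per-label mean of the
    samples -- exactly the quantity k-means maintains internally."""
    return np.array([data[labels == j].mean(axis=0) for j in range(k)])
```

Applied to the 8-dimensional vectors and the label array returned by the clustering call, this yields a k x 8 matrix of prototypes.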

k means clustering algorithm

白昼怎懂夜的黑 submitted on 2019-12-09 13:51:13
Question: I want to perform a k-means clustering analysis on a set of 10 data points, each of which has an array of 4 numeric values associated with it. I'm using the Pearson correlation coefficient as the distance metric. I did the first two steps of the k-means clustering algorithm, which were: 1) Select a set of initial centres for the k clusters. [I selected two initial centres at random.] 2) Assign each object to the cluster with the closest centre. [I used the Pearson correlation coefficient as the …
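Step 2 with a correlation metric means computing d = 1 - r between each point and each centre, where r is the Pearson correlation coefficient, and taking the argmin. A minimal NumPy sketch of that assignment step (helper names are my own):

```python
import numpy as np

def assign_by_correlation(points, centres):
    """Assign each point to the centre with the smallest Pearson
    correlation distance d = 1 - r."""
    def corr_dist(a, b):
        r = np.corrcoef(a, b)[0, 1]
        return 1.0 - r
    return np.array([
        np.argmin([corr_dist(p, c) for c in centres]) for p in points
    ])
```

A point perfectly correlated with a centre has d = 0; a perfectly anti-correlated one has d = 2, so rising and falling profiles land in different clusters under this metric.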

Use Absolute Pearson Correlation as Distance in K-Means Algorithm (MATLAB)

江枫思渺然 submitted on 2019-12-09 13:44:30
Question: I need to do some clustering using a correlation distance, but instead of using the built-in 'distance', 'correlation' option, which is defined as d = 1 - r, I need the absolute Pearson distance. In my application, anti-correlated data should get the same cluster ID. Right now, when using the kmeans() function, I'm getting centroids that are highly anti-correlated, which I would like to avoid by combining them. I'm not that fluent in MATLAB yet and have some trouble reading the kmeans function. Would it be …
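The distance the question asks for is d = 1 - |r|, which maps both perfectly correlated and perfectly anti-correlated profiles to 0, so they fall into the same cluster. A tiny NumPy sketch of that metric (the function name is my own):

```python
import numpy as np

def abs_corr_dist(a, b):
    """Absolute-Pearson distance d = 1 - |r|: correlated and
    anti-correlated profiles both come out close to 0."""
    r = np.corrcoef(a, b)[0, 1]
    return 1.0 - abs(r)
```

Since MATLAB's built-in kmeans only offers d = 1 - r, using |r| generally means plugging a custom distance into the assignment step by hand, as several answers to this kind of question suggest.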

How to set k-Means clustering labels from highest to lowest with Python?

匆匆过客 submitted on 2019-12-09 09:49:52
Question: I have a dataset of 38 apartments and their electricity consumption in the morning, afternoon and evening. I am trying to cluster this dataset using the k-means implementation from scikit-learn, and I am getting some interesting results. First clustering results: This is all very well, and with 4 clusters I obviously get 4 labels associated with each apartment: 0, 1, 2 and 3. Using the random_state parameter of the KMeans method, I can fix the seed with which the centroids are randomly initialized, …
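Since the label numbers k-means assigns depend only on initialization order, a common fix is to remap them after fitting so that label 0 is always the highest-consumption cluster. A NumPy sketch of that remapping, given the labels and centroids a fitted KMeans exposes (the helper name is my own):

```python
import numpy as np

def relabel_by_centroid(labels, centroids):
    """Remap cluster labels so that 0 is the cluster with the highest
    mean centroid value and k-1 the lowest, independent of the random
    initialization order."""
    order = np.argsort(-centroids.mean(axis=1))  # cluster indices, highest first
    mapping = np.empty_like(order)
    mapping[order] = np.arange(len(order))       # old label -> new label
    return mapping[labels]
```

With scikit-learn this would be applied as relabel_by_centroid(km.labels_, km.cluster_centers_), making the labels comparable across runs and random_state values.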

Plotting the boundaries of cluster zone in Python with scikit package

江枫思渺然 submitted on 2019-12-09 05:51:03
Question: Here is my simple example of clustering data with 3 attributes (x, y, value); each sample represents its location (x, y) and its associated variable. My code is posted here: x = np.arange(100,200,1) y = np.arange(100,200,1) value = np.random.random(100*100) xx,yy = np.meshgrid(x,y) xx = xx.reshape(100*100) yy = yy.reshape(100*100) j = np.dstack((xx,yy,value))[0,:,:] fig = plt.figure(figsize=(12,4)) ax1 = plt.subplot(121) xi,yi = np.meshgrid(x,y) va = value.reshape(100,100) pc = plt …
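The usual way to get cluster-zone boundaries on a plot like this is to label every point of a regular grid with its nearest centroid and draw the resulting label field. A NumPy sketch over the same 100x100 grid, with made-up centroids standing in for fitted ones:

```python
import numpy as np

# Hypothetical 2-D centroids, standing in for centroids fitted on the
# (x, y) coordinates of the samples.
centroids = np.array([[125.0, 125.0], [175.0, 175.0]])

x = np.arange(100, 200, 1)
y = np.arange(100, 200, 1)
xx, yy = np.meshgrid(x, y)
grid = np.stack([xx.ravel(), yy.ravel()], axis=1)

# Label every grid point with its nearest centroid; reshaped back to the
# grid, this label field is the cluster-zone map, and contourf/pcolormesh
# over it draws the zone boundaries.
dists = np.linalg.norm(grid[:, None, :] - centroids[None, :, :], axis=2)
zones = dists.argmin(axis=1).reshape(xx.shape)
```

With matplotlib, plt.contourf(xx, yy, zones) (or pcolormesh) then renders the zones next to the value plot the question's code sets up.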

What's the difference between kmeans and kmeans2 in scipy?

 ̄綄美尐妖づ submitted on 2019-12-08 17:12:04
Question: I am new to machine learning and am wondering about the difference between kmeans and kmeans2 in scipy. According to the docs, both of them use the 'k-means' algorithm, but how do I choose between them? Answer 1: Based on the documentation, it seems kmeans2 is the standard k-means algorithm, which runs until it converges to a local optimum, and it allows you to change the seed initialization. The kmeans function will terminate early based on a lack of change, so it may not even reach a local optimum. Further, the goal …
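Assuming scipy.cluster.vq is available, the two APIs can be compared side by side: kmeans returns a codebook and the mean distortion (and stops once the distortion change falls below a threshold), while kmeans2 additionally returns a label for every observation and lets you choose the seeding strategy via minit. A minimal sketch:

```python
import numpy as np
from scipy.cluster.vq import kmeans, kmeans2, whiten

# Two well-separated blobs of 2-D points.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                  rng.normal(5.0, 0.5, (50, 2))])

# kmeans expects whitened (unit-variance) features and returns the best
# codebook found plus its mean distortion.
w = whiten(data)
codebook, distortion = kmeans(w, 2)

# kmeans2 is the classic algorithm: choose the seeding with minit
# ('points' samples initial centroids from the data) and get per-sample
# labels back alongside the centroids.
centroids, labels = kmeans2(w, 2, minit='points')
```

In short: use kmeans when you only need the codebook (e.g. for vector quantization), and kmeans2 when you also need the per-observation labels or control over initialization.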