k-means | 易学教程

OpenCV's clustering function cvKMeans2() - why doesn't it work when I use the centers parameter?

Question: I use this code. It should print the cluster labels and then the centroids, but the "center" matrix with the centroids seems to be empty, full of zeros. What is wrong?

    #include <iostream>
    #include <stdio.h>
    #include "cxcore.h"
    #include "highgui.h"
    using namespace cv;

    int main( int argc, char** argv )
    {
        int i,j;
        CvMat* points = cvCreateMat( 5, 2, CV_32FC1 );
        CvMat* centers2 = cvCreateMat( 5, 2, CV_32FC1 );
        CvMat* clusters = cvCreateMat( 5, 1, CV_32SC1 );
        cvSetReal2D( points, 0, 0,1);

Bag of Visual Words: what is a reasonable word (vector) dimension?

Question: In the Bag of Features/Visual Words paradigm we have a vector V in k dimensions, where V[i]=j if the i-th centroid (obtained by the k-means algorithm) is the closest one among all k centroids for j visual descriptors (e.g. SIFT descriptors). AFAIK, the resulting visual vector is very sparse (that is, most entries are zero) since k is really big, but my question is: what is a reasonable value for k (and so the vector size)? Hundreds of dimensions? Thousands? Especially
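The definition of V above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming Euclidean distance and toy 2-D points in place of real descriptors; the helper names `nearest_centroid` and `bow_histogram` are hypothetical and not taken from any library.

```python
import math

def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to descriptor (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(descriptor, centroids[i]))

def bow_histogram(descriptors, centroids):
    """Build V of length k, where V[i] counts the descriptors whose nearest centroid is i."""
    v = [0] * len(centroids)
    for d in descriptors:
        v[nearest_centroid(d, centroids)] += 1
    return v

# Toy 2-D "descriptors" (assumption: real SIFT descriptors have 128 dimensions)
centroids = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0), (0.0, 20.0)]
descriptors = [(1.0, 1.0), (0.5, -0.5), (9.0, 11.0)]
histogram = bow_histogram(descriptors, centroids)
print(histogram)  # [2, 1, 0, 0]
```

Even in this toy case half of the entries are zero, which hints at why V becomes sparse once k is much larger than the number of descriptors in an image.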

Understanding output from kmeans clustering in python

Question: I have two distance matrices, each 232*232, where the column and row labels are identical. This is an abridged version of the two, where A, B, C and D are the names of the points between which the distances are measured:

        A B C D ...        A B C D ...
    A   0 1 5 3        A   0 5 3 9
    B   4 0 4 1        B   2 0 7 8
    C   2 6 0 3        C   2 6 0 1
    D   2 7 1 0        D   5 2 5 0
    ...                ...

The two matrices therefore represent the distances between pairs of points in two different networks. I want to identify clusters of pairs that are close

OpenCV kmeans: N>=K exception, error (-215)

Question: When I try to use kmeans like this:

    int K = 4;
    Mat labels;
    Mat centers;
    std::vector<float> values;
    // (put a bunch of values into "values" here...)
    kmeans(values, K, labels, TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 10, 1.0), 10, KMEANS_PP_CENTERS, centers);

I get the error: "error: (-215) N >= K in function kmeans". values.size() = 360000, so N is clearly greater than K. What gives? Thanks.

Answer 1: OpenCV weirdly interprets one-dimensional data as a 1-element array. Something like
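To illustrate the point the answer makes, here is a minimal sketch in plain Python (not OpenCV code). It assumes that the flat vector is read as a single row, so that N (the number of samples) becomes 1 and the `N >= K` check fails. The helpers `shape` and `check_n_ge_k` are hypothetical and only mimic that check.

```python
def shape(rows):
    """Return (N, dims) for a list of equally sized rows."""
    return (len(rows), len(rows[0]) if rows else 0)

def check_n_ge_k(rows, k):
    """Mimic the failing check: the number of samples N must be at least K."""
    n, _ = shape(rows)
    return n >= k

values = [0.1, 0.4, 0.35, 0.8, 0.9, 0.05]
K = 4

as_one_sample = [values]            # 1 row, 6 columns: one 6-dimensional sample
as_samples = [[v] for v in values]  # 6 rows, 1 column: six 1-dimensional samples

print(shape(as_one_sample), check_n_ge_k(as_one_sample, K))  # (1, 6) False
print(shape(as_samples), check_n_ge_k(as_samples, K))        # (6, 1) True
```

If that reading is right, the remedy would be to pass the data with one row per sample. In C++ this presumably means wrapping `values` in a matrix with `values.size()` rows and one column before calling `kmeans`; that detail is an assumption and is not part of the excerpt above.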

Displaying k-means results with specific colors for specific clusters

Question: I applied k-means clustering to a preprocessed image using the following MATLAB code:

    %B - input image
    C=rgb2gray(B);
    [idx centroids]=kmeans(double(C(:)),4);
    imseg = zeros(size(C,1),size(C,2));
    for i=1:max(idx)
        imseg(idx==i)=i;
    end
    i=mat2gray(imseg);
    % i - output image

Every time I display the output, the colors assigned to the output image change. How can I give a specific color to cluster1, cluster2, cluster3 and cluster4?

Answer 1: You can use a colormap. Let R1 , B1 and G1 be the RGB values you
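The colormap idea can be sketched as follows in plain Python. This is a hedged illustration, not the answer's own code. It adds one assumption of mine: the cluster numbering may differ between runs, so the clusters are first relabelled by sorted centroid value before a fixed palette is applied. The names `stable_labels`, `colorize` and `PALETTE` are hypothetical, and labels are 0-based here, unlike MATLAB's 1-based `idx`.

```python
def stable_labels(labels, centroids):
    """Relabel clusters so that label 0 has the smallest centroid, 1 the next, and so on."""
    order = sorted(range(len(centroids)), key=lambda c: centroids[c])
    remap = {old: new for new, old in enumerate(order)}
    return [remap[label] for label in labels]

# Fixed palette: entry i is the RGB color always used for (relabelled) cluster i
PALETTE = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]

def colorize(labels, centroids):
    """Map each pixel label to its fixed RGB color."""
    return [PALETTE[label] for label in stable_labels(labels, centroids)]

# Two runs that found the same four gray-level clusters but numbered them differently
run_a = colorize([0, 1, 2, 3], [10.0, 80.0, 150.0, 220.0])
run_b = colorize([2, 0, 3, 1], [80.0, 220.0, 10.0, 150.0])
print(run_a == run_b)  # True
```

Sorting by centroid works for scalar gray levels as in the question; for multi-dimensional centroids a different ordering rule would be needed.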

Sklearn MiniBatchKMeans gives confusing results for labels_ attribute

Question: I am using sklearn.cluster.MiniBatchKMeans to train an ML model. I need to get the cluster IDs, and I tried the code below (here model is the MiniBatchKMeans clustering model):

    print("Cluster IDs: ", np.unique(model.labels_))
    print("Number of Clusters: ", model.n_clusters)

I got the following result:

    Cluster IDs: [0]
    Number of Clusters: 2

According to this result, there is only 1 cluster ID for the given dataset, and yet there are 2 clusters. I found that all

Convert Array[DenseVector] to CSV with Scala

Question: I am using the Spark KMeans function with Scala and I need to save the cluster centers obtained into a CSV. This val is of type Array[DenseVector]:

    val clusters = KMeans.train(parsedData, numClusters, numIterations)
    val centers = clusters.clusterCenters

I tried converting centers to an RDD and then from RDD to DF, but I ran into a lot of problems (e.g., import spark.implicits._ / SQLContext.implicits._ is not working and I cannot use .toDF). I was wondering if there is another way to make a

Is sklearn.cluster.KMeans sensitive to data point order?

Question: As noted in the answer to this post about feature scaling, some (all?) implementations of KMeans are sensitive to the order of data points. Based on the sklearn.cluster.KMeans documentation, n_init only changes the initial position of the centroid. This would mean that one must loop over a few shuffles of data points to test if this is a problem. My questions are as follows: Is the scikit-learn implementation sensitive to the ordering, as the post suggests? Does n_init take

How to save cluster assignments in output file using Weka clustering XMeans?

Question:

Context: I want to use the Weka clustering algorithm XMeans. However, I cannot figure out how to obtain cluster assignments from the Weka GUI. At the moment I can only see a list of cluster IDs along with the percentage of entries assigned to each cluster.

Question: Is there any way to save cluster assignments for each entry in, e.g., CSV format?

Answer 1: Do everything in the "Preprocess Panel". This is one way to do this:

1. Load Data File.
2. Remove any Classification Attribute or Identifiers
3. Choose Preprocess /

Creating a prediction function for kmeans in R

Question: I want to create a predict function that predicts which cluster an observation belongs to.

    data(iris)
    mydata=iris
    m=mydata[1:4]
    train=head(m,100)
    xNew=head(m,10)
    rownames(train)<-1:nrow(train)
    norm_eucl=function(train) train/apply(train,1,function(x)sum(x^2)^.5)
    m_norm=norm_eucl(train)
    result=kmeans(m_norm,3,30)
    predict.kmean <- function(cluster, newdata) {
        simMat <- m_norm(rbind(cluster, newdata), sel=(1:nrow(newdata)) + nrow(cluster))[1:nrow(cluster), ]
        unname(apply(simMat, 2, which.max))
    }
    ##
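One common way to predict a cluster for a new observation is to assign it to the nearest fitted center. The sketch below shows that idea in plain Python rather than R; it is not the asker's method, and the function name `predict_kmeans` and the toy data are hypothetical. It assumes Euclidean distance and returns 0-based indices, whereas R's cluster numbers start at 1.

```python
import math

def predict_kmeans(centers, new_data):
    """Return, for each row in new_data, the 0-based index of the nearest center."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(row, centers[i]))
        for row in new_data
    ]

# Toy centers and observations (assumption: stand-ins for fitted centers and new rows)
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
new_points = [(0.2, -0.1), (4.0, 6.0), (9.0, 1.0)]
assignments = predict_kmeans(centers, new_points)
print(assignments)  # [0, 1, 2]
```

Because the question normalizes each training row to unit length before clustering, new observations would need the same normalization before being compared with the centers.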