k-means

Cannot handle any class attribute! kmeans java

跟風遠走 submitted on 2019-12-12 10:24:41
Question: I want to run a k-means algorithm, and I am using Weka in Eclipse for this. I have this code:
    public class demo {
        public demo() throws Exception {
            // TODO Auto-generated constructor stub
            BufferedReader breader = null;
            breader = new BufferedReader(new FileReader(
                    "D:/logiciels/weka-3-7-12/weka-3-7-12/data/iris.arff"));
            Instances Train = new Instances(breader);
            Train.setClassIndex(Train.numAttributes() - 1);
            SimpleKMeans kMeans = new SimpleKMeans();
            kMeans.setSeed(10);
            kMeans.setPreserveInstancesOrder

compute clustersize automatically for kmeans

独自空忆成欢 submitted on 2019-12-12 10:19:35
Question: I am using scikit-learn and experimenting with KMeans. It is fast but requires the number of clusters as an argument. What I would like to try is to compute the number of clusters automatically, based on the population of documents. The hash-based near-neighbor algorithms (ssdeep) I used before can form similarity clusters based on distance; how can I get the cluster count automatically for k-means? KMeans(init='k-means++', n_clusters=cluster_count, n_init=10), name="k-means++", data=data) I want to calculate that
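One common way to pick the cluster count is to run k-means for several candidate values and keep the one with the highest mean silhouette coefficient. The sketch below illustrates that idea in plain Python rather than scikit-learn. The helper names (`farthest_first`, `kmeans`, `silhouette`, `choose_k`) and the toy points are assumptions made for illustration, and farthest-first seeding is swapped in for k-means++ only to keep the example deterministic.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption, not k-means++): start at the first
    point, then keep adding the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; values near 1 mean tight, well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat it as the worst score
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for name, other in clusters.items() if name != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Return the candidate k whose clustering has the highest mean silhouette."""
    scored = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scored, key=scored.get)


# Hypothetical toy data: three small, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
best_k = choose_k(points, range(2, 6))
print(best_k)
```

The silhouette computation is quadratic in the number of points, so for a large document collection it would likely need to be run on a sample.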

opencv multidimensional kmeans

走远了吗. submitted on 2019-12-12 05:39:14
Question: I'm trying to run the kmeans algorithm on n-dimensional data. I have N points and each point has { x, y, z, ... , n } features. My code is the following:
    cv::Mat points(N, n, CV_32F);
    // fill the data points
    cv::Mat labels;
    cv::Mat centers;
    cv::kmeans(points, k, labels, cv::TermCriteria(CV_TERMCRIT_ITER|CV_TERMCRIT_EPS, 1000, 0.001), 10, cv::KMEANS_PP_CENTERS, centers);
The problem is that the kmeans algorithm runs into a segmentation fault. Any help is appreciated. Update: How Miki and Micka

Find Jaccard distance of tweets and cluster in Kmeans

旧巷老猫 submitted on 2019-12-12 04:11:17
Question: This is a follow-up question to a problem I've been working on for a while. I have two questions. One concerns an algorithm that works on two tweets, which I revised to measure 10 tweets. I'm wondering what my revision is measuring. I get a result, but I want it to measure several tweets' Jaccard distances, not just return one value. Since it's returning one value, I think it's just adding everything up. The other question is about my attempt to create a for loop and assign clusters. I'm trying
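To get one Jaccard distance per pair of tweets instead of a single aggregate value, the distance can be computed for every pair and stored in a matrix. The following is a minimal sketch under stated assumptions: tweets are tokenized by lowercasing and splitting on whitespace, and the function names and sample tweets are hypothetical, not taken from the asker's code.

```python
from itertools import combinations


def jaccard_distance(tweet_a, tweet_b):
    """1 - |A intersect B| / |A union B| over the sets of lowercase word tokens."""
    set_a = set(tweet_a.lower().split())
    set_b = set(tweet_b.lower().split())
    union = set_a | set_b
    if not union:  # two empty tweets: treat them as identical
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def pairwise_distances(tweets):
    """Symmetric matrix where entry [i][j] is the distance between tweets i and j."""
    n = len(tweets)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = jaccard_distance(tweets[i], tweets[j])
    return matrix


# Hypothetical sample tweets.
tweets = ["a b c", "b c d", "x y z"]
distances = pairwise_distances(tweets)
for row in distances:
    print(row)
```

A matrix like this can then drive the cluster assignment step, for example by assigning each tweet to the centroid tweet with the smallest distance.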

Drawbacks of K-Medoid (PAM) Algorithm

社会主义新天地 submitted on 2019-12-12 03:55:53
Question: I have read that the K-medoid algorithm (PAM) is a partition-based clustering algorithm and a variant of the K-means algorithm. It solves problems of K-means such as producing empty clusters and sensitivity to outliers/noise. However, the time complexity of K-medoid is O(n^2), unlike K-means (Lloyd's Algorithm), which has a time complexity of O(n). I would like to ask whether there are other drawbacks of the K-medoid algorithm aside from its time complexity. Answer 1: The main disadvantage of K-Medoid

k-means for text clustering

风流意气都作罢 submitted on 2019-12-12 02:43:09
Question: I'm trying to implement k-means for text clustering, specifically English sentences. So far I'm at the point where I have a term frequency matrix for each document (sentence). I'm a little confused about the actual implementation of k-means on text data. Here's my guess of how it should work. Figure out the number of unique words in all sentences (a large number, call it n ). Create k n-dimensional vectors (clusters) and fill in the values of the k vectors with some random numbers (how do I
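The first steps described above can be sketched as follows: build the vocabulary (its size is n), turn each sentence into an n-dimensional term-frequency vector, and create k starting centroids. This sketch initializes the centroids by copying k randomly chosen sentence vectors instead of filling them with random numbers. That is a common alternative, chosen here as an assumption, and not what the question proposes. All function names and the sample sentences are hypothetical.

```python
import math
import random
from collections import Counter


def build_vocabulary(sentences):
    """Sorted list of unique lowercase words; its length is the dimension n."""
    return sorted({word for s in sentences for word in s.lower().split()})


def term_frequency_vector(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


def init_centroids(vectors, k, seed=0):
    """Assumption: seed the k centroids with copies of k distinct documents."""
    rng = random.Random(seed)
    return [list(v) for v in rng.sample(vectors, k)]


def kmeans_step(vectors, centroids):
    """One Lloyd iteration: assign each vector to its nearest centroid, then
    move each centroid to the mean of the vectors assigned to it."""
    labels = [min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
              for v in vectors]
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(old)  # keep the centroid of an empty cluster
    return labels, new_centroids


# Hypothetical sample sentences.
sentences = ["the cat sat", "the dog sat", "stocks fell sharply", "stocks rose sharply"]
vocabulary = build_vocabulary(sentences)
vectors = [term_frequency_vector(s, vocabulary) for s in sentences]
centroids = init_centroids(vectors, k=2)
labels, centroids = kmeans_step(vectors, centroids)
print(len(vocabulary), labels)
```

Calling `kmeans_step` repeatedly until the labels stop changing gives the full algorithm. Euclidean distance is used here for simplicity; for text, length-normalized vectors are often preferred.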

How to replace the appropriate colors with my own pallette in MATLAB?

送分小仙女□ submitted on 2019-12-12 02:24:24
Question: I am using MATLAB 2015. I want to reduce the image color count. An RGB image will be segmented using the k-means algorithm. Then the mean colors will be replaced with the colors I have. There are 10 colors: black - [0, 0, 0], yellow - [255, 255, 0], orange - [255, 128, 0], white - [255, 255, 255], pink - [255, 153, 255], lavender - [120, 102, 255], brown - [153, 51, 0], green - [0, 255, 0], blue - [0, 0, 255], red - [255, 0, 0]. I have succeeded in clustering the image. Clustered images should
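The replacement step amounts to mapping each cluster's mean color to the closest entry of the fixed palette. Below is a minimal sketch of that logic, written in Python rather than MATLAB. It assumes plain squared Euclidean distance in RGB space as the similarity measure and takes black as (0, 0, 0), the standard RGB value. The names `PALETTE` and `nearest_palette_color` and the sample mean colors are hypothetical.

```python
# Palette from the question; black is taken as (0, 0, 0).
PALETTE = {
    "black": (0, 0, 0),
    "yellow": (255, 255, 0),
    "orange": (255, 128, 0),
    "white": (255, 255, 255),
    "pink": (255, 153, 255),
    "lavender": (120, 102, 255),
    "brown": (153, 51, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "red": (255, 0, 0),
}


def nearest_palette_color(rgb, palette=PALETTE):
    """Return (name, color) of the palette entry closest to rgb,
    using squared Euclidean distance in RGB space (an assumption)."""
    return min(palette.items(),
               key=lambda item: sum((a - b) ** 2 for a, b in zip(rgb, item[1])))


# Hypothetical cluster mean colors, as k-means might return them.
cluster_means = [(250, 10, 5), (10, 10, 10), (240, 240, 30)]
replacements = [nearest_palette_color(mean) for mean in cluster_means]
print(replacements)
```

Each pixel then takes the palette color chosen for its cluster. A perceptual color space such as L*a*b* may match colors more closely to how they look, at the cost of a conversion step.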

Weighting k Means Clustering by number of observations

你说的曾经没有我的故事 submitted on 2019-12-12 01:55:26
Question: I would like to cluster some data using k-means in R that looks as follows.
ADP  NS  CNTR  PP2V  EML  PP1V  ADDPS  FB  PP1D  ADR  ISV  PP2D  ADSEM  SUMALL  CONV
  2   0     0     1    0     0      0   0     0   12    0    12      0      53     0
  2   0     0     1    0     0      0   0     0   14    0    25      0      53     0
  2   0     0     1    0     0      0   0     0   15    0     0      0      53     0
  2   0     0     1    0     0      0   0     0   15    0     4      0      53     0
  2   0     0     1    0     0      0   0     0   17    0     0      0      53     0
  2   0     0     1    0     0      0   0     0   18    0     0      0     106     0
  2   0     0     1    0     0      0   0     0   23    0    10      0      53     0
  2   0     0     1    0     0      1   0     0    0    0     1      0     106     0
  2   0     0     1    0     0      3   0     0    0    0     0      0      53     0
  2   0     0     2    0     0      0   0     0    0    0     0      0    3922     0
  2   0     0     2    0     0      0   0     0    0    0     1      0

how to import logistic regression and kmeans pmml files into r

拥有回忆 submitted on 2019-12-12 01:54:32
Question: I am looking for some guidance on importing PMML model files into R. PMML (Predictive Model Markup Language) allows models built in one system to be deployed in another. I have several models that were trained in SPSS and saved to XML using PMML. They are logistic regression and k-means models. I have searched exhaustively for R capabilities to import PMML and am finding that there is only a rare function here and there in packages such as arules for

matlab k-means clustering evaluation [duplicate]

送分小仙女□ submitted on 2019-12-12 01:53:25
Question: This question already has answers here: Evaluating K-means accuracy (2 answers). Closed last year. How can I effectively evaluate the performance of the standard MATLAB k-means implementation? For example, I have a matrix X: X = [1 2; 3 4; 2 5; 83 76; 97 89]. For every point I have a gold standard clustering. Let's assume that (83,76), (97,89) is the first cluster and (1,2), (3,4), (2,5) is the second cluster. Then we run MATLAB: idx = kmeans(X,2). And get the following results: idx = [1; 1; 2; 2; 2]
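One way to score a clustering against a gold standard, regardless of how the cluster numbers happen to be assigned, is the Rand index: the fraction of point pairs on which the two labelings agree (both put the pair in the same cluster, or both put it in different clusters). The sketch below is in Python rather than MATLAB and uses the gold standard and the `idx` result from the example above. The function name `rand_index` is an assumption, and the Rand index is only one of several possible measures.

```python
from itertools import combinations


def rand_index(gold, predicted):
    """Fraction of point pairs treated the same way by both labelings.
    Requires at least two points; label values themselves do not matter."""
    pairs = list(combinations(range(len(gold)), 2))
    agreements = sum(
        (gold[i] == gold[j]) == (predicted[i] == predicted[j]) for i, j in pairs
    )
    return agreements / len(pairs)


# Gold standard from the example: the first three points form cluster 2,
# the last two form cluster 1.
gold = [2, 2, 2, 1, 1]
# Result returned by kmeans in the example.
idx = [1, 1, 2, 2, 2]

score = rand_index(gold, idx)
print(score)
```

Because the measure compares pairs rather than label values, a result that swaps cluster numbers 1 and 2 but groups the points identically still scores 1.0.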