k-means

Getting the coordinates of every observation at each iteration of kmeans in R

大兔子大兔子 submitted on 2019-12-08 13:14:43
Question: I would like to construct an animation of the k-means clustering algorithm in R. The animation would show each of the observations (rows) in the dataset plotted in 2 (or 3) dimensions and then have them move into their clusters as each iteration ticks by. For this I would need to access the coordinates of the observations at each iteration. Where in the kmeans package can I
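For reference, in Lloyd's formulation of k-means (one of the algorithms R's kmeans() offers through its algorithm argument) the observations x_i themselves never move. Each iteration t only recomputes the label c_i of every observation and the k centers mu_j, so those are the per-iteration quantities an animation would need:

```latex
c_i^{(t)} = \arg\min_{j \in \{1,\dots,k\}} \bigl\lVert x_i - \mu_j^{(t)} \bigr\rVert^2,
\qquad
S_j^{(t)} = \{\, i : c_i^{(t)} = j \,\},
\qquad
\mu_j^{(t+1)} = \frac{1}{\lvert S_j^{(t)} \rvert} \sum_{i \in S_j^{(t)}} x_i
```

The loop stops when no label changes between iterations, i.e. c^(t+1) = c^(t). An animation would therefore redraw the fixed observations colored by c^(t) and move the centers mu^(t) from frame to frame.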

Value at KMeans.cluster_centers_ in sklearn KMeans

梦想与她 submitted on 2019-12-08 10:49:04
Question: After fitting k-means with 3 clusters on some vectors, I was able to get the labels for the input data. KMeans.cluster_centers_ returns the coordinates of the centers, so shouldn't there be some input vector corresponding to each of them? How can I find the value at the centroid of each cluster? Answer 1:
closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)
Here kmeans is the fitted KMeans estimator (cluster_centers_ is only set after fit, not on the KMeans class). The array closest will contain the index of the point in X that is closest to each centroid. Let's say the closest gave output as
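A k-means centroid is the mean of its cluster's members, so in general it is not one of the input vectors. The answer's pairwise_distances_argmin_min(centers, X) call returns, for each center, the index of the nearest row of X and that distance. A NumPy-only sketch of the same computation, assuming Euclidean distance (the helper name closest_points_to_centers and the toy data are hypothetical):

```python
import numpy as np

def closest_points_to_centers(centers, X):
    """For each center, return the index of the nearest row of X and its distance.

    Intended to match sklearn.metrics.pairwise_distances_argmin_min(centers, X)
    with the default Euclidean metric.
    """
    centers = np.asarray(centers, dtype=float)
    X = np.asarray(X, dtype=float)
    # Euclidean distance from every center to every point: shape (n_centers, n_points).
    d = np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    return idx, d[np.arange(len(centers)), idx]

# Toy data with known labels; the centers are the cluster means, as in cluster_centers_.
X = np.array([[0, 0], [0, 3], [3, 0], [10, 10], [10, 13], [13, 10]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])  # [[1, 1], [11, 11]]

closest, dist = closest_points_to_centers(centers, X)
print(closest, dist)  # neither center is itself a row of X
```

X[closest] then gives the actual input vectors nearest to each centroid.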

Random Clustering Algorithm

时间秒杀一切 submitted on 2019-12-08 10:42:33
Question: I have a set of points and I want to find clusters in them. I know how to do the normal k-means algorithm, but I don't want to take k as input. Suppose I have the points 1, 3, 4, 50, 60, 70, 1000, 1002, 1004; the algorithm should cluster them into 3 clusters, C1: 1, 3, 4; C2: 50, 60, 70; C3: 1000, 1002, 1004, such that the distances between elements within a cluster are minimized and the distances between clusters are maximized. Answer 1: See how-do-i-determine-k-when-using-k-means-clustering and the links there. Answer 2

How to visualize k-means centroids for each iteration?

夙愿已清 submitted on 2019-12-08 09:54:39
Question: I would like to demonstrate the behavior of k-means graphically by plotting the iterations of the algorithm, starting from initial cluster centers at (3,5), (6,2) and (8,3) and continuing until the final cluster centers are reached. Each iteration could correspond to a single plot showing the centroids and clusters. Given:
x <- c(3,6,8,1,2,2,6,6,7,7,8,8)
y <- c(5,2,3,5,4,6,1,8,3,6,1,7)
df <- data.frame(x, y)
dfCluster <- kmeans(df, centers = 3)  # with 3 centroids
I would like to use the first three tuples as my initial cluster centers and track the movement of
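A sketch of the recording loop in Python, assuming Lloyd's algorithm (nearest-center assignment followed by a mean update) and using the question's data and starting centers. Each history entry holds the centers of one iteration and the labels assigned from them, which is what one plot per iteration needs. The function name kmeans_history is hypothetical:

```python
import numpy as np

def kmeans_history(X, init_centers, max_iter=100):
    """Lloyd's k-means from fixed starting centers, recording every iteration.

    Returns a list of (centers, labels) pairs: `labels` are the nearest-center
    assignments computed from `centers`. The loop stops once an assignment
    repeats the previous one, i.e. the centers no longer move.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float)
    history, labels = [], None
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        history.append((centers.copy(), new_labels))
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: assignments unchanged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
    return history

# Data and starting centers taken from the question.
x = [3, 6, 8, 1, 2, 2, 6, 6, 7, 7, 8, 8]
y = [5, 2, 3, 5, 4, 6, 1, 8, 3, 6, 1, 7]
X = np.column_stack([x, y])
history = kmeans_history(X, init_centers=[[3, 5], [6, 2], [8, 3]])

for t, (centers, labels) in enumerate(history):
    print(t, np.round(centers, 2).tolist(), labels.tolist())
```

With this data the sketch records five iterations and ends at centers (2, 5), (7, 2) and (7, 7). R's kmeans() defaults to the Hartigan-Wong algorithm, whose internal path can differ from this Lloyd-style loop even from the same start; kmeans(df, centers = as.matrix(df[1:3, ]), algorithm = "Lloyd") is likely the closer R counterpart.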

Exporting result from kml package in R

孤人 submitted on 2019-12-08 06:35:52
Question: I'm using the kml package in R to cluster my data, and in the end I need a CSV file with a column giving the cluster number for each id. The data has many missing values, so I can't use the kmeans function without deleting every observation that has a missing value, but kml handles them well. My problem is that when I use choice() to export the results, all I get is a graphical window and no output files. Here is my code:
setwd("/Volumes/NATASHKA/api/R files")
statadata <-read.dta("Data_wide

PySpark ML: Get KMeans cluster statistics

扶醉桌前 submitted on 2019-12-08 03:54:29
Question: I have built a KMeansModel. My results are stored in a PySpark DataFrame called transformed. (a) How do I interpret the contents of transformed? (b) How do I create one or more Pandas DataFrames from transformed that would show summary statistics for each of the 13 features for each of the 14 clusters?
from pyspark.ml.clustering import KMeans
# Trains a k-means model.
kmeans = KMeans().setK(14).setSeed(1)
model = kmeans.fit(X_spark_scaled) # Fits a model to the input dataset with optional
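On (a): model.transform(...) returns the input rows with one extra column, named prediction by default, holding each row's cluster index from 0 to k - 1. For (b), a minimal pandas sketch, assuming the result has been brought to the driver with transformed.toPandas() and that the feature values are still present as ordinary numeric columns (the column names f1 and f2 and the toy values below are hypothetical stand-ins):

```python
import pandas as pd

def cluster_summary(pdf, feature_cols, cluster_col="prediction"):
    """One row per cluster; columns are (feature, statistic) pairs."""
    return pdf.groupby(cluster_col)[feature_cols].agg(["count", "mean", "std", "min", "max"])

# Hypothetical stand-in for transformed.toPandas(): two features, two clusters.
pdf = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0, 10.0, 12.0],
    "f2": [0.5, 0.5, 0.5, 4.0, 6.0],
    "prediction": [0, 0, 0, 1, 1],
})
summary = cluster_summary(pdf, ["f1", "f2"])
print(summary)

# Alternatively, one describe() table per cluster, keyed by cluster index.
per_cluster = {k: g[["f1", "f2"]].describe() for k, g in pdf.groupby("prediction")}
```

With 14 clusters and 13 features, summary would have 14 rows and 13 × 5 = 65 columns. Pass the feature columns explicitly so that the assembled vector column (if present) is left out. Because toPandas() collects every row to the driver, for large data it may be preferable to aggregate in Spark first and convert only the result.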

Show rows on clustered kmeans data

二次信任 submitted on 2019-12-08 01:50:26
Question: Hi, I was wondering: when you plot clustered data in a figure window, is there a way to show which row each data point belongs to when you hover over it? From the picture above, I was hoping there would be a way to tell which row a point belongs to by selecting it or hovering over it. Here is the code:
%% dimensionality reduction
columns = 6
[U,S,V]=svds(fulldata,columns);
%% randomly select dataset
rows = 1000;
columns = 6;
%# pick random rows
indX = randperm( size(fulldata,1) );

Exporting result from kml package in R

て烟熏妆下的殇ゞ submitted on 2019-12-08 01:11:27
Question: I'm using the kml package in R to cluster my data, and in the end I need a CSV file with a column giving the cluster number for each id. The data has many missing values, so I can't use the kmeans function without deleting every observation that has a missing value, but kml handles them well. My problem is that when I use choice() to export the results, all I get is a graphical window and no output files. Here is my code:
setwd("/Volumes/NATASHKA/api/R files")
statadata <- read.dta("Data_wide_withdemogr_auris_for_kml_negative.dta")
mydata <- data.frame(statadata)
cldDQ <- cld(mydata)
kml(cldDQ,c(2:6)

R - cluster analysis on binary weblog data

↘锁芯ラ submitted on 2019-12-07 22:49:39
Question: I have web data that looks similar to the sample below. It simply has the user and a binary value for whether that user clicked on a particular link within a website. I want to cluster this data; my main goal is to find similar users based on their online behaviour. What is a good clustering algorithm for this? I have tried k-means, which does not work well with binary data. I have also tried spherical k-means, skmeans(). I wanted to do a sum-of-squared-errors scree plot, but I
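One commonly used alternative to Euclidean k-means for binary click data is to measure user similarity with the Jaccard distance, which ignores links that neither user clicked, and then feed the resulting distance matrix to a distance-based method such as hierarchical clustering or k-medoids. A NumPy sketch of the distance matrix, assuming a users-by-links 0/1 matrix (the toy click data is hypothetical):

```python
import numpy as np

def jaccard_distance_matrix(B):
    """Pairwise Jaccard distances between the rows of a 0/1 user-by-link matrix.

    d(u, v) = 1 - |u AND v| / |u OR v|. Links neither user clicked do not
    count. Two users with no clicks at all get distance 0 (a convention
    chosen for this sketch).
    """
    B = np.asarray(B, dtype=bool).astype(int)
    inter = B @ B.T                                    # links clicked by both users
    counts = B.sum(axis=1)                             # links clicked by each user
    union = counts[:, None] + counts[None, :] - inter  # links clicked by either user
    with np.errstate(invalid="ignore", divide="ignore"):
        sim = np.where(union > 0, inter / union, 1.0)
    return 1.0 - sim

# Hypothetical click matrix: rows are users, columns are links.
clicks = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
D = jaccard_distance_matrix(clicks)
print(np.round(D, 3))
```

SciPy's scipy.spatial.distance.pdist(..., metric="jaccard") computes Jaccard distances in condensed form, which scipy.cluster.hierarchy.linkage accepts directly.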

How to label k-means clusters in R

不羁的心 submitted on 2019-12-07 17:21:19
Question: The wikibook on k-means clustering (http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/K-Means) gives an example cluster analysis. Can the code be amended so that a label is generated for each cluster? The graph below does not indicate what is being compared: there are three clusters, but what is the name of each cluster? Here is the code that generates the graph:
# import data (assume that all data in "data.txt" is stored as comma separated values)
x <- read.csv("data.txt",
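k-means only numbers its clusters; any descriptive name has to come from interpreting the centroids. One simple heuristic (an assumption of this sketch, not something the wikibook or R's kmeans() provides) is to name each cluster after the feature in which its centroid deviates most from the overall mean, measured in standard deviations. The function name describe_clusters, the feature names and the toy data are hypothetical:

```python
import numpy as np

def describe_clusters(X, labels, feature_names):
    """Build a descriptive label for each cluster from its centroid.

    Heuristic (an assumption, not a k-means feature): name each cluster after
    the feature whose centroid value deviates most from the overall mean,
    in units of that feature's standard deviation.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    names = {}
    for k in np.unique(labels):
        z = (X[labels == k].mean(axis=0) - mu) / sd
        j = int(np.argmax(np.abs(z)))
        names[int(k)] = f"{'high' if z[j] > 0 else 'low'} {feature_names[j]}"
    return names

X = [[1, 10], [2, 11], [9, 10], [10, 11], [5, 30], [6, 31]]
labels = [0, 0, 1, 1, 2, 2]
print(describe_clusters(X, labels, ["x", "y"]))
# {0: 'low x', 1: 'high x', 2: 'high y'}
```

Such labels are only as meaningful as the features behind them; with many features, inspecting the full centroid table is usually more informative than a single word.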