k-means | 易学教程

Clustering SURF features of an image dataset using k-means algorithm

Question: I want to implement a Bag of Visual Words model in MATLAB. First I read the images from the dataset directory, then I detect and extract SURF features using the two functions detectSURFFeatures and extractFeatures. I store each image's features in a cell array, and finally I want to cluster them with the k-means algorithm, but I can't get this data into the form that the kmeans function expects as input. How can I feed SURF features into k-means clustering in MATLAB? Here is my sample code, which reads images from files and
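A minimal sketch of the usual bag-of-visual-words step, written in Python/NumPy rather than MATLAB: k-means expects one matrix with one descriptor per row, so the per-image descriptor arrays (the cell array) are stacked vertically first, and the image each row came from is remembered so that per-image word histograms can be built afterwards. The random 64-column arrays only stand in for SURF descriptors, and `kmeans` and `bag_of_words` are hypothetical helpers. Assuming the cell array is named `features`, the MATLAB stacking step should correspond to `vertcat(features{:})`, whose result can be passed to `kmeans` from the Statistics and Machine Learning Toolbox.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Move each center to the mean of its rows; keep it if the cluster is empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)

def bag_of_words(descriptor_list, k, seed=0):
    """Cluster all descriptors into k visual words and count the words per image."""
    X = np.vstack(descriptor_list)  # one descriptor per row, all images together
    owner = np.concatenate([np.full(len(d), i) for i, d in enumerate(descriptor_list)])
    vocab, words = kmeans(X, k, seed=seed)
    hist = np.zeros((len(descriptor_list), k), dtype=int)
    np.add.at(hist, (owner, words), 1)  # hist[image, word] += 1 for every descriptor
    return vocab, hist

# Random 64-column arrays stand in for the SURF descriptors of three images.
rng = np.random.default_rng(1)
descriptors = [rng.random((n, 64)) for n in (40, 25, 60)]
vocab, hist = bag_of_words(descriptors, k=5)
print(vocab.shape, hist.shape)  # (5, 64) (3, 5)
```

Each row of `hist` is then the fixed-length vector that represents one image, whatever number of descriptors that image produced.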

How to export the output (cluster labels) of k-means algorithm with the ids in the original data

Question: I have data summarising a network, including users' cookie id, session id, number of materials, and number of jumps in the network. I would like to cluster the records and analyse them further, so I need to know which cookie id in which session is labelled with which cluster. Example data:

cookie_id | ses_num | num_material | num_jump
2345 | 1 | 2 | 1
2345 | 2 | 8 | 12
3456 | 1 | 3 | 2

I have applied k-means clustering using the last two columns but cannot return the clustering output to the right id as I cannot use cookie
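Most k-means implementations return one label per input row, in the same order as the rows, so the ids never need to be passed to the algorithm: cluster on the feature columns only and re-attach the labels by position. In R this should amount to `cbind(df, cluster = fit$cluster)`, where `df` and `fit` are placeholder names for the data frame and the `kmeans` result. The NumPy sketch below uses the three example rows; `kmeans` is a hypothetical helper seeded with the first k rows so that the output is reproducible.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means seeded with the first k rows (hypothetical, deterministic for this demo)."""
    X = np.asarray(X, dtype=float)
    centers = X[:k].copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels

# Columns: cookie_id, ses_num, num_material, num_jump (example rows from the question).
data = np.array([[2345, 1, 2, 1],
                 [2345, 2, 8, 12],
                 [3456, 1, 3, 2]])

# Cluster on num_material and num_jump only; the ids are not used as features.
_, labels = kmeans(data[:, 2:], k=2)

# labels[i] belongs to data[i], so the ids are re-attached by row position.
labelled = np.column_stack([data[:, :2], labels])
for cookie_id, ses_num, cluster in labelled:
    print(cookie_id, ses_num, cluster)
```

With this seeding, the first and third rows land in cluster 0 and the second row in cluster 1; the important part is only that the label vector is never reordered relative to the rows.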

R kmeans: final distance to centroid

Question: I have run a kmeans algorithm on the iris dataset in R using the command kmeans_iris <- kmeans(iris[,1:4], centers=3). I now want to know the distance from a given observation in the iris dataset to its corresponding cluster's centroid. I could write code to manually calculate the Euclidean distance from an observation to the center of its cluster, but is there not an easy, built-in way to do this? Answer 1: As far as I can tell, there isn't a method for extracting the per case
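The distance needs no per-observation loop: indexing the center matrix with the label vector gives each observation's own centroid, row-aligned with the data, and a row-wise norm finishes the job. In R, `fitted(kmeans_iris)` should return exactly that row-aligned matrix of centers, so `sqrt(rowSums((as.matrix(iris[, 1:4]) - fitted(kmeans_iris))^2))` is likely the short form. The NumPy version below uses small made-up data, and `distance_to_own_centroid` is a hypothetical helper.

```python
import numpy as np

def distance_to_own_centroid(X, centers, labels):
    """Euclidean distance from each observation to the center of its own cluster."""
    X = np.asarray(X, dtype=float)
    own = np.asarray(centers, dtype=float)[labels]  # row i is the centroid of X[i]'s cluster
    return np.linalg.norm(X - own, axis=1)

# Made-up example: two clusters whose centers are the means of their members.
X = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
centers = [[1.0, 0.0], [10.0, 11.0]]
labels = np.array([0, 0, 1, 1])
print(distance_to_own_centroid(X, centers, labels))
```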

How to set Spark KMeans initial centers

Question: I'm using Spark ML to run KMeans. I have a bunch of data and three existing centers, for example: [1.0,1.0,1.0], [5.0,5.0,5.0], [9.0,9.0,9.0]. How can I specify that the KMeans centers are these three vectors? I saw that the KMeans object has a seed parameter, but the seed parameter is a Long, not an array. So how can I tell Spark KMeans to use only the existing centers for clustering? Or, put differently, I didn't understand what seed means in Spark KMeans; I suppose the seeds should
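In Spark's k-means the `seed` is only the random seed for the initialization procedure (k-means|| or random sampling); it does not carry any centers. The RDD-based `pyspark.mllib.clustering.KMeans.train` accepts an `initialModel` argument built from a `KMeansModel`, which is one way to start from known centers; whether the DataFrame-based `pyspark.ml` `KMeans` exposes the same option depends on the Spark version, so check its documentation. Spark is not needed to see what starting from fixed centers means: the NumPy sketch below runs Lloyd iterations from the three centers in the question on hypothetical data, with no random seeding at all. `kmeans_from_centers` is a hypothetical helper, and the commented PySpark call is an assumption that was not executed here.

```python
import numpy as np

def kmeans_from_centers(X, init_centers, n_iter=20):
    """Lloyd's k-means started from caller-supplied centers (no random initialization)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# The three existing centers from the question.
init = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [9.0, 9.0, 9.0]]
# Hypothetical data points.
X = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0], [5.0, 5.0, 5.0], [6.0, 6.0, 6.0], [9.0, 9.0, 9.0]]
print(kmeans_from_centers(X, init))

# Assumed PySpark equivalent (RDD-based API, not executed here):
# from pyspark.mllib.clustering import KMeans, KMeansModel
# model = KMeans.train(rdd, 3, maxIterations=20, initialModel=KMeansModel(init))
```

Here the middle center moves from 5.0 to 5.5 on every axis while the other two stay put, because each center is only ever updated from the points nearest to it.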

R shiny, shinyjs, remove plot and draw it again if button is clicked

Question: I have an app that plots the Old Faithful data with k-means clusters once the user clicks a specific button ("run k-means"). Now I want to add a button that removes the plot again ("remove plot"). Following "Hide/show outputs Shiny R", I tried this: library(shiny) ui <- fluidPage( actionButton(inputId = "run_kmeans", label = "run k-means"), actionButton(inputId = "remove_plot", label = "remove plot"), conditionalPanel("output.show", plotOutput("plot")) ) server <- function(input, output) { v

scikit-learn's k-means: what does the predict method really do?

Question: When I use scikit-learn's implementation of k-means I usually just call the fit() method, and that is enough to get the cluster centers and the labels. The predict() method is used to calculate the labels, and even a fit_predict() method is available for convenience, but if I can get the labels using only fit(), what is the purpose of the predict() method? Answer 1: predict, as @EdChum suggested, can be used on unseen data. This (and more so, the transform method) is useful when k-means is used
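Per the scikit-learn documentation, `predict` assigns each sample to its closest learned center without refitting, and `transform` returns the distance from each sample to every center; that is why both matter once a fitted model is applied to new data. The NumPy sketch below mimics those two operations, with hypothetical centers standing in for a fitted model's `cluster_centers_`; the functions are illustrative stand-ins, not scikit-learn's own code.

```python
import numpy as np

# Hypothetical centers, as if learned by fit() on training data.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])

def transform(X, centers):
    """Distance from each sample to every center (one column per cluster)."""
    X = np.asarray(X, dtype=float)
    return np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))

def predict(X, centers):
    """Index of the closest center for each sample; the centers are not updated."""
    return transform(X, centers).argmin(axis=1)

new_points = [[1.0, 2.0], [9.0, 7.0], [4.0, 4.0]]
print(predict(new_points, centers))  # [0 1 0]
```

The `transform` output is also a new feature space (one distance column per cluster), which is where it becomes useful as a preprocessing step.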

k-means using signature matrix generated from minhash

Question: I have used MinHash on documents and their shingles to generate a signature matrix from these documents. I have verified that the signature matrices are good, since comparing the Jaccard distances of known similar documents (say, two articles about the same sports team or two articles about the same world event) gives correct readings. My question is: does it make sense to use this signature matrix to perform k-means clustering? I've tried using the signature vectors of documents and calculating the
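What a MinHash signature matrix supports directly is an estimate of Jaccard similarity: the fraction of hash functions (signature rows) on which two documents' minimum hash values agree. Standard k-means instead averages the raw signature values and measures Euclidean distance between them, and those values are hash outputs rather than coordinates. The sketch below, with made-up six-row signatures and a hypothetical `minhash_similarity` helper, contrasts the two measures.

```python
import numpy as np

def minhash_similarity(sig_a, sig_b):
    """Estimated Jaccard similarity: fraction of signature rows whose minima agree."""
    return float(np.mean(np.asarray(sig_a) == np.asarray(sig_b)))

# Made-up signatures for three documents (one value per hash function).
a = [3, 17, 5, 42, 8, 11]
b = [3, 17, 5, 40, 9, 11]  # agrees with a on 4 of 6 rows
c = [4, 18, 6, 43, 9, 12]  # agrees with a on no row, yet every value differs by only 1

print(minhash_similarity(a, b), minhash_similarity(a, c))
# Euclidean distances are about 2.24 and 2.45, so c looks almost as close to a as b does.
print(np.linalg.norm(np.subtract(a, b)), np.linalg.norm(np.subtract(a, c)))
```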

Make silhouette plot legible for k-means

Question: I am trying to make a silhouette plot for a k-means clustering, but the bars are almost invisible. How can I make this chart legible? Example code: require(cluster) X <- EuStockMarkets kmm <- kmeans(X, 8) D <- daisy(X) plot(silhouette(kmm$cluster, D), col=1:8) Answer 1: To fix this, set the border to NA: plot(silhouette(kmm$cluster, D), col=1:8, border=NA) Answer 2: Really new to R, so I might be on the wrong track. Could you specify the column colors? Something like: require(cluster) X

Optimizing K-means clustering using Genetic Algorithm

Question: I have the following dataset (obtained here):

  item          survivalpoints  weight
1 pocketknife               10       1
2 beans                     20       5
3 potatoes                  15      10
4 unions                     2       1
5 sleeping bag              30       7
6 rope                      10       5
7 compass                   30       1

I can cluster this dataset into three clusters with kmeans() using a binary string as my initial choice of centers. For example:

## 1 represents the initial centers
chromosome = c(1,1,1,0,0,0,0)
## exclude first column (kmeans only supports continuous data)
cl <- kmeans(dataset[, -1], dataset[chromosome == 1,
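The chromosome can be read as a mask over the rows: every 1 marks an item whose (survivalpoints, weight) pair becomes an initial center, so the number of 1s fixes k. A genetic algorithm then needs a fitness value per chromosome; a common choice, assumed here, is the total within-cluster sum of squares after k-means converges (lower is better). `kmeans_from_centers` and `fitness` are hypothetical helpers written in NumPy, not part of R's `kmeans()`.

```python
import numpy as np

# Numeric columns of the question's dataset, in row order: survivalpoints, weight.
X = np.array([[10, 1], [20, 5], [15, 10], [2, 1], [30, 7], [10, 5], [30, 1]], dtype=float)

def kmeans_from_centers(X, centers, n_iter=50):
    """Lloyd's k-means started from the given centers; returns (centers, labels)."""
    centers = centers.astype(float).copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels

def fitness(chromosome, X):
    """Hypothetical GA fitness: within-cluster sum of squares (lower is better)."""
    mask = np.asarray(chromosome, dtype=bool)  # 1 marks a row used as an initial center
    centers, labels = kmeans_from_centers(X, X[mask])
    return float(((X - centers[labels]) ** 2).sum()), labels

chromosome = [1, 1, 1, 0, 0, 0, 0]
wss, labels = fitness(chromosome, X)
print(round(wss, 3), labels)  # 138.667 [0 1 2 0 1 0 1]
```

The GA's crossover and mutation would then operate on the 0/1 vector, ideally keeping exactly k ones so that every candidate describes the same number of clusters.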

R: error from NbClust() call when running it within a for() loop - “Error in if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + :”

Question: I want to call the NbClust() function for a couple of data frames. I do so by "sending" them all through a for loop that contains the NbClust() function call. The code looks like this:

#combos of just all columns from df
variations = unlist(lapply(seq_along(df), function(x) combn(df, x, simplify=FALSE)), recursive=FALSE)
for(i in 1:length(variations)){
  df = data.frame(variations[i])
  nc = NbClust(scale(df), distance="euclidean", min.nc=2, max.nc=10, method="complete")
}

Unfortunately it always