cluster-analysis | 易学教程

Scikit Learn - K-Means - Elbow - criterion

Question: Today I'm trying to learn something about K-means. I understand the algorithm and I know how it works. Now I'm looking for the right k... I found the elbow criterion as a method to detect the right k, but I do not understand how to use it with scikit-learn. In scikit-learn I'm clustering things this way: `kmeans = KMeans(init='k-means++', n_clusters=n_clusters, n_init=10)` and then `kmeans.fit(data)`. So should I do this several times for n_clusters = 1...n and watch the error rate to get the
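The loop the question describes (fit once per candidate k, record the error, look for the bend) can be sketched as follows. This is a minimal illustration in plain Python and not the scikit-learn implementation: the helper names (`sq_dist`, `farthest_first`, `kmeans_inertia`), the toy data, and the deterministic farthest-first initialisation (used here in place of k-means++) are all assumptions made for the example. With scikit-learn itself, the same quantity is available as `kmeans.inertia_` after each `kmeans.fit(data)`.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def farthest_first(points, k):
    """Deterministic init (an assumption of this sketch, not k-means++):
    start at the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data: three tight 3x3 grids of points around well-separated centers.
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
offsets = (-0.5, 0.0, 0.5)
data = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx in offsets for dy in offsets]

# Fit once per candidate k and record the error (inertia).
inertias = {k: kmeans_inertia(data, k) for k in range(1, 7)}
for k in range(1, 7):
    print(k, round(inertias[k], 2))
```

On this toy data the inertia falls steeply up to k = 3 and only marginally afterwards; that bend is the "elbow". Real data rarely shows such a clean bend, so the plot is usually read by eye rather than picked automatically.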

Get ordered kmeans cluster labels

Question: Say I have a data set x and run the following k-means clustering: `fit <- kmeans(x, 2)`. My question concerns the output of `fit$cluster`: I know that it will give me a vector of integers (from 1:k) indicating the cluster to which each point is allocated. Instead, is there a way to have the clusters labeled 1, 2, etc. in order of decreasing numerical value of their center? For example: if `x = c(1.5, 1.4, 1.45, .2, .3, .3)`, then `fit$cluster` should result in (1,1,1,2,2,2) but not result in (2,2,2,1
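The relabeling asked for here amounts to ranking the clusters by their centers and mapping each old label to its rank. The sketch below shows that idea in Python rather than R, using the example values from the question; the function name `relabel_by_center` is hypothetical, and the centers are recomputed as per-cluster means instead of being read from a fitted model.

```python
def relabel_by_center(values, labels):
    """Renumber cluster labels 1, 2, ... in order of decreasing cluster mean."""
    # Group the 1-D values by their current cluster label.
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    # Cluster centers as plain means of the member values.
    centers = {lab: sum(vs) / len(vs) for lab, vs in groups.items()}
    # Largest center first, so it receives label 1.
    ranked = sorted(centers, key=centers.get, reverse=True)
    new_label = {old: rank for rank, old in enumerate(ranked, start=1)}
    return [new_label[lab] for lab in labels]


x = [1.5, 1.4, 1.45, 0.2, 0.3, 0.3]
raw = [2, 2, 2, 1, 1, 1]          # labels as the clustering happened to assign them
ordered = relabel_by_center(x, raw)
print(ordered)  # [1, 1, 1, 2, 2, 2]
```

Labels that are already in the desired order pass through unchanged, so the function can be applied unconditionally after clustering.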

How to choose and plot the quality criterion in `kml` function?

Question: I just started working with the kml package to perform longitudinal k-means clustering in R. By default the kml function uses the Calinski Harabatz Sorted criterion to choose the 'best' clustering. So by accessing the 'best' clustering you will always see the Calinski Harabatz Sorted criterion. How can we choose another quality criterion? A minimal example: `library(kml)` # some data `cld <- generateArtificialLongData(25)` # perform clustering `kml(cld)` # choose the 'best' clustering: choice

kmeans: Quick-TRANSfer stage steps exceeded maximum

Question: I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: `kmeans(dataset, centers = 100, nstart = 25, iter.max = 20)`. I get the following error: `Quick-TRANSfer stage steps exceeded maximum (= 31834400)`, and although one can view the code at http://svn.r-project.org/R/trunk/src/library/stats/R/kmeans.R, I am unsure as to what is going wrong. I assume my problem has to do with the size of my dataset, but I would be grateful if someone

Aggregate Weighted Linestrings for Clustered Markers in Leaflet in R

Question: I'm trying to plot locations and weighted connecting linestrings. When I zoom in or out, the clustering of the markers adjusts fine. The labels shown on the clusters are the aggregated node_val of the markers. I would like to do something similar with the linestrings, so that the plot does not show the blue lines connecting the single markers, but instead lines connecting the clusters of markers, and the new linestrings that connect the clusters of markers are customized in width dependent on the wgt

Reducing NbClust memory usage

Question: I need some help with massive memory usage by the NbClust function. On my data, memory balloons to 56GB, at which point R crashes with a fatal error. Using `debug()`, I was able to trace the error to these lines: `if (any(indice == 23) || (indice == 32)) { res[nc - min_nc + 1, 23] <- Index.sPlussMoins(cl1 = cl1, md = md)$gamma` Debugging of `Index.sPlussMoins` revealed that the crash happens during a for loop. The iteration that it crashes at varies, and during the loop memory usage varies

R: ggplot to visualize all variables in each cluster after cluster analysis

Question: Sorry in advance if the post isn't clear. I have a dataframe with 74 observations and 43 columns. I performed cluster analysis on it, got 5 clusters, and assigned the cluster number to each respective row. Now my df has 74 rows (obs) and 44 variables. I would like to plot and see, for each cluster, which variables are enriched and which are not, for all variables. I want to achieve this with ggplot. The output panel I have in mind has 5 boxplots per row, and 42 rows of plots,