cluster-analysis

Scikit Learn - K-Means - Elbow - criterion

Deadly submitted on 2020-05-09 17:43:05
Question: Today I'm trying to learn something about K-means. I have understood the algorithm and I know how it works. Now I'm looking for the right k... I found the elbow criterion as a method to detect the right k, but I do not understand how to use it with scikit-learn. In scikit-learn I'm clustering things this way: kmeans = KMeans(init='k-means++', n_clusters=n_clusters, n_init=10) kmeans.fit(data) So should I do this several times for n_clusters = 1...n and watch the error rate to get the
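
One way to read the question above is sketched here; it is not taken from the original thread. The sketch fits KMeans once per candidate k and records the fitted model's inertia_ attribute (the within-cluster sum of squared distances), which is the quantity usually inspected for an elbow. The toy data, the range of k, and the variable names ks and inertias are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: three well-separated 2-D blobs (not from the question).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
data = np.vstack(
    [c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers]
)

ks = list(range(1, 8))  # candidate values of k (an arbitrary choice)
inertias = []
for k in ks:
    # Same settings as in the question, plus a fixed seed for repeatability.
    kmeans = KMeans(init="k-means++", n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(data)
    # inertia_ is the sum of squared distances of samples to their closest center.
    inertias.append(kmeans.inertia_)

for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting inertias against ks (for example with matplotlib) and looking for the k after which the curve flattens is the usual visual form of the elbow criterion. On data without clear cluster structure the elbow may be ambiguous.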

Get ordered kmeans cluster labels

≡放荡痞女 submitted on 2020-04-30 07:32:05
Question: Say I have a data set x and do the following kmeans clustering: fit <- kmeans(x,2) My question concerns the output of fit$cluster: I know that it gives me a vector of integers (from 1:k) indicating the cluster to which each point is allocated. Instead, is there a way to have the clusters labeled 1, 2, etc. in order of decreasing numerical value of their center? For example: If x=c(1.5,1.4,1.45,.2,.3,.3) , then fit$cluster should result in (1,1,1,2,2,2) but not result in (2,2,2,1
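
The relabeling idea in the question above can be sketched independently of any clustering library. The sketch below is in Python rather than R and is only an illustration: the helper names cluster_means and relabel_by_center are hypothetical, and the input labels stand in for whatever arbitrary numbering a k-means run happened to return. It computes each cluster's mean, ranks clusters by decreasing mean, and maps the old labels to the new ranks.

```python
def cluster_means(x, labels):
    """Return {label: mean of the 1-D values assigned to that label}."""
    sums, counts = {}, {}
    for value, label in zip(x, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def relabel_by_center(labels, centers):
    """Renumber labels 1..k so that label 1 has the largest center."""
    order = sorted(centers, key=lambda label: centers[label], reverse=True)
    mapping = {old: new for new, old in enumerate(order, start=1)}
    return [mapping[label] for label in labels]


# Data from the question; the labels are one arbitrary numbering
# a clustering run might return.
x = [1.5, 1.4, 1.45, 0.2, 0.3, 0.3]
labels = [2, 2, 2, 1, 1, 1]

ordered = relabel_by_center(labels, cluster_means(x, labels))
print(ordered)  # [1, 1, 1, 2, 2, 2]
```

The same approach applies to the centers reported by the clustering routine itself instead of recomputed means; for multi-dimensional data, a single ordering key (such as one coordinate) would have to be chosen.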

How to choose and plot the quality criterion in `kml` function?

只愿长相守 submitted on 2020-04-30 06:38:46
Question: I just started working with the kml package to perform longitudinal k-means clustering in R. By default the kml function uses the Calinski Harabatz Sorted criterion to choose the 'best' clustering. So by accessing the 'best' clustering you will always see the Calinski Harabatz Sorted criterion. How can we choose another quality criterion? A minimal example: library(kml) # some data cld <- generateArtificialLongData(25) # perform clustering kml(cld) # choose the 'best' clustering: choice

How to choose and plot the quality criterion in `kml` function?

断了今生、忘了曾经 submitted on 2020-04-30 06:38:45
Question: I just started working with the kml package to perform longitudinal k-means clustering in R. By default the kml function uses the Calinski Harabatz Sorted criterion to choose the 'best' clustering. So by accessing the 'best' clustering you will always see the Calinski Harabatz Sorted criterion. How can we choose another quality criterion? A minimal example: library(kml) # some data cld <- generateArtificialLongData(25) # perform clustering kml(cld) # choose the 'best' clustering: choice

kmeans: Quick-TRANSfer stage steps exceeded maximum

旧巷老猫 submitted on 2020-04-07 14:29:29
Question: I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20). I get the following error: Quick-TRANSfer stage steps exceeded maximum (= 31834400), and although one can view the code at http://svn.r-project.org/R/trunk/src/library/stats/R/kmeans.R, I am unsure what is going wrong. I assume my problem has to do with the size of my dataset, but I would be grateful if someone

kmeans: Quick-TRANSfer stage steps exceeded maximum

懵懂的女人 submitted on 2020-04-07 14:29:25
Question: I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20). I get the following error: Quick-TRANSfer stage steps exceeded maximum (= 31834400), and although one can view the code at http://svn.r-project.org/R/trunk/src/library/stats/R/kmeans.R, I am unsure what is going wrong. I assume my problem has to do with the size of my dataset, but I would be grateful if someone

Aggregate Weighted Linestrings for Clustered Markers in Leaflet in R

不问归期 submitted on 2020-04-07 09:19:08
Question: I'm trying to plot locations and weighted connecting linestrings. When I zoom in or out, the clustering of the markers adjusts fine. The labels shown on the clusters are the aggregated node_val of the markers. I would like to do something similar with the linestrings, so that the plot does not show the blue lines connecting the single markers but instead shows lines connecting the clusters of markers, and the new linestrings that connect the clusters of markers are customized in width dependent on the wgt

Aggregate Weighted Linestrings for Clustered Markers in Leaflet in R

99封情书 submitted on 2020-04-07 09:18:47
Question: I'm trying to plot locations and weighted connecting linestrings. When I zoom in or out, the clustering of the markers adjusts fine. The labels shown on the clusters are the aggregated node_val of the markers. I would like to do something similar with the linestrings, so that the plot does not show the blue lines connecting the single markers but instead shows lines connecting the clusters of markers, and the new linestrings that connect the clusters of markers are customized in width dependent on the wgt

Reducing NbClust memory usage

ぐ巨炮叔叔 submitted on 2020-03-23 17:49:30
Question: I need some help with massive memory usage by the NbClust function. On my data, memory balloons to 56GB, at which point R crashes with a fatal error. Using debug(), I was able to trace the error to these lines: if (any(indice == 23) || (indice == 32)) { res[nc - min_nc + 1, 23] <- Index.sPlussMoins(cl1 = cl1, md = md)$gamma Debugging Index.sPlussMoins revealed that the crash happens during a for loop. The iteration at which it crashes varies, and during the loop memory usage varies

R: ggplot to visualize all variables in each cluster after cluster analysis

廉价感情. submitted on 2020-03-05 04:09:23
Question: Sorry in advance if the post isn't clear. I have a dataframe with 74 observations and 43 columns. I performed cluster analysis on it, got 5 clusters, and assigned the cluster number to each respective row. Now my df has 74 rows (obs) and 44 variables. I would like to plot, for every variable, which variables are enriched in each cluster and which are not. I want to achieve this with ggplot. The output panel I have in mind has 5 boxplots per row and 42 rows of plots,