cluster-analysis | 易学教程

R: ggplot to visualize all variables in each cluster after cluster analysis

Question: Sorry in advance if the post isn't clear. I have a data frame with 74 observations and 43 columns. I performed cluster analysis on it, got 5 clusters, and assigned the cluster number to each respective row, so my df now has 74 rows (obs) and 44 variables. I would like to plot, for all variables, which variables are enriched in each cluster and which are not, and I want to do this with ggplot. The output panel I imagine has 5 boxplots per row and 42 rows of plots,

choosing bandwidth&linspace for kernel density estimation. (why my bandwidth doesn't work?)

Question: I have followed this link for the application of kernel density estimation. My aim is to create two or more groups/clusters for an array group. The code below works for every member of the array group except this array: X = np.array([[77788], [77793], [77798], [77803], [92886], [92891], [92896], [92901]]) So I expect to see two different clusters, such as: first_group = ([[77788], [77793], [77798], [77803]]) second_group = ([[92886], [92891], [92896], [92901]]) I have a dynamic
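As a rough illustration of the approach the question describes, the sketch below evaluates a Gaussian kernel density on an evenly spaced grid and splits the values at the local minima of that density. It uses only the Python standard library rather than the library from the linked tutorial. The function names, the bandwidth of 1000, and the 201-point grid are assumptions chosen for this array, not values from the original post.

```python
import math


def kde_density(grid, values, bandwidth):
    """Unnormalised Gaussian kernel density evaluated at each grid point."""
    return [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def split_by_kde_minima(values, bandwidth, grid_size=201):
    """Split 1-D values into groups at strict local minima of the density."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]  # plays the role of linspace
    density = kde_density(grid, values, bandwidth)

    # A cut point is a grid position whose density is lower than both neighbours.
    cuts = [
        grid[i]
        for i in range(1, grid_size - 1)
        if density[i] < density[i - 1] and density[i] < density[i + 1]
    ]

    # Each value goes into the group numbered by how many cuts lie below it.
    groups = [[] for _ in range(len(cuts) + 1)]
    for v in sorted(values):
        groups[sum(v > c for c in cuts)].append(v)
    return groups


X = [77788, 77793, 77798, 77803, 92886, 92891, 92896, 92901]
groups = split_by_kde_minima(X, bandwidth=1000.0)  # bandwidth is an assumed value
print(groups)
```

With a bandwidth far smaller than the gap between the groups, the density between them can underflow to a flat run of zeros, and a strict-minimum test like this one then finds no cut. That is one possible reason a small bandwidth appears not to work, though the excerpt does not show which bandwidth was used.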

Refitting clusters around fixed centroids

Question: Clustering/classification problem: I used k-means clustering to generate these clusters and centroids. This is the dataset with the added cluster attribute from the initial run: > dput(sampledata) structure(list(Player = structure(1:5, .Label = c("A", "B", "C", "D", "E"), class = "factor"), Metric.1 = c(0.3938961, 0.28062338, 0.32532626, 0.29239642, 0.25622558), Metric.2 = c(0.00763359, 0.01172354, 0.40550867, 0.04026846, 0.05976367), Metric.3 = c(0.50766075, 0.20345662, 0.06267444, 0.08661417,
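One way to read "refitting around fixed centroids" is to keep the centroids from the initial run unchanged and assign each observation to the nearest one. The sketch below does that in plain Python. The two-dimensional points and centroids are made-up placeholders, since the dataset in the excerpt is cut off, and `assign_to_fixed_centroids` is a hypothetical helper name.

```python
import math


def assign_to_fixed_centroids(points, centroids):
    """Return, for each point, the index of the nearest centroid.

    The centroids are never updated, so this is a single assignment step,
    not a full k-means run.
    """
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


# Placeholder data for illustration only (not from the original post).
fixed_centroids = [(0.0, 0.0), (1.0, 1.0)]
new_points = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.3), (1.2, 1.1)]

assignments = assign_to_fixed_centroids(new_points, fixed_centroids)
print(assignments)
```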

Generating a heatmap that depicts the clusters in a dataset using hierarchical clustering in R

Question: I am trying to take my dataset, which is made up of protein-DNA interactions, cluster the data, and generate a heatmap in which the data looks clustered, with the clusters lining up on the diagonal. I am able to cluster the data and generate a dendrogram of it; however, when I generate the heatmap using the heatmap function in R, the clusters are not visible. If you look at the first 2 images one is of the dendrogram I am able to generate, the

How to use existing data in ELKI

Question: I have kept stumbling upon ELKI these past couple of days while searching for the most suitable density clustering tool, and decided to try it. For DBSCAN, I've managed to reproduce the test that clusters the file "3clusters-and-noise-2d.csv", and have also managed to print the cluster metadata and the points in each cluster, all via ELKI code from GitHub (latest version) in Java (I'm not really interested in the CLI or UI tool). Now, I want to use some kind of internal Java structure to create a

What are noisy samples in Scikit's DBSCAN clustering algorithm?

Question: If I apply Scikit's DBSCAN (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) on a similarity matrix, I get a series of labels back. Some of these labels are -1. The documentation calls them noisy samples. What are these? Do they all belong to a single cluster, or do they each belong to their own cluster since they're noisy? Thank you. Answer 1: These are not exactly part of a cluster. They are simply points that do not belong to any cluster and can be "ignored" to some
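The answer's point, that -1 marks points belonging to no cluster rather than a cluster of their own, can be seen in a small example. This is a minimal pure-Python sketch of the DBSCAN idea, not scikit-learn's implementation; it only borrows the convention of labelling noise as -1, and the sample points, `eps`, and `min_samples` values are made up for illustration.

```python
import math

NOISE = -1  # same convention as the labels described in the question


def dbscan(points, eps, min_samples):
    """Label each point with a cluster index, or NOISE if it joins no cluster."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices within eps of point i; the point itself is included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # core point: keep expanding
                queue.extend(reachable)
    return labels


sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50), (-40, 30)]
noise_demo_labels = dbscan(sample, eps=1.5, min_samples=3)
print(noise_demo_labels)
```

The last two points are far from each other and from everything else, yet both receive -1, which is why the -1 label is better read as "unassigned" than as a cluster.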

initial centroids for scikit-learn kmeans clustering

Question: If I already have a numpy array that can serve as the initial centroids, how can I properly initialize the k-means algorithm? I am using the scikit-learn KMeans class. This post (k-means with selected initial centers) indicates that I only need to set n_init=1 if I am using a numpy array as the initial centroids, but I am not sure whether my initialization is working properly. Naftali Harris' excellent visualization page shows what I am trying to do: http://www.naftaliharris.com/blog/visualizing-k
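To make concrete what "starting from given centroids with a single run" means, here is a minimal pure-Python sketch of Lloyd's k-means iteration seeded with caller-supplied centroids. It is a stand-in for illustration and not the scikit-learn KMeans class; the function name `kmeans_from_init`, the sample points, and the starting centroids are all assumptions.

```python
import math


def kmeans_from_init(points, init_centroids, max_iter=100):
    """Run Lloyd's algorithm once, starting from the supplied centroids."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest current centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of each cluster's members (empty clusters stay put).
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
init = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical starting centroids

centroids, labels = kmeans_from_init(points, init)
print(centroids, labels)
```

Because the start is fixed, repeating the run gives the same result every time, which is consistent with the linked post's suggestion that multiple restarts are unnecessary when the initial centroids are supplied.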

KMeans clustering for more than 5 million vectors

Question: I have hit a real problem. I need to do some KMeans clustering on 5 million vectors, each containing about 32 columns. I tried out Mahout, which requires Linux, but I am on Windows and am restricted from using a Linux OS or any sort of simulator. Can anyone suggest a KMeans clustering algorithm that scales up to 5M vectors and converges quickly? I have tested a few, but they won't scale: they are slow and take forever to complete. Thanks. Answer 1: OK, so whoever wants clustering for

Color branches of dendrogram using an existing column

Question: I have a data frame which I am trying to cluster. I am using hclust right now. In my data frame, there is a FLAG column which I would like to color the dendrogram by. From the resulting picture, I am trying to figure out similarities among the various FLAG categories. My data frame looks something like this: FLAG ColA ColB ColC ColD. I am clustering on colA, colB, colC and colD. I would like to cluster these and color them according to FLAG categories. Ex - color red if 1, blue if 0 (I have only