cluster-analysis

Why does the line of the wss plot (for optimizing the cluster analysis) fluctuate so much?

↘锁芯ラ posted on 2019-12-08 12:35:04
Question: I made a cluster plot in R and want to use the "elbow criterion" to choose the number of clusters, so I drew a wss plot for my data. The plot looks really strange, though, and I cannot tell how many clusters I should use. Could anyone help? Here is my data:
Friendly<-c(0.533,0.854,0.9585,0.925,0.9125,0.9815,0.9645,0.981,0.9935,0.9585,0.996,0.956,0.9415)
Polite<-c(0,0.45,0.977,0.9915,0.929,0.981,0.9895,0.9875,1,0.96,0.996,0.873,0.9125)
Praising<-c(0,0,0.437,0.9585,0.9415,0.9605,0.998,0.998
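One possible cause of a jagged wss curve is that every k-means run starts from random centers, so a single run per k can land in a poor local optimum. The sketch below is a plain-Python illustration, not the asker's R code: it uses only the two variables that appear in full in the post (`Friendly` and `Polite`), and the helper names `kmeans_wss` and `best_wss` are made up for this example. It keeps the lowest within-cluster sum of squares over several random starts for each k.

```python
import random

# Two of the variables from the question; the others are cut off in the post.
friendly = [0.533, 0.854, 0.9585, 0.925, 0.9125, 0.9815, 0.9645,
            0.981, 0.9935, 0.9585, 0.996, 0.956, 0.9415]
polite = [0, 0.45, 0.977, 0.9915, 0.929, 0.981, 0.9895,
          0.9875, 1, 0.96, 0.996, 0.873, 0.9125]
points = list(zip(friendly, polite))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(group) for col in zip(*group))


def kmeans_wss(points, k, rng, iters=100):
    """Run Lloyd's algorithm once from random centers and return the wss."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = [centroid(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def best_wss(points, k, n_starts=25, seed=0):
    """Lowest wss over n_starts random initializations."""
    rng = random.Random(seed)
    return min(kmeans_wss(points, k, rng) for _ in range(n_starts))


for k in range(1, 8):
    print(k, round(best_wss(points, k), 4))
```

With only 13 observations the curve may still lack a clear elbow; the restarts only remove the part of the fluctuation that comes from unlucky initializations.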

Identifying column and row clusters with linear programming

陌路散爱 posted on 2019-12-08 12:06:29
Question: I believe that the question "Is there a good way to do this type of mining?" could be solved using linear programming techniques. But I am completely new to this and do not know the best way to frame it as a minimization problem. Would the following approach be OK?
- Have a continuous variable for each row and column which is the "length" spanned by all members in that row/column
- Have a variable for each "point" (each black dot) that indicates whether it is a member of the row or column group
- Minimize

Random Clustering Algorithm

时间秒杀一切 posted on 2019-12-08 10:42:33
Question: I have a set of points, and I want to form clusters out of them. I know how to run the normal k-means algorithm, but I don't want to take 'k' as input. Suppose I have points like 1,3,4,50,60,70,1000,1002,1004; the algorithm should group them into 3 clusters, C1: 1,3,4, C2: 50,60,70, C3: 1000,1002,1004, such that the distance between elements within a cluster is minimal and the distance between clusters is maximal.
Answer 1: See how-do-i-determine-k-when-using-k-means-clustering and the links there.
Answer 2
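For one-dimensional data like the example in the question, one simple way to avoid choosing k is to sort the values and start a new cluster wherever the gap to the previous value is unusually large. The sketch below is only an illustration under that assumption; the function name `gap_clusters` and the rule "gap larger than `factor` times the median gap" are choices made for this example, not part of the question or its answers.

```python
from statistics import median


def gap_clusters(values, factor=3.0):
    """Split sorted 1-D values wherever a gap exceeds factor * median gap."""
    ordered = sorted(values)
    if len(ordered) < 2:
        return [ordered] if ordered else []
    # Differences between consecutive sorted values.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    threshold = factor * median(gaps)
    clusters = [[ordered[0]]]
    for value, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            clusters.append([value])    # large jump: open a new cluster
        else:
            clusters[-1].append(value)  # small jump: stay in the current one
    return clusters


print(gap_clusters([1, 3, 4, 50, 60, 70, 1000, 1002, 1004]))
# [[1, 3, 4], [50, 60, 70], [1000, 1002, 1004]]
```

The `factor` value still acts as a tuning knob, so this trades the choice of k for the choice of a gap threshold; it also does not carry over directly to data with more than one dimension.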

Output from 'choice' in R's kml

Deadly posted on 2019-12-08 09:48:28
Question: I'm having trouble getting 'choice' to create output. When the graphical interface launches, I select a partition with the space bar. This draws a black circle around the partition, indicating that it has been selected. When I press 'return', nothing happens. I checked my working directory for the output files, but they are not there. I used getwd() to make sure that setwd() pointed to the correct directory. No dice. There was a similar question posted: Exporting result from kml package in R;

Antipole Clustering

强颜欢笑 posted on 2019-12-08 09:39:47
Question: I made a photo mosaic script (PHP). The script takes one picture and rebuilds it out of many little pictures. From a distance it looks like the original picture; when you move closer you see that it consists entirely of little pictures. I take a square of a fixed number of pixels and determine the average color of that square. Then I compare this with my database, which contains the average colors of a couple of thousand pictures. I determine the color distance to all available images. But to run this
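The matching step described here (average the color of a square, then compare it against every stored average) can be sketched as follows. This is a Python illustration of the brute-force lookup, not the asker's PHP script and not an antipole tree; the squared Euclidean distance in RGB space and the tile names are assumptions made for the example.

```python
def average_color(pixels):
    """Mean (R, G, B) of a list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / n for ch in range(3))


def color_distance_sq(a, b):
    """Squared Euclidean distance between two RGB colors (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_tile(target, tile_colors):
    """Name of the tile whose stored average color is closest to target."""
    return min(tile_colors,
               key=lambda name: color_distance_sq(target, tile_colors[name]))


# Hypothetical database of precomputed average colors.
tiles = {"sky.jpg": (40, 90, 200), "rose.jpg": (220, 30, 40)}
square = [(250, 10, 10), (240, 20, 30)]
print(best_tile(average_color(square), tiles))
```

Every lookup scans the whole database, so the cost grows with the number of squares times the number of tiles; an index over the color space is the usual way to cut that down.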

How to partition the nodes of an undirected graph into k sets

孤街浪徒 posted on 2019-12-08 09:31:36
Question: I have an undirected graph G=(V,E) where each vertex represents a router in a large network. Each edge represents a network hop from one router to another; therefore, all edges have the same weight. I wish to partition this network of routers into 3 (or k) different sets clustered by hop count. Motivation: The idea is to replicate some data in routers contained in each of these 3 sets. This is so that whenever a node (or client or whatever) in the network graph requests for a certain data item
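Because all edges have the same weight, hop counts can be computed with breadth-first search. One possible way to get k sets is to pick k seed routers and assign every other router to the seed it reaches in the fewest hops, using a single multi-source BFS. The sketch below assumes the seeds are already chosen (how to pick them is not addressed in the question); the function name `partition_by_hops` and the tiny example graph are made up for illustration.

```python
from collections import deque


def partition_by_hops(adjacency, seeds):
    """Assign each reachable node to its nearest seed, measured in hops.

    adjacency maps a node to an iterable of its neighbours.
    Returns (owner, hops): the seed each node belongs to and its hop count.
    """
    owner = {s: s for s in seeds}
    hops = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in owner:  # first visit is the shortest path
                owner[neighbour] = owner[node]
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return owner, hops


# Hypothetical network: a chain of six routers a-b-c-d-e-f.
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
owner, hops = partition_by_hops(graph, ["a", "f"])
print(owner)
```

Ties between equally distant seeds are broken by queue order, and nodes that no seed can reach are simply left out of the result.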

Should one use distances (dissimilarities) or similarities in R for clustering?

拈花ヽ惹草 posted on 2019-12-08 08:45:52
Question: I'm working on a clustering problem, and the proxy package in R provides both dist and simil functions. For my purpose I need a distance matrix, so I initially used dist. Here's the code:
distanceMatrix <- dist(dfm[,-1], method='Pearson')
clusters <- hclust(distanceMatrix)
clusters$labels <- dfm[,1]#colnames(dfm)[-1]
plot(clusters, labels=clusters$labels)
But after I plotted the image I found that the clustering result is not what I expected, since I know what it should look like. So I
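The difference between a similarity and a dissimilarity matters because hierarchical clustering merges the pairs with the smallest values first. The sketch below shows, in plain Python, one common convention for turning a Pearson correlation r into a distance, d = 1 - r. That convention is an assumption for illustration; it is not a statement about how the proxy package computes `method='Pearson'` internally.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def pearson_distance(x, y):
    """Dissimilarity under the assumed convention d = 1 - r (range 0 to 2)."""
    return 1.0 - pearson(x, y)


print(pearson_distance([1, 2, 3], [2, 4, 6]))  # perfectly correlated rows
print(pearson_distance([1, 2, 3], [3, 2, 1]))  # perfectly anti-correlated rows
```

Feeding raw similarities into a routine that expects distances would make the most alike rows look the farthest apart, which is one thing worth ruling out when a dendrogram looks inverted.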

Exporting result from kml package in R

孤人 posted on 2019-12-08 06:35:52
Question: I'm using the kml package in R to cluster my data, and in the end I need a csv file with a column giving the cluster number for each id. The data has many missing values, so I can't use the kmeans function without deleting all observations, but kml works nicely with that. My problem is that I use choice() to export the results and all I get is a graphical window, but no output files. Here is my code:
setwd("/Volumes/NATASHKA/api/R files")
statadata <-read.dta("Data_wide

Change in preference value does not affect the results of Affinity propagation Clustering

孤者浪人 posted on 2019-12-08 05:11:33
Question: Refer to the following code:
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
##############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5)
# Compute similarities
X_norms = np.sum(X ** 2, axis=1)
S = - X_norms[:, np.newaxis] - X_norms[np

Fuzzy c-means tcp dump clustering in matlab

↘锁芯ラ posted on 2019-12-08 04:28:21
Question: Hi, I have some data that's represented like this: 0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal. It's from the KDD Cup 1999 data set, which was based on the DARPA set. The text file I have has rows and rows of data like this. In MATLAB there is a generic clustering tool you can start by typing findcluster, but it only accepts .dat files. I'm also not very sure if it will accept a format like this. I'm also
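Records like the one above mix numeric fields with text fields (`tcp`, `http`, `SF`) and end with a label (`normal.`), so they usually need to be converted to purely numeric rows before a clustering tool can read them. The sketch below shows one way to do that in Python, as a preprocessing step outside MATLAB. The integer coding of the three text columns and the helper name `encode_record` are assumptions for illustration; whether the resulting file suits findcluster is not something this sketch verifies.

```python
# Zero-based positions of the text columns: protocol_type, service, flag.
CATEGORICAL = (1, 2, 3)


def encode_record(line, codebooks):
    """Turn one comma-separated record into (numeric_row, label).

    codebooks is a dict that is filled in as new category values appear,
    so the same text value always maps to the same integer code.
    """
    fields = line.strip().split(",")
    label = fields[-1].rstrip(".")  # "normal." -> "normal"
    row = []
    for i, value in enumerate(fields[:-1]):
        if i in CATEGORICAL:
            book = codebooks.setdefault(i, {})
            row.append(float(book.setdefault(value, len(book))))
        else:
            row.append(float(value))
    return row, label


record = ("0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,"
          "0.00,0.00,0.00,0.00,0.00,normal.")
codebooks = {}
row, label = encode_record(record, codebooks)

# One whitespace-separated numeric line, ready to append to a text file.
print(" ".join(str(v) for v in row))
```

Integer codes impose an artificial ordering on the categories, which distance-based methods such as fuzzy c-means will take literally; one-hot columns or dropping the text fields are alternatives worth considering.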