cluster-analysis

How to display row names in a k-means cluster plot in R?

旧时模样 submitted on 2019-12-04 05:16:03
Question: I am trying to plot a k-means clustering. Below is the code I use:

    library(cluster)
    library(fpc)
    data(iris)
    dat <- iris[, -5] # without known classification
    # K-means cluster analysis
    clus <- kmeans(dat, centers=3)
    clusplot(dat, clus$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)

I get the following plot: [figure not available]
Instead of the row numbers, I want the points labelled with character row names. As I understand it, this plot was produced from data like the following: Sepal.Length Sepal.Width Petal.Length

Clustering using a custom distance metric for lat/long pairs

二次信任 submitted on 2019-12-04 03:43:30
I'm trying to specify a custom distance function for the scikit-learn DBSCAN implementation:

    def geodistance(latLngA, latLngB):
        print latLngA, latLngB
        return vincenty(latLngA, latLngB).miles

    cluster_labels = DBSCAN(
        eps=500,
        min_samples=max(2, len(found_geopoints)/10),
        metric=geodistance
    ).fit(np.array(found_geopoints)).labels_

However, when I print out the arguments to my distance function, they aren't at all what I would expect: [ 0.53084126 0.19584111 0.99640966 0.88013373 0.33753788 0.79983037 0.71716144 0.85832664 0.63559538 0.23032912] [ 0.53084126 0.19584111 0.99640966 0.88013373 0
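A common workaround, sketched here under stated assumptions: compute the pairwise distance matrix yourself and pass it to DBSCAN with metric='precomputed', so no callable metric is involved at all. This swaps the ellipsoidal Vincenty formula for the simpler spherical haversine formula; the Earth radius of 3958.8 miles and the name `haversine_matrix_miles` are my own choices. Note that eps is expressed in the metric's units, so eps=500 in the question means 500 miles.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius (spherical model)


def haversine_matrix_miles(latlng_deg):
    """Pairwise great-circle distances (miles) for an (n, 2) array of
    [latitude, longitude] pairs given in degrees."""
    pts = np.radians(np.asarray(latlng_deg, dtype=float))
    lat = pts[:, 0][:, None]  # column vector of latitudes
    lng = pts[:, 1][:, None]  # column vector of longitudes
    dlat = lat - lat.T
    dlng = lng - lng.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlng / 2) ** 2
    # Clip guards against rounding pushing a slightly outside [0, 1].
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
```

Usage would look like `D = haversine_matrix_miles(found_geopoints)` followed by `DBSCAN(eps=500, min_samples=..., metric='precomputed').fit(D).labels_`. The n × n matrix costs O(n²) memory, which is fine for modest point counts.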

Scipy's sparse eigsh() for small eigenvalues

不想你离开。 submitted on 2019-12-04 03:40:37
I'm trying to write a spectral clustering algorithm using NumPy/SciPy for larger (but still tractable) systems, making use of SciPy's sparse linear algebra library. Unfortunately, I'm running into stability issues with eigsh(). Here's my code:

    import numpy as np
    import scipy.sparse
    import scipy.sparse.linalg as SLA
    import sklearn.utils.graph as graph

    W = self._sparse_rbf_kernel(self.X_, self.datashape)
    D = scipy.sparse.csc_matrix(np.diag(np.array(W.sum(axis = 0))[0]))
    L = graph.graph_laplacian(W) # D - W
    vals, vects = SLA.eigsh(L, k = self.k, M = D, which = 'SM', sigma = 0, maxiter = 1000)
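Two details of SciPy's shift-invert mode may explain the instability, although the question does not confirm either. When sigma is given, `which` refers to the transformed eigenvalues 1/(λ - σ), so which='LM' (not 'SM') returns the eigenvalues closest to sigma. And because a graph Laplacian is singular (constant vectors lie in its null space), factorizing L - 0·D with sigma=0 can fail or be badly conditioned; a small negative shift keeps L - σD positive definite. The sketch assumes a symmetric weight matrix with no isolated nodes; the shift of -1e-3 and the function name are arbitrary choices. In current releases, scipy.sparse.csgraph.laplacian can build D - W in place of sklearn.utils.graph.graph_laplacian.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as sla


def smallest_generalized_eigs(W, k, shift=-1e-3):
    """Smallest k eigenpairs of L v = lambda D v, where L = D - W.

    Shift-invert mode: with sigma set, which='LM' picks the eigenvalues
    nearest sigma. A slightly negative sigma avoids factorizing the
    singular Laplacian itself."""
    W = sp.csc_matrix(W)
    degrees = np.asarray(W.sum(axis=0)).ravel()  # assumes no zero-degree nodes
    D = sp.diags(degrees, format="csc")
    L = (D - W).tocsc()
    vals, vecs = sla.eigsh(L, k=k, M=D, sigma=shift, which="LM")
    order = np.argsort(vals)  # return in ascending order
    return vals[order], vecs[:, order]
```

The smallest eigenvalue should be close to zero for a connected graph, which makes a convenient sanity check.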

Affinity Propagation preferences initialization

廉价感情. submitted on 2019-12-04 03:23:18
I need to perform clustering without knowing the number of clusters in advance. The number of clusters may be anywhere from 1 to 5, since I may find cases where all the samples belong to the same instance, or to a limited number of groups. I thought affinity propagation could be a good choice, since I could control the number of clusters by setting the preference parameter. However, if I artificially generate a single cluster and set preference to the minimal Euclidean distance among nodes (to minimize the number of clusters), I get terrible over-clustering.
""" ==========================================
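If the similarities are scikit-learn's default ones (negative squared Euclidean distances, so every off-diagonal value is at most 0), then a positive preference such as a minimum Euclidean distance is larger than every similarity, which pushes every point toward becoming its own exemplar. Lower (more negative) preferences give fewer clusters, and the documented default is the median similarity. This is my reading of the symptom, not a confirmed diagnosis. The sketch computes the similarity matrix and two candidate preferences from its off-diagonal entries; the helper names are hypothetical, and the values could then be passed as AffinityPropagation(preference=...).

```python
import numpy as np


def negative_sq_euclidean(X):
    """Similarity matrix S[i, j] = -||x_i - x_j||^2."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return -np.maximum(d2, 0.0)  # clamp tiny negative rounding errors


def preference_range(S):
    """(lowest, median) of the off-diagonal similarities: the lowest value
    tends toward few clusters, the median toward a moderate number."""
    off = S[~np.eye(len(S), dtype=bool)]
    return off.min(), np.median(off)
```

Even the minimum similarity does not guarantee a single cluster, so sweeping preference downward from the median and watching the cluster count is a reasonable way to calibrate it.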

R - 'princomp' can only be used with more units than variables

ε祈祈猫儿з submitted on 2019-12-04 03:21:24
I am using R (R Commander) to cluster my data. I have a smaller subset of my data containing 200 rows and about 800 columns. I get the following error when trying to run k-means clustering and plot the result on a graph: "'princomp' can only be used with more units than variables". I then created a test document with 10 rows and 10 columns, which plots fine, but when I add an extra column I get the error again. Why is this? I need to be able to plot my clusters. When I view my data set after performing k-means on it, I can see the extra results column showing which cluster each row belongs to. Is there anything I
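R's princomp works from the covariance matrix and, per its documentation, needs at least as many units (rows) as variables (columns); both data sets in the question violate that (200 × 800, and 10 × 11), while 10 × 10 just passes. prcomp instead uses a singular value decomposition of the centred data matrix and has no such restriction. The Python sketch below illustrates that SVD route for a matrix with more columns than rows; it is not what clusplot does internally, and the function name is hypothetical.

```python
import numpy as np


def pca_scores_svd(X, n_components=2):
    """Scores on the first principal components, computed from the SVD of
    the column-centred data. Works even when X has more columns than rows."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Rows of Vt are the principal axes; U * s equals Xc projected onto them.
    return U[:, :n_components] * s[:n_components]
```

The two score columns are what a 2-D cluster plot would draw, coloured by the k-means assignment.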

Interpretation of 'ufactor' on a toy graph clustering

夙愿已清 submitted on 2019-12-04 02:33:26
Question: I am trying to do an imbalanced partition with METIS. I do not need an equal number of vertices in each cluster (which METIS aims for by default). My graph has no constraints; it's an undirected, unweighted graph. Here is an example toy graph clustered by METIS without any ufactor parameter: [figure not available]
Then I tried different ufactor values, and at 143 METIS starts to produce the expected clustering, like the following: [figure not available]
Can anybody interpret this? Eventually, I want to find a way to guess a ufactor from any
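The METIS manual describes ufactor as a maximum allowed load imbalance, and its examples (ufactor 1 gives 1.001 for recursive bisection, the default 30 gives 1.03 for k-way) correspond to the heaviest part weighing up to (1 + ufactor/1000) times the average part weight. For an unweighted graph with n vertices split into k parts, that gives a rough way to guess the ufactor a desired part size needs. This is a sketch under that reading of the manual; METIS's internal rounding may differ slightly, and the function names are hypothetical.

```python
def max_part_size(n, k, ufactor):
    """Largest part allowed when size <= (1 + ufactor / 1000) * n / k,
    computed with integers to avoid floating-point surprises."""
    return (n * (1000 + ufactor)) // (1000 * k)


def min_ufactor_for(n, k, size):
    """Smallest ufactor for which max_part_size(n, k, ufactor) >= size."""
    need = 1000 * (size * k - n)
    if need <= 0:
        return 0  # size already fits in a perfectly balanced split
    return -(-need // n)  # ceiling of need / n
```

For example, letting one of two parts in a hypothetical 10-vertex graph hold 6 vertices needs a ufactor of at least 200 under this rule, while 199 still caps the part at 5.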

Clustering with NA values in R

人盡茶涼 submitted on 2019-12-04 01:52:46
I was surprised to find out that clara from library(cluster) allows NAs, but the function documentation says nothing about how it handles these values. So my questions are: How does clara handle NAs? Can this approach somehow be used for kmeans (which does not allow NAs)? [Update] I found these lines of code in the clara function:

    inax <- is.na(x)
    valmisdat <- 1.1 * max(abs(range(x, na.rm = TRUE)))
    x[inax] <- valmisdat

which replace missing values with valmisdat. I am not sure I understand the reason for using such a formula. Any ideas? Would it be more "natural" to treat NAs in each column separately, maybe replacing them with
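Read literally, the quoted formula yields a value 10% larger than the largest absolute value in the data, so (unless the data are all zero) it can never coincide with a real observation; that makes it usable as a marker for "missing" rather than as an imputed value. My assumption, not confirmed by the question, is that the compiled code then skips marked coordinates when computing dissimilarities. The sketch shows both pieces: the marker formula, and a Euclidean distance that ignores coordinates missing in either row and rescales by the fraction of coordinates used, the same weighting scikit-learn documents for nan_euclidean_distances. The function names are hypothetical.

```python
import numpy as np


def missing_marker(x):
    """Python version of the quoted formula:
    1.1 * max(abs(range(x, na.rm = TRUE)))."""
    lo, hi = np.nanmin(x), np.nanmax(x)
    return 1.1 * max(abs(lo), abs(hi))


def nan_aware_euclidean(a, b):
    """Euclidean distance over coordinates present in both rows,
    scaled up by (total coordinates / present coordinates)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    present = ~(np.isnan(a) | np.isnan(b))
    n_present = int(present.sum())
    if n_present == 0:
        return float("nan")  # no shared coordinates to compare
    sq = np.sum((a[present] - b[present]) ** 2)
    return float(np.sqrt(sq * a.size / n_present))
```

A distance like this works for medoid-based methods such as clara, which only need dissimilarities; kmeans also needs cluster means, so it would still require imputation or a modified update step.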

How to Bound the Outer Area of Voronoi Polygons and Intersect with Map Data

你离开我真会死。 submitted on 2019-12-03 23:47:30
Background: I'm trying to visualize the results of a k-means clustering procedure on the following data using Voronoi polygons on a US map. Here is the code I've been running so far:

    input <- read.csv("LatLong.csv", header = T, sep = ",")

    # K Means Clustering
    set.seed(123)
    km <- kmeans(input, 17)
    cent <- data.frame(km$centers)

    # Visualization
    states <- map_data("state")
    StateMap <- ggplot() +
      geom_polygon(data = states, aes(x = long, y = lat, group = group), col = "white")

    # Voronoi
    V <- deldir(cent$long, cent$lat)
    ll <-apply(V$dirsgs, 1, FUN = function(x){ readWKT(sprintf("LINESTRING(%s %s, %s
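One way to get finite cells, sketched in Python with SciPy rather than the question's deldir/ggplot2 stack, is to reflect the seed points (for example the k-means centres) across each edge of a bounding box. The bisector between a point and its reflection is exactly that box edge, so the cells of the original points come out as their ordinary Voronoi cells clipped to the box. Intersecting those cells with state outlines would still need a separate polygon-clipping step (for example with shapely in Python or sf in R), which is not shown. The function name and the requirement that all points lie strictly inside the box are my assumptions.

```python
import numpy as np
from scipy.spatial import Voronoi


def bounded_voronoi(points, xmin, xmax, ymin, ymax):
    """Voronoi cells of `points` clipped to a box, via mirror points.

    Returns one array of cell vertices per input point (convex polygons,
    vertex order not guaranteed). Points must lie strictly inside the box."""
    pts = np.asarray(points, dtype=float)
    left, right, down, up = pts.copy(), pts.copy(), pts.copy(), pts.copy()
    left[:, 0] = 2 * xmin - pts[:, 0]   # reflect across x = xmin
    right[:, 0] = 2 * xmax - pts[:, 0]  # reflect across x = xmax
    down[:, 1] = 2 * ymin - pts[:, 1]   # reflect across y = ymin
    up[:, 1] = 2 * ymax - pts[:, 1]     # reflect across y = ymax
    vor = Voronoi(np.vstack([pts, left, right, down, up]))
    cells = []
    for i in range(len(pts)):
        region = vor.regions[vor.point_region[i]]
        if -1 in region:  # should not happen for points inside the box
            raise ValueError(f"cell {i} is unbounded; is the point inside the box?")
        cells.append(vor.vertices[region])
    return cells
```

In a map setting the box would be the longitude/latitude extent of the states, with the cluster centres as seeds.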

Implementation of k-means clustering algorithm

情到浓时终转凉″ submitted on 2019-12-03 21:24:55
In my program, I'm taking k=2 for the k-means algorithm, i.e. I want only 2 clusters. I have implemented it in a very simple and straightforward way, but I still can't understand why my program gets into an infinite loop. Can anyone please point out where I'm making a mistake? For simplicity, I have hard-coded the input in the program. Here is my code:

    import java.io.*;
    import java.lang.*;

    class Kmean {
        public static void main(String args[]) {
            int N=9;
            int arr[]={2,4,10,12,3,20,30,11,25}; // initial data
            int i,m1,m2,a,b,n=0;
            boolean flag=true;
            float sum1=0,sum2=0;
            a=arr[0];b=arr[1];
            m1=a;
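The Java code is cut off before its loop, so the actual bug cannot be identified here. For comparison, here is a sketch of two-cluster k-means (Lloyd's algorithm) on the same nine values, written in Python. It starts from the same first two elements as initial means and stops as soon as an assignment pass changes nothing, which is the usual termination test; an iteration cap guards against looping forever. The tie-breaking rule and the function name are my own choices.

```python
def kmeans_1d_two(data, m1, m2, max_iter=100):
    """Two-cluster 1-D k-means that stops when assignments stop changing."""
    labels = None
    for _ in range(max_iter):
        # Assignment step: ties go to cluster 0.
        new_labels = [0 if abs(x - m1) <= abs(x - m2) else 1 for x in data]
        if new_labels == labels:
            break  # converged: same assignment as the previous pass
        labels = new_labels
        c1 = [x for x, lab in zip(data, labels) if lab == 0]
        c2 = [x for x, lab in zip(data, labels) if lab == 1]
        # Update step: keep the old mean if a cluster becomes empty.
        if c1:
            m1 = sum(c1) / len(c1)
        if c2:
            m2 = sum(c2) / len(c2)
    return m1, m2, labels
```

On the question's data, starting from 2 and 4, the loop ends with means 7.0 and 25.0, with {2, 4, 10, 12, 3, 11} in one cluster and {20, 30, 25} in the other. Comparing assignments rather than the float means avoids depending on exact floating-point equality for termination.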

Running clustering algorithms in ELKI

无人久伴 submitted on 2019-12-03 21:02:40
I need to run a k-medoids clustering algorithm using ELKI programmatically. I have a similarity matrix that I wish to input to the algorithm. Is there a code snippet showing how to run ELKI algorithms? I basically need to know how to create Database and Relation objects, create a custom distance function, and read the algorithm output. Unfortunately, the ELKI tutorial (http://elki.dbs.ifi.lmu.de/wiki/Tutorial) focuses on the GUI version and on implementing new algorithms, and trying to write code by looking at the Javadoc is frustrating. If someone is aware of any easy-to-use
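This is not ELKI's API, which I will not guess at; it is a minimal Python sketch of what the algorithm needs from a similarity matrix. The similarity matrix is first turned into a distance matrix, here by subtracting every entry from the maximum similarity (an assumption; any decreasing transform could be used), and then the simple alternating k-medoids variant (assign each point to its nearest medoid, then re-pick each cluster's medoid) runs on it. That alternating scheme is a swapped-in technique and is generally weaker than PAM's swap search. Function names and initial medoids are hypothetical.

```python
import numpy as np


def similarity_to_distance(sim):
    """Assumed transform: maximum similarity minus each entry."""
    sim = np.asarray(sim, dtype=float)
    return sim.max() - sim


def kmedoids_alternate(dist, init, max_iter=100):
    """Alternating k-medoids on a precomputed n x n distance matrix.
    Assumes distinct initial medoids with nonzero mutual distances."""
    dist = np.asarray(dist, dtype=float)
    medoids = [int(m) for m in init]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = []
        for c in range(len(medoids)):
            members = np.flatnonzero(labels == c)
            # Update step: the member with the smallest total distance to
            # the rest of its cluster becomes the new medoid.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(costs)]))
        if new_medoids == medoids:
            break  # medoids unchanged: converged
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

Only the distance matrix is used, so the same loop works for any custom dissimilarity, which is the part the question wants to plug into ELKI.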