cluster-analysis

How to Bound the Outer Area of Voronoi Polygons and Intersect with Map Data

。_饼干妹妹 submitted on 2019-12-05 12:16:08

Question: Background: I'm trying to visualize the results of a k-means clustering procedure on the following data using Voronoi polygons on a US map. Here is the code I've been running so far: input <- read.csv("LatLong.csv", header = T, sep = ",") # K Means Clustering set.seed(123) km <- kmeans(input, 17) cent <- data.frame(km$centers) # Visualization states <- map_data("state") StateMap <- ggplot() + geom_polygon(data = states, aes(x = long, y = lat, group = group), col = "white") # Voronoi V <- deldir
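The R snippet above is cut off at the `deldir` call, but the underlying idea (cluster, then build bounded Voronoi cells around the centroids) can be sketched in Python with scipy. The original LatLong.csv isn't available, so random (longitude, latitude) pairs in a US-like bounding box stand in for it; the four far-away dummy sites are a standard trick to make every real centroid's Voronoi region finite:

```python
import numpy as np
from scipy.cluster.vq import kmeans
from scipy.spatial import Voronoi

rng = np.random.default_rng(123)
# Hypothetical stand-in for LatLong.csv: (long, lat) pairs in a US-like box.
points = rng.uniform([-125, 25], [-67, 49], size=(200, 2))

# scipy's kmeans may return fewer than 17 centroids if a cluster empties out.
centers, _ = kmeans(points, 17)

# Bound the outer regions: add four distant dummy sites so every original
# site lies strictly inside the convex hull and gets a finite polygon.
dummies = np.array([[-1000, -1000], [-1000, 1000], [1000, -1000], [1000, 1000]])
vor = Voronoi(np.vstack([centers, dummies]))

# A region containing vertex index -1 is unbounded; none of ours should be.
finite = [-1 not in vor.regions[vor.point_region[i]] for i in range(len(centers))]
print(all(finite))
```

The finite polygons (`vor.regions` / `vor.vertices`) can then be clipped against the state boundaries from `map_data("state")` with any polygon-intersection library.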

Correlating word proximity

跟風遠走 submitted on 2019-12-05 11:27:28

Suppose I have a text transcript of a dialogue lasting roughly one hour. I want to know which words occur in close proximity to one another. What statistical technique would I use to determine which words cluster together, and how close they are to one another? I suspect some sort of cluster analysis or PCA. jayunit100: To determine word proximity, you will have to build a graph: each word is a vertex (or "node"), and adjacent (left and right) words are edges. So "I like dogs" would have 2 edges and 3 vertices. The next step is to decide, based on this model
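The graph construction described in the answer is a few lines of Python. This is a minimal sketch of just the vertex/edge-building step (the function name is made up for illustration):

```python
from collections import Counter

def word_graph(text):
    # Each word is a vertex; consecutive words share an edge, so the
    # edge counts measure how often words appear in close proximity.
    words = text.lower().split()
    vertices = set(words)
    edges = Counter(zip(words, words[1:]))
    return vertices, edges

v, e = word_graph("I like dogs")
print(len(v), sum(e.values()))  # 3 vertices, 2 edges, as in the answer above
```

For a looser notion of proximity, the `zip(words, words[1:])` pairing can be widened to a sliding window of, say, 5 words, counting every pair inside the window.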

In R, is there an algorithm to create approximately equal sized clusters

心不动则不痛 submitted on 2019-12-05 11:01:38

There seems to be a lot of information about creating either hierarchical or k-means clusters, but I would like to know whether there is a solution in R that creates K clusters of approximately equal sizes. There is some material on doing this in other languages, but my searches have turned up nothing that suggests how to achieve this in R. An example: set.seed(123) df <- matrix(rnorm(100*5), nrow=100) km <- kmeans(df, 10) print(sapply(1:10, function(n) sum(km$cluster==n))) which results in [1] 14 12 4 13 16 6 8 7 13 7 I would ideally
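One common heuristic for equal-sized clusters is a capacity-constrained assignment: cap each cluster at ceil(n/k) points and let each point take its nearest non-full cluster. The sketch below is in Python rather than R (centers are just sampled points here; in practice they would come from a k-means run), and it is a heuristic, not an exact balanced-clustering method:

```python
import numpy as np

rng = np.random.default_rng(123)
X = rng.normal(size=(100, 5))   # same shape as the rnorm(100*5) example above
k = 10
centers = X[rng.choice(len(X), k, replace=False)]

n = len(X)
cap = -(-n // k)                 # ceil(n/k): maximum points per cluster
d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

labels = np.full(n, -1)
counts = np.zeros(k, dtype=int)
# Visit points with the closest nearest-centre first, then give each
# point its best cluster that still has room.
for i in np.argsort(d.min(axis=1)):
    for c in np.argsort(d[i]):
        if counts[c] < cap:
            labels[i] = c
            counts[c] += 1
            break

print(sorted(counts))  # total capacity equals n, so every cluster gets n/k = 10
```

The same loop translates directly to R; there are also packages implementing exact balanced clustering, but the greedy cap is often good enough.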

Best programming language to implement DBSCAN algorithm querying a MongoDB database?

ぃ、小莉子 submitted on 2019-12-05 10:59:36

Question: I have to implement the DBSCAN algorithm, starting from this pseudocode: DBSCAN(D, eps, MinPts) C = 0 for each unvisited point P in dataset D mark P as visited NeighborPts = regionQuery(P, eps) if sizeof(NeighborPts) < MinPts mark P as NOISE else C = next cluster expandCluster(P, NeighborPts, C, eps, MinPts) expandCluster(P, NeighborPts, C, eps, MinPts) add P to cluster C for each point P' in NeighborPts if P' is not visited mark P' as visited NeighborPts' = regionQuery(P', eps) if
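The pseudocode above translates directly to Python. In this sketch the data lives in memory and `region_query` is a brute-force distance scan; against MongoDB it would instead be a geospatial query (e.g. `$nearSphere` with a max distance), but the control flow stays the same:

```python
import numpy as np

def region_query(D, i, eps):
    # All points within eps of point i (including i itself).
    return np.where(np.linalg.norm(D - D[i], axis=1) <= eps)[0].tolist()

def dbscan(D, eps, min_pts):
    labels = [None] * len(D)       # None = unvisited, -1 = noise, >=0 = cluster id
    c = -1
    for p in range(len(D)):
        if labels[p] is not None:
            continue
        neighbors = region_query(D, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = -1         # mark P as NOISE
            continue
        c += 1                     # C = next cluster
        labels[p] = c
        seeds = list(neighbors)    # expandCluster, iteratively
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = c      # noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = c
            q_neighbors = region_query(D, q, eps)
            if len(q_neighbors) >= min_pts:
                seeds.extend(q_neighbors)  # q is a core point: grow the cluster
    return labels

pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11], [50, 50]])
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The recursion in the pseudocode is replaced by the `seeds` worklist, which avoids stack-depth problems on large clusters.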

Implementation of k-means clustering algorithm

独自空忆成欢 submitted on 2019-12-05 10:11:44

Question: In my program, I'm taking k=2 for the k-means algorithm, i.e. I want only 2 clusters. I have implemented it in a very simple and straightforward way, yet I can't understand why my program gets into an infinite loop. Can anyone please point out where I'm making a mistake? For simplicity, I have hard-coded the input in the program itself. Here is my code: import java.io.*; import java.lang.*; class Kmean { public static void main(String args[]) { int N=9; int arr[]={2,4,10,12,3,20,30,11,25};
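The Java code is truncated, but an infinite loop in hand-rolled k-means is almost always a broken termination test: the loop must stop when the recomputed means equal the previous means, checked *after* the update. Here is the same k=2 run on the same data as a Python sketch with the standard stopping rule:

```python
arr = [2, 4, 10, 12, 3, 20, 30, 11, 25]
m1, m2 = arr[0], arr[1]           # initialise the two means from the data

while True:
    # Assign each value to the nearer mean (ties go to cluster 1).
    c1 = [x for x in arr if abs(x - m1) <= abs(x - m2)]
    c2 = [x for x in arr if abs(x - m1) > abs(x - m2)]
    new1, new2 = sum(c1) / len(c1), sum(c2) / len(c2)
    if (new1, new2) == (m1, m2):  # stop when the means no longer move
        break
    m1, m2 = new1, new2

print(sorted(c1), sorted(c2))
```

If the convergence check compares against the means from before the update, or uses `==` on freshly allocated arrays in Java (reference equality), the condition is never true and the loop never exits.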

Generate random points distributed like cities?

こ雲淡風輕ζ submitted on 2019-12-05 10:09:10

How can one generate, say, 1000 random points with a distribution like that of towns and cities in e.g. Ohio? I'm afraid I can't define "distributed like cities" precisely; uniformly distributed centres plus small Gaussian clouds are easy but ad hoc. Added: there must be a family of 2D distributions with a clustering parameter that can be varied to match a given set of points? Maybe you can take a look at Walter Christaller's Theory of Central Places. I guess there must be some generator somewhere, or you can cook up your own. Start with a model of the water features in your target area (or make
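The "family of 2D distributions with a clustering parameter" asked for above is essentially a cluster point process, e.g. a Thomas process: uniform parent centres with Gaussian daughter clouds, where the cloud scale is the tunable clustering parameter. The sketch below adds a uniform background fraction so isolated small towns appear too; all the parameter names and defaults are illustrative choices, not fitted values:

```python
import numpy as np

def city_like_points(n, n_parents=10, spread=0.02, clustered_frac=0.8, seed=0):
    # Thomas-process-style generator on the unit square. `spread` is the
    # clustering parameter: small -> tight metro areas, large -> near-uniform.
    rng = np.random.default_rng(seed)
    parents = rng.uniform(size=(n_parents, 2))
    n_clustered = int(n * clustered_frac)
    which = rng.integers(n_parents, size=n_clustered)
    clustered = parents[which] + rng.normal(scale=spread, size=(n_clustered, 2))
    background = rng.uniform(size=(n - n_clustered, 2))
    return np.vstack([clustered, background])

pts = city_like_points(1000)
print(pts.shape)  # (1000, 2)
```

To match a real point set rather than eyeball it, one would fit the Thomas (or Matérn) process parameters to the observed points, e.g. by matching Ripley's K function.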

Clustering with scipy - clusters via distance matrix, how to get back the original objects

為{幸葍}努か submitted on 2019-12-05 06:31:29

I can't seem to find any simple enough tutorials or descriptions on clustering in scipy, so I'll try to explain my problem: I am trying to cluster documents (hierarchical agglomerative clustering) and have created a vector for each document and produced a symmetric distance matrix. vector_list contains (really long) vectors representing each document. The order of this list of vectors matches my list of input documents, so that I will (hopefully) be able to match the results of the clustering with the corresponding documents. distances = distance.cdist(vector_list, vector_list, 'euclidean')
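Because scipy's clustering functions preserve input order, row i of the result always corresponds to document i, so mapping labels back to documents is just a zip by index. A minimal sketch with made-up document names and tiny 2-D stand-in vectors (note that `linkage` wants the condensed `pdist` form, not the square `cdist` matrix from the question):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

docs = ["doc_a", "doc_b", "doc_c", "doc_d"]            # hypothetical names
vectors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# linkage expects a condensed distance matrix; pdist produces exactly that.
Z = linkage(pdist(vectors, "euclidean"), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")        # flatten into 2 clusters

# labels[i] belongs to vectors[i], which belongs to docs[i].
clusters = {}
for doc, lab in zip(docs, labels):
    clusters.setdefault(int(lab), []).append(doc)
print(clusters)
```

If you already have the square matrix from `cdist`, `scipy.spatial.distance.squareform` converts it to the condensed form `linkage` expects.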

How to know about group information in cluster analysis (hierarchical)?

南笙酒味 submitted on 2019-12-05 05:47:50

Question: I have a problem with groups in (hierarchical) cluster analysis. As an example, consider the dendrogram of complete linkage on the Iris data set. After I run > table(cutree(hc, 3), iris$Species) the output is:

  setosa versicolor virginica
1     50          0         0
2      0         23        49
3      0         27         1

I have read on one statistics website that object 1 in the data always belongs to group/cluster 1. From the output above, we know that setosa is in group 1. Then, how am I going to identify the other two species? How do
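The cluster numbers from `cutree` are arbitrary; the usual way to name clusters 2 and 3 is by the majority species in each row of the table. Since the labels come back in input-row order, you can cross-tabulate them against the species column yourself. A small Python sketch with hypothetical labels (the logic, not the Iris data, is the point):

```python
from collections import Counter, defaultdict

labels  = [1, 1, 1, 2, 2, 2, 3, 3]                 # hypothetical cutree output
species = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "versicolor",
           "virginica", "virginica"]               # same rows, same order

# Count species per cluster, exactly what table(cutree(hc, 3), iris$Species) shows.
per_cluster = defaultdict(Counter)
for lab, sp in zip(labels, species):
    per_cluster[lab][sp] += 1

# Name each cluster after its majority species.
naming = {lab: cnt.most_common(1)[0][0] for lab, cnt in per_cluster.items()}
print(naming)
```

Applied to the table above: cluster 2 is mostly virginica (49 of 72) and cluster 3 mostly versicolor (27 of 28), with the caveat that complete linkage mixes those two species considerably.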

K means finding elbow when the elbow plot is a smooth curve

て烟熏妆下的殇ゞ submitted on 2019-12-05 05:35:33

I am trying to plot the elbow of k-means using the code below: load CSDmat %mydata for k = 2:20 opts = statset('MaxIter', 500, 'Display', 'off'); [IDX1,C1,sumd1,D1] = kmeans(CSDmat,k,'Replicates',5,'options',opts,'distance','correlation'); % kmeans matlab [yy,ii] = min(D1'); %% assign points to nearest center distort = 0; distort_across = 0; clear clusts; for nn=1:k I = find(ii==nn); %% indices of points in cluster nn J = find(ii~=nn); %% indices of points not in cluster nn clusts{nn} = I; %% save into clusts cell array if (length(I)>0) mu(nn,:) = mean(CSDmat(I,:)); %% update mean %% Compute
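When the distortion curve is smooth with no obvious kink, a common rule is to pick the point farthest below the straight line joining the curve's endpoints (the idea behind the "Kneedle" method). The MATLAB loop above produces a cost per k; the selection step is a couple of numpy lines. The costs below are made-up illustrative values, not results from CSDmat:

```python
import numpy as np

# Hypothetical total distortions for k = 1..10 (e.g. summed sumd from kmeans);
# the curve is smooth, with the bend around k = 3.
ks    = np.arange(1, 11)
costs = np.array([1000., 420., 170., 140., 120., 105., 95., 88., 82., 78.])

# Elbow = point with maximum vertical distance below the chord that
# connects the first and last points of the curve.
line = costs[0] + (costs[-1] - costs[0]) * (ks - ks[0]) / (ks[-1] - ks[0])
elbow = int(ks[np.argmax(line - costs)])
print(elbow)  # 3
```

An alternative on smooth curves is the maximum second difference, `np.diff(costs, 2)`, or sidestepping the elbow entirely with the gap statistic or silhouette scores.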

OpenCV K-Means (kmeans2)

我怕爱的太早我们不能终老 submitted on 2019-12-05 04:58:17

I'm using OpenCV's k-means implementation to cluster a large set of 8-dimensional vectors. They cluster fine, but I can't find any way to see the prototypes created by the clustering process. Is this even possible? OpenCV only seems to give access to the cluster indexes (labels). If not, I guess it'll be time to write my own implementation! I can't say I've used OpenCV's implementation of k-means, but if you have access to the label given to each instance, you can simply get the centroids by computing the average vector of the instances belonging to each cluster. As of (at least) OpenCV 2.0,
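The workaround in the answer (average the members of each cluster) is one line of numpy. The data and labels below are random stand-ins for the 8-dimensional vectors and the label array that `kmeans2` returns; note also that OpenCV's Python `cv2.kmeans` does return the centers directly as its third result, so recomputation is only needed when all you kept were the labels:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 8)).astype(np.float32)  # stand-in for the 8-D vectors
labels = rng.integers(0, 5, size=100)                # stand-in cluster indexes

# Recover the prototypes: the centroid of cluster c is the mean of the
# vectors whose label is c.
k = 5
centroids = np.array([data[labels == c].mean(axis=0) for c in range(k)])
print(centroids.shape)  # (5, 8)
```

The same mean-by-label trick works for any clustering API that exposes labels but not centers.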